Prediction of Cross-Platform and Native Apps Technology Opportunities for Beginner Developers Using C 4.5 and Naïve Bayes Algorithms

The competition between native and cross-platform app development makes application development simpler, safer, and more scalable. However, developers must have sufficient fundamentals, and the industry must conduct good research to shorten development time and minimize expenses. To address these problems, this study predicts which technologies have a chance to survive in the industry, so that developers are not left behind. The Naïve Bayes and C 4.5 algorithms are applied to a dataset covering nine programming languages related to mobile app development. The results show that Dart, a programming language that supports cross-platform frameworks, and Kotlin, a programming language that supports native app frameworks, are technologies that would have opportunities in the future, with an accuracy level above 90% for both the Naïve Bayes and C 4.5 algorithms. These results are obtained by testing the algorithm models using MAPE, consistent dataset division, and careful data processing. This research can help entry-level developers learn and deepen the fundamentals of technology and can add knowledge to the industry in choosing a technology.


I. INTRODUCTION
Many entry-level developers do not seem to understand which technologies to learn in mobile app development. They learn from current global trends and do not consider the future opportunities of a technology from an industry perspective. The preference for a technology is always correlated with the time and cost spent on development. Typically, mobile apps are developed with native programming languages specific to each platform, such as Java/Kotlin for Android, Swift/Objective-C for iOS, and C# for Windows Phone. Maintaining a separate codebase for each platform can be time-consuming and costly. Programming languages play an important role in the development of an application in terms of processing time, application performance, the complexity of the application to be developed, and the cost of development.
Furthermore, the JavaScript programming language is predicted to last for the next five years compared to a cross-platform framework supporting the programming language. Some studies revealed the excellence and popularity of the JavaScript programming language compared to cross-platform frameworks, which make development easy in terms of time and cost; nevertheless, it is impossible to abandon native apps completely [1]-[3]. Research on students' experience with native and non-native apps found that they prefer native apps in terms of reliability and convenience, and that cross-platform frameworks consume more resources, one of which is the battery, drawing 6% to 8% more than native apps regardless of the device used [4]-[6]. Several things should be considered when selecting a technology for developing an application, including the expertise of the team formed, reliability and support from vendors, UI customization, framework experience and capabilities, consistency between various platforms, security, and the ease of obtaining the information needed for application development on the framework used [7]. A cross-platform development approach has no concrete software implementation in mind; rather, it is a common way to solve the challenge of developing a single application that runs on multiple platforms [8].
Previous research compared the Naïve Bayes and C 4.5 algorithms on four cases with various attributes. It compared accuracy, precision, and recall values individually, then determined which algorithm had the advantage in each test. Of the 4 cases, 1 case showed the same results for both algorithms, with an accuracy value above 90%, and 2 cases showed each algorithm superior in turn: in the first case, the C 4.5 algorithm was superior with an average accuracy value of 95%, and in the second case, the Naïve Bayes algorithm was superior with an average accuracy value of 66%. In the third case, the Naïve Bayes algorithm excelled with an average accuracy score of 81%. In the last case, the Naïve Bayes algorithm was superior only in precision, with an average value of 62.72%, while the C 4.5 algorithm led in accuracy and recall, with average values of 70.84% for accuracy and 70.57% for recall. The method used to generate these figures was to compare accuracy, recall, and precision in order to evaluate the two algorithms.
For dataset division, the dataset size ranged from 50 records up to the actual amount of data. For training and testing, the data ranged from the actual amount of data down to 50 records or close to 50. A summary was then made to evaluate which of the two algorithms tested in the study was superior, by directly comparing the results of each test one by one without calculating the average value. Research on the same topic with the same number of attributes showed results that did not differ significantly between the two algorithms. Another study on the same topic differed by using 2 classes, with two attributes and three attributes. A further study applied the Naïve Bayes algorithm with another method and 12 attributes, using 60% training data and 40% testing data; its results showed that the C 4.5 algorithm was better than Naïve Bayes in terms of accuracy, precision, recall, log loss, and specificity [9]-[12]. Another study presented a Java cross-platform graphical interface framework that provides a new, scalable, and flexible solution targeting different Java application development scenarios. It can be applied to develop cross-platform graphical applications for Java Standard and Micro Editions, where the same code can also be executed and integrated into Android mobile applications without significant changes or readaptations of the codebase [13].
In a previous study using Optical Character Recognition (OCR) and the Naïve Bayes algorithm, a system was created that automatically classifies whether an image contains promotional offers, without human intervention, with an accuracy of 94.31% and an average precision of 94.33% compared to using the Random Forest and KNN algorithms [14]. Previous research examined the application of IoT to predicting COVID-19 risk levels and obtained an accuracy of 99%, 3% better than Random Forest.
In that research, clustering into SAFE and INSECURE was done using the K-means algorithm [15]. A study on Naïve Bayes classification for predicting the smoothness of MSME core rental payments, with 13 attributes (classified into six attributes), obtained 81.81% accuracy, 66.66% precision, and 100% recall with an AUC value of 0.800, processed through the Rapid Miner application [16]. A previous study comparing the effectiveness of Java and Kotlin found that Kotlin was superior in terms of effectiveness and consistency [17]. Another study, on the application of QR-Code in a standalone Android-based E-Market application, stated that the application was able to facilitate buying and selling transactions by 90% and sales product promotion by 90%, thus increasing people's purchasing power and indicating that in the future there would be many applications able to facilitate people's daily lives [18]. Another study, on Mobile Applications for Monitoring Drug Addition to Infusion Fluids, reported that the application was designed very simply with the Java programming language, which supports native app frameworks [19]. These findings support the assumption that native app frameworks have a higher chance than cross-platform frameworks, so Kotlin would survive in the industry when it comes to Android app development.
This study applied the Gaussian method with the Naïve Bayes algorithm, and the C 4.5 algorithm for decision trees, to programming language data in order to show entry-level developers where the opportunities lie in developing mobile-based applications. The C 4.5 algorithm forms the decision tree and is a well-known method of classification and prediction, and the Naïve Bayes algorithm is well known for predicting the likelihood of future events even with minimal past data. In the previous research conducted by Kurniawan [9], the method compared each test value, and the result was a count of how often each algorithm was superior. That study divided the dataset using consistent integer sizes, but its testing was inconsistent because the data in each case study was different.
In this study, the dataset division varies from 90% training data and 10% testing data to 10% training data and 90% testing data, with 9 attributes. Accuracy, recall, precision, average calculation, and Mean Absolute Percentage Error (MAPE) are applied to evaluate the feasibility of the algorithms. The results are then compared to predict opportunities for programming languages that support cross-platform frameworks and native apps, such as Dart, JavaScript, Kotlin, C, C++, C#, Java, Swift, Objective-C, and others. This study also aims to determine the opportunities of several programming languages that have the potential to be most widely adopted in the mobile application industry, so that entry-level developers can learn, adjust, and gain the skills to survive and develop in the industrial world. For industry, the results can serve as a reference in determining what technology to use in developing mobile-based applications. A more detailed comparison between the previous study and this one can be seen in Table 1.

II. MATERIAL AND METHOD
A. Data Mining
Data mining, formally known as Knowledge Discovery in Databases (KDD), describes a relatively new technique of using iterative and interactive processes to gain new knowledge, patterns, or models from large amounts of existing data. KDD involves investigating models in big data sets using techniques at the intersection of machine learning, statistics, and database systems. It facilitates pattern checking, such as data categorization through cluster studies, abnormal record recognition (also known as anomaly detection), and association rules or dependencies [20], [21]. A previous data mining study yielded new insights into the graduation data of students at Garuda High School for further processing [22].

B. Classification
Classification refers to finding a model that describes and differentiates data classes in order to predict the class of an object whose class label is unknown. The method consists of two processes. The first is the training stage, which analyzes the training data and then generates classification rules. The second is the classification stage, where test data is used to test the accuracy of the classification rules. A previous study used classification to provide classes that were later used to predict student enrollment activities [23].
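As a minimal sketch of the second stage, the snippet below shows one common way to score a classifier's predictions on test data using accuracy, precision, and recall, the three measures used throughout this paper. The function name `evaluate`, the binary 0/1 labels, and the sample values are illustrative assumptions, not taken from the study's data.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Score predictions against true labels for a binary problem.

    Hypothetical helper: the label encoding (1 = positive class) is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # true negatives

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted or never present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Made-up labels purely for illustration.
    print(evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Repeating such an evaluation for each training/testing split would give one row of accuracy, precision, and recall values per split.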

C. Prediction
Prediction is the first stage of the decision-making process. To predict, it is important to know the actual problem in decision-making. Prediction involves the idea of quantity, for example, the future market demand for one or more products. Prediction is also an important part of medical decision-making in medical research: choosing the preferred treatment plan may be the same as choosing the treatment plan with the best predictions [24].
Past data is systematically integrated through specific methods and processed to estimate future conditions. The purpose of prediction is to provide insight to decision-makers and policymakers regarding potential uncertainties and risks that may arise and can be considered in planning. By making such predictions, planners and decision-makers can consider alternatives. The predicted results may not be precise due to the uncertainty of future circumstances or events. However, assuming all factors are determined correctly, the predicted result would be close to the expected result. In previous studies using predictions, the results obtained can be taken into consideration in the future [22].

D. C 4.5 Algorithms
The C 4.5 algorithm is arguably one of the most efficient decision tree algorithms for classification. This algorithm is used to generate a decision tree. The decision tree describes the prediction of the target variable, with the basic idea of selecting the attribute with the highest priority, that is, the highest gain value based on entropy. Related research using the C 4.5 algorithm developed a predictive model to prevent diabetes as early as possible, with 49 data records and five attributes: age, blood pressure, pulse, weight, and blood sugar level. The result obtained was 90% accuracy [25]. The stages of this algorithm are shown below [23].

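1) Determine the Entropy value:
$$\mathrm{Entropy}(S) = \sum_{i=1}^{n} -p_i \log_2 p_i \tag{1}$$
Notes: S : The set of cases. n : The number of partitions of S. p_i : The proportion of S_i to S.

2) Determine the Information Gain value of each attribute:
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|} \, \mathrm{Entropy}(S_i) \tag{2}$$

3) Estimate the Split Info value of each attribute:
$$\mathrm{SplitInfo}(S, A) = -\sum_{i=1}^{n} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|} \tag{3}$$
Notes: S : The set of cases. A : Attribute. S_i : The subset of cases in partition i of attribute A, with |S_i| its number of samples and |S| the number of samples in S.

4) Determine the Gain Ratio value of each attribute:
$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)} \tag{4}$$
Notes: S : The set of cases. A : Attribute. Gain(S, A) : Information gain on attribute A. SplitInfo(S, A) : Split information on attribute A.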
The attribute with the highest Gain Ratio is selected as the root. After that, the Gain Ratio is calculated for each remaining attribute, that is, each attribute not chosen as the root. An attribute whose Gain Ratio is lower than that of the root is treated as a branch.
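The four stages above can be sketched in a few lines of Python. This is an illustrative sketch for categorical attributes only, not the implementation used in the study; the function names and the toy rows are assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Stage 1: entropy of a set of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def partition(rows: Sequence[Sequence[Hashable]], labels: Sequence[Hashable],
              attr_index: int) -> Dict[Hashable, List[Hashable]]:
    """Group the class labels by the value each row takes for one attribute."""
    parts: Dict[Hashable, List[Hashable]] = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr_index], []).append(label)
    return parts


def gain_ratio(rows: Sequence[Sequence[Hashable]], labels: Sequence[Hashable],
               attr_index: int) -> float:
    """Stages 2 to 4: information gain, split info, and their ratio."""
    total = len(labels)
    parts = partition(rows, labels, attr_index)
    weighted = sum(len(p) / total * entropy(p) for p in parts.values())
    gain = entropy(labels) - weighted
    split_info = -sum(len(p) / total * math.log2(len(p) / total) for p in parts.values())
    # An attribute with a single value has zero split info; treat its ratio as 0.
    return gain / split_info if split_info > 0 else 0.0


def choose_root(rows: Sequence[Sequence[Hashable]], labels: Sequence[Hashable]) -> int:
    """Pick the attribute index with the highest gain ratio."""
    return max(range(len(rows[0])), key=lambda i: gain_ratio(rows, labels, i))


if __name__ == "__main__":
    # Toy data (hypothetical): attribute 0 separates the classes, attribute 1 does not.
    rows = [("high", "yes"), ("high", "no"), ("low", "yes"), ("low", "no")]
    labels = [1, 1, 0, 0]
    print(choose_root(rows, labels))
```

A full C 4.5 implementation would also handle continuous attributes, missing values, and pruning, which this sketch leaves out.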

E. Naïve Bayes Algorithms
The Naïve Bayes algorithm is based on Bayes' theorem, a technique originally proposed by Thomas Bayes, a British scientist, and is used to build a classifier based on a conditional probability model. The Naïve Bayes classifier is a classification method that achieves decent accuracy, precision, recall, and F1 values. The major feature of this method is the assumption that each condition is independent, and it selects the best class label according to this probability and the loss from estimation error [26]-[29]. A related study involving the Naïve Bayes algorithm examined whether a person is eligible for a credit loan, using 10 data records and nine attributes: name, gender, age, type of occupation, loan amount, repayment period, guarantee, income, and category. The results showed that this system scored a precision value of 82%, an accuracy value of 80%, and a recall value of 94%, which implies that the system could be considered successful in predicting a person's eligibility for a credit loan [30].
$$P(H \mid X) = \frac{P(X \mid H) \, P(H)}{P(X)} \tag{5}$$
Notes: X : Data with an unknown class. H : The hypothesis that data X belongs to a specific class. P(H|X) : Probability of hypothesis H given condition X. P(H) : Probability of hypothesis H (before evidence is observed). P(X|H) : Probability of X given hypothesis H. P(X) : The probability of X.
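To make the formula concrete, the following is a small from-scratch sketch of a Gaussian Naïve Bayes classifier for numeric features. It is an illustration under stated assumptions, not the study's implementation: the function names, the variance smoothing constant, and the toy data are all hypothetical. Because P(X) is the same for every class, the sketch compares only P(X|H) P(H), computed in log space for numerical stability.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Per class: (prior, [(mean, variance) for each feature])
Model = Dict[Hashable, Tuple[float, List[Tuple[float, float]]]]


def fit(X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> Model:
    """Estimate the class priors P(H) and per-feature Gaussian parameters."""
    by_class: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model: Model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant (an assumption) avoids zero variance for constant features.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def predict(model: Model, row: Sequence[float]) -> Hashable:
    """Return the class with the highest log P(X|H) + log P(H)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            # Log of the Gaussian density; features are assumed independent.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Made-up one-feature data with two well-separated classes.
    X = [[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]]
    y = [0, 0, 0, 1, 1, 1]
    model = fit(X, y)
    print(predict(model, [1.1]), predict(model, [4.9]))
```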

F. Programming Languages
A programming language is a standard set of instructions for directing a computer to perform a specific function. It is defined as a set of syntax and semantic rules used to define computer programs. This language allows a programmer to specify what data would be processed by the computer, how this data would be stored or forwarded, and the actions to be initiated in various situations [31].

G. Mean
The arithmetic mean is a value commonly referred to as the average. This value is obtained by summing all the values and dividing by the number of values. The mean is estimated with the following formula [32], [33]:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{6}$$
Notes: \bar{x} : The average. x_i : The data whose average is to be found. n : The amount of data.

H. Mean Absolute Percentage Error (MAPE)
Mean Absolute Percentage Error (MAPE) is estimated by dividing the absolute error in each period by the actual observed value for that period and averaging the results. This approach is useful when the size or magnitude of the prediction variable is critical in evaluating the accuracy of the prediction. MAPE represents the amount of error in the prediction compared to the actual value of the series. MAPE can also be used to compare the accuracy of the same or different methods on two different series and to measure the accuracy of a model's estimated values [34]. In Figure 1, the calculation process of the C 4.5 algorithm uses formulas (1) to (4), the Naïve Bayes algorithm calculation process uses formula (5), and the calculation of Mean Absolute Percentage Error (MAPE) uses formula (7).
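As an illustration, MAPE is commonly written as the mean of |actual - predicted| / |actual| over all periods, multiplied by 100 to express it as a percentage. The sketch below follows that common definition; the function name and the sample numbers are assumptions for the example and are not taken from the study's tables.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent.

    Raises ValueError for empty or mismatched inputs and for zero actual
    values, where the percentage error is undefined.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")

    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up values: each prediction is off by 10% of the actual value.
    print(mape([100.0, 200.0], [90.0, 220.0]))
```

A lower MAPE indicates predictions closer to the observed values; how a given MAPE value is interpreted depends on the reference scale adopted.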

I. Dataset
Referring to Figure 1, the collected data consists of programming language data, comprising 3,375 records from 2011 to 2021 extracted from the website https://www.kaggle.com. The data is secondary data, as it was processed by an open data provider (Kaggle) before publication. Table II shows the initial dataset. This dataset is used to analyze future possibilities related to changes in the programming languages used for developing mobile-based applications.

J. Pre-Processing and Labeling
Based on Figure 1, the pre-processing used is data cleaning and data transformation, which remove irrelevant columns and data and convert the data from character to numeric to facilitate the computational process. The labeling process is applied to determine whether a programming language has an opportunity, by calculating the average of the "count" column for each programming language. The label is then converted from characters to numbers to facilitate the computation process.
For the 10% testing data (Table V), the C 4.5 algorithm excels with an average accuracy rate of 100% across the nine programming languages, whereas the Naïve Bayes algorithm reaches 75% accuracy for the JavaScript programming language and 50% for the C programming language. For the 90% testing data (Table VI), the C 4.5 algorithm excels with an average accuracy of 71%, with the Kotlin, Java, and Swift programming languages being the top 3 at over 80% accuracy, while the Naïve Bayes algorithm achieves an average accuracy rate of 56%, with the Dart programming language having the highest accuracy rate.

For the 30% testing data (Table IX), the C 4.5 algorithm is 3% ahead of the Naïve Bayes algorithm, with average accuracy rates of 96% and 93% respectively. The Dart, C, C#, Java, Swift, and Objective-C programming languages reach a 100% accuracy rate for the C 4.5 algorithm, and the JavaScript, Kotlin, C++, Java, and Swift programming languages reach a 100% accuracy rate for the Naïve Bayes algorithm. With 70% testing data (Table X), the C 4.5 algorithm is ahead by 9%, with average accuracy rates of 94% and 85%; the JavaScript, C++, C#, Java, and Swift programming languages reach a 100% accuracy rate for the C 4.5 algorithm, and the Dart, JavaScript, and Kotlin programming languages reach a 100% accuracy rate for the Naïve Bayes algorithm.
For the 40% testing data (Table XI), the C 4.5 algorithm is 1% ahead of the Naïve Bayes algorithm, with average accuracies of 97% and 96%, indicating that both algorithms are almost balanced. The Kotlin, C, C++, Java, Swift, and Objective-C programming languages lead with 100% accuracy for the C 4.5 algorithm, and the Kotlin, C#, Java, Swift, and Objective-C programming languages reach 100% accuracy for the Naïve Bayes algorithm. With 60% testing data (Table XII), the C 4.5 algorithm outperforms the Naïve Bayes algorithm by 7%, with average accuracies of 91% and 84%; the Dart, Kotlin, Java, and Swift programming languages reach 100% accuracy for the C 4.5 algorithm, and the Dart, C#, and Swift programming languages are the top 3 with an accuracy rate above 90% for the Naïve Bayes algorithm.
For the 50% testing data (Table XIII), the C 4.5 algorithm outperforms the Naïve Bayes algorithm by 1%, with average accuracy rates of 97% and 96%, which means both algorithms are almost balanced. The Dart and C programming languages are the bottom 2 for the C 4.5 algorithm, with accuracy rates of 94% and 83%, and the JavaScript, Kotlin, and Swift programming languages are the top 3 for the Naïve Bayes algorithm, with an accuracy rate of 100%.
Figure 2 shows that, for the Naïve Bayes algorithm, if the testing data is more than 90%, 6 out of 9 programming languages have an accuracy value below 80%, falling as low as 30%. Meanwhile, Figure 3 shows that, for the C 4.5 algorithm, if the testing data is more than 70%, 8 out of 9 programming languages have an accuracy value below 80%, falling as low as 30%. The comparison of precision, recall, and accuracy values for the Naïve Bayes and C 4.5 models can be seen in Table XIV.
Table XIV shows that with 20% testing data, the Naïve Bayes model excels in recall and accuracy values. With 60% testing data, the Naïve Bayes model excels only in the recall value. With 80% testing data, the Naïve Bayes model excels only in the precision value. On the other testing data, the C 4.5 model excels in precision, recall, and overall accuracy values. In previous research, which used 12 attributes with a 40% testing data distribution, the C 4.5 algorithm excelled in precision, recall, and accuracy. The results of this study, with nine attributes and a 40% testing data distribution, show that the C 4.5 algorithm is still ahead regardless of the difference in attributes.
Then, according to Figure 2 and Figure 3, the average is calculated for each programming language and each set of test data. Next, the Mean Absolute Percentage Error is calculated to see how well each model predicts. The programming languages are then sorted from highest to lowest accuracy, followed by the Mean Absolute Percentage Error of each programming language. Table XV presents a reference for judging the feasibility of the algorithm model [35], Table XVI presents a reference for judging the feasibility of the algorithm accuracy [36], and Table XVII presents the results of the above calculations. After that, evidence was collected by comparing the actual data in 2015 with the data predicted for 2022. Table XVIII displays a comparison of the data for the two years. A previous study conducted in 2018 predicted that the JavaScript programming language would endure for the next five years, as shown by the number of frameworks that support the programming language. Nevertheless, the results of this study, comparing 2015 as actual data and 2022 as predicted data in Table XVIII, indicate that the Dart programming language, at 65%, and Kotlin, at 6%, increase in 2022 and possess future opportunities for mobile application development, as visualized in Figure 4.

IV. CONCLUSION
Many entry-level developers who are unsure which technology to learn as the basis for developing mobile applications are concerned that the one technology they focus on may not be the technology that survives for the next 5 years. To deal with this, an application model was developed to predict which programming languages will likely endure. This model was developed using 2 different algorithms, namely the Naïve Bayes and C 4.5 algorithms, with testing data ranging from 10% to 90%. In the results, the C 4.5 algorithm was ahead overall on average in precision, recall, and accuracy values. Referring to the data in Table VII, the Dart and Kotlin programming languages are technologies that would have opportunities in the future, with an accuracy rate above 90% with both the Naïve Bayes and C 4.5 algorithms. The corresponding technologies are the native apps framework for Kotlin, which is expected to increase by 6%, and the cross-platform framework for Dart, which is expected to increase by 65%. Therefore, it is expected that entry-level developers may begin to learn the fundamentals in depth in order to develop mobile applications using cross-platform technology.
Notes for the Information Gain formula (Section II.D, step 2): S : The set of cases. A : Attribute. n : The number of partitions of attribute A. |S_i| : Number of cases in partition i. |S| : Number of cases within S.

Fig. 1 Stages of the research

Fig. 2 Data visualization of Naïve Bayes results
Table III displays the pre-processing results.
Table IV displays the labeling results.
Based on Table VIII, the C 4.5 algorithm excels with a 92% average accuracy rate, with the Kotlin, C, C++, C#, Java, and Dart programming languages leading at over 95% accuracy, while the Naïve Bayes algorithm achieves an average accuracy of 85%, with the Dart and Objective-C programming languages being the top 2 at 96% accuracy.

TABLE IX ALGORITHM PROCESS RESULTS WITH 70% TRAINING 30% TESTING

TABLE XI ALGORITHM PROCESS RESULTS WITH 60% TRAINING 40% TESTING

Table XIII, referring to Table V to Table XIII.

TABLE XIV DATA RECAPITULATION OF PRECISION, RECALL, AND ACCURACY BASED ON

TABLE XVI ACCURACY SCORE SCOPE GUIDANCE

TABLE XVIII COMPARISON OF ACTUAL DATA WITH FORECASTING DATA