ON INFORMATICS

Abstract: Cryptocurrency is a high-risk investment instrument that offers a potentially higher return than other investment instruments. To profit, investors need to analyze cryptocurrency prices in order to predict a good purchase price, but the high volatility of those prices makes prediction difficult. Data mining is the process of extracting information from large data sets by examining historical data, patterns, and relationships within the data. Support Vector Regression (SVR) can produce accurate cryptocurrency price predictions and is able to limit overfitting. Polkadot is one of the cryptocurrencies frequently used as an investment instrument. In this study, SVR predicts the daily closing price of Polkadot with good accuracy: a radial basis function (RBF) kernel with cost parameter C = 1000 and gamma = 0.001 gives a model accuracy (R-square) of 90.00% and a MAPE of 5.28, while a linear kernel with C = 10 gives an accuracy of 87.68% and a MAPE of 6.10. After parameter tuning, the best accuracy and MAPE are therefore obtained with the RBF kernel with C = 1000 and gamma = 0.001. The results show that SVR performs reasonably well in predicting the price of the Polkadot cryptocurrency.


I. INTRODUCTION
Cryptocurrency is a digital or virtual currency designed as a medium of exchange [1]. The word combines cryptography, meaning secret code, and currency. In other words, a cryptocurrency is a virtual currency protected by a secret code. Cryptography is a method of protecting information and communication channels through the use of codes. Because of cryptography, cryptocurrencies cannot be manipulated; that is, cryptocurrency transactions cannot be falsified [2].
Cryptocurrency transactions are usually recorded in a blockchain system. Blockchain technology, also called Distributed Ledger Technology (DLT), is a concept in which every participant in a distributed network has access rights to the ledger [3]. One cryptocurrency in high demand among investors and traders is Polkadot. According to the coinmarketcap.com website, Polkadot ranks in the top 10 worldwide by market capitalization, with a market cap of $24,120,241,891 in 2022.
This shows that Polkadot is one of the leading cryptocurrencies. Polkadot is a sharded multichain network, meaning it can process many transactions on multiple chains in parallel ("parachains"). This parallel processing capability increases scalability. Polkadot was founded as an open-source project by the Web3 Foundation, a Swiss institution established to facilitate a fully functional and user-friendly decentralized web [4].
Crypto investing offers a high return accompanied by a high level of risk. Providing the right information is therefore very useful to individuals and businesses in designing a mature strategy that reduces risk and captures profit [5]. One approach is to predict the price of Polkadot accurately. Making such predictions requires a machine learning method whose results are close to the actual data.
One machine learning algorithm that can be used to predict the Polkadot price is Support Vector Regression (SVR). SVR is a regression extension of the Support Vector Machine (SVM), which was originally used to solve classification problems [6]. In this study, SVR is applied to time series data, that is, a series of observations taken sequentially over time.
SVR has been widely used for stock price forecasting and has shown better performance than other algorithms, including artificial neural networks (ANN). ANN has also been widely used in forecasting and is a promising alternative for predicting stock prices, but it finds a locally optimal solution, whereas SVR finds a globally optimal one (Santosa, 2007). Therefore, this study uses Support Vector Regression (SVR) to predict the closing price of Polkadot.

A. Cryptocurrency
Simply put, a cryptocurrency is a digital currency. It is a method of creating virtual "coins" and securing their ownership and transactions using cryptographic technology. Cryptography is a technique for protecting information by transforming it (e.g., encrypting it) into a format that is unreadable and can only be decrypted by someone who holds the secret key. Cryptocurrency is decentralized, which means transactions are conducted peer-to-peer from sender to recipient without intermediaries. Well-known cryptocurrencies in Indonesia include Bitcoin, Ethereum, Litecoin, Dash, Ripple, Bitcoin Cash, Bitcoin Gold, Zcash, Monero, Maker, Byteball, and others. Cryptocurrency differs from conventional currencies because the transaction model commonly used by the general public is centralized [7].
Bitcoin was developed as a payment medium. Over time, a growing number of users has pushed the Bitcoin exchange rate higher, so Bitcoin is now also regarded as a digital asset or investment instrument. Every Bitcoin transaction is stored in a block sealed with a specific code based on cryptography [8]. Bitcoin is a cryptocurrency whose transactions are guaranteed by a very secure cryptographic technique. It was the first cryptocurrency to appear and is still popular today, and its name has become synonymous with blockchain itself. Cryptocurrencies answer the transaction needs of the digital era: they are easy, fast, transparent, and acceptable to both parties to a transaction [9].
The centralized model is exemplified by the transactions most people use. For example, when someone wants to send money to someone else, they use a banking service (ATM, mobile banking, or a visit to the bank) and transfer the money to the recipient's account number. The transaction is carried out through the bank as a trusted intermediary, so the money goes to the bank first and is then passed on to the recipient. The process is real-time, so the transfer does not feel slow. However, because the process goes through an intermediary, a fee must be paid, namely administrative costs [10].
In contrast, decentralization means that no third party acts as an intermediary; transactions are conducted peer-to-peer from sender to receiver. All transactions are recorded by computers on the worldwide network called miners, which help secure and record transactions on the network. Miners earn commissions in the virtual currency used, but not everyone can become a miner, because solving the cryptography involved requires special expertise and complex computation. This is one reason cryptocurrency miners generally use high-specification, specialized computers. Decentralization is the DNA of the blockchain system, and the blockchain is the platform that allows digital cryptocurrencies to be used for transactions [11].
Polkadot is an open-source sharded multichain protocol that facilitates the cross-chain transfer of any type of data or asset, not just tokens, thus allowing a wide variety of blockchains to interoperate. This interoperability seeks to build a fully decentralized and private web controlled by its users and to simplify the creation of new applications, institutions, and services. The Polkadot protocol connects public and private chains, permissionless networks, oracles, and future technologies, enabling these independent blockchains to share information and transactions trustlessly through Polkadot's relay chain. Polkadot's native DOT token serves three clear purposes: network governance, network operation, and the creation of parachains through bonding [12].

B. Machine Learning
Machine learning is a branch of artificial intelligence (AI). Artificial intelligence systems are built to resemble the way the human brain works, using computer algorithms. Machine learning gives computers the ability to learn from data without being explicitly programmed. Its defining characteristic is the process of training and learning, so it requires data to learn from, called training data, and data to be tested on, called testing data [13].
In general, there are two types of machine learning. Supervised learning has input variables and output variables and uses one or more algorithms to learn the function that maps inputs to outputs. The goal of supervised learning is to estimate this mapping function so that the output can be predicted for a new input. Unsupervised learning has only input data and no corresponding output variables. The goal of unsupervised learning is to model the underlying structure of the data in order to learn more about it [14]. Prediction is the process of systematically estimating what is most likely to occur in the future based on available past and present information, so that the error (the difference between what actually happens and the estimate) is minimized [15].

C. Support Vector Regression
Data mining is an analytic process designed to examine large amounts of data in search of valuable hidden knowledge [16]. It aims to find trends or patterns in large databases to support future decision-making [17]. The data mining process consists of several stages: data selection, data cleaning, data transformation, application of data mining methods, and evaluation of the patterns found. Based on functionality, data mining is divided into five methods: estimation, prediction, classification, clustering, and association [18].
Support Vector Regression (SVR) is a regression extension of the Support Vector Machine (SVM), which was originally used to solve classification problems. In this study, SVR is applied to time series data, that is, a series of observations taken sequentially over time. The goal of SVR is to make scattered, nonlinear data amenable to regression by mapping it to a higher dimension [19]. The general regression equation is:
$$f(x) = \omega^{T} \varphi(x) + b$$
where $\omega$ is the weight, $b$ is the coefficient, and $\varphi(x)$ is the function that maps the feature $x$ to a higher dimension. The algorithm consists of several stages:

1) Initialization parameters: The SVR method uses several parameters: ε and C, which determine the error tolerance; cLR, which determines the speed of the learning process; σ, a constant that affects the distribution of the data dimensions; and λ, which determines the scale of the SVR kernel mapping dimension [20].

2) Hessian Matrix Calculation: The Hessian matrix is calculated according to the following equation:
$$[R]_{ij} = K(x_i, x_j) + \lambda^{2}, \quad i, j = 1, 2, \ldots, n$$
where $K(x_i, x_j)$ is the kernel function, λ is the scale parameter set at initialization, and $n$ is the amount of training data. The kernel function maps the data to a higher dimension, which is expected to yield a better-structured representation. The Gaussian (RBF) kernel has been widely used in previous research and is considered capable of delivering good results in SVR. It is defined as follows:
$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}}\right)$$
Information:
$x$ and $x_i$ = data
σ = dimension constant
The constant σ (sigma) must be defined at the outset. When its value is too small, the model follows the training data too closely and the training results look misleadingly accurate; when it is too large, the model becomes too inflexible for complex calculations.
Error Value Calculation: This stage calculates the error values and the changes in the Lagrange multiplier values. First, the Lagrange multipliers $\alpha_i$ and $\alpha_i^{*}$ are initialized to 0. The next steps are to calculate the error value, calculate the changes $\delta\alpha_i^{*}$ and $\delta\alpha_i$, and update the multipliers:
$$\alpha_i^{*\prime} = \delta\alpha_i^{*} + \alpha_i^{*}$$
$$\alpha_i^{\prime} = \delta\alpha_i + \alpha_i$$
This stage is repeated for each training data point.

3) Iteration process: The error value calculation stage above is repeated (iterated) until one of the following conditions is met: the iteration count reaches the predetermined maximum; the Lagrange multipliers no longer change, in other words, convergence; or the change in the Lagrange multipliers does not exceed the epsilon constant, that is, $\max(|\delta\alpha_i|) < \varepsilon$ and $\max(|\delta\alpha_i^{*}|) < \varepsilon$.

4) Calculation of Forecasting Results: The forecast is obtained from the regression equation [21]:
$$f(x) = \sum_{i=1}^{n} (\alpha_i^{*} - \alpha_i)\left(K(x_i, x) + \lambda^{2}\right)$$
The Mean Absolute Percentage Error (MAPE) value can be calculated using the following equation:
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{X_t - Y_t}{X_t} \right|$$
Information:
$X_t$ = actual value
$Y_t$ = prediction value
$n$ = amount of data
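The stages above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the error and multiplier-change formulas are the commonly used sequential SVR update rules, the learning rate is taken as cLR divided by the largest diagonal entry of the Hessian matrix, and all function names, default values, and the demo data are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def train_sequential_svr(x, y, sigma=0.1, lam=0.1, c=10.0, eps=1e-3,
                         clr=0.1, max_iter=3000):
    """Return the Lagrange multipliers (alpha, alpha_star) after training."""
    n = len(y)
    hessian = rbf_kernel(x, x, sigma) + lam ** 2   # [R]_ij = K(x_i, x_j) + lambda^2
    gamma = clr / hessian.diagonal().max()         # learning rate (assumed form)
    alpha = np.zeros(n)                            # multipliers start at 0
    alpha_star = np.zeros(n)
    for _ in range(max_iter):                      # stop at the iteration limit ...
        max_change = 0.0
        for i in range(n):
            # Error of the current model on training point i (assumed formula).
            err = y[i] - (alpha_star - alpha) @ hessian[i]
            # Multiplier changes, clipped so each multiplier stays in [0, C].
            d_star = min(max(gamma * (err - eps), -alpha_star[i]), c - alpha_star[i])
            d = min(max(gamma * (-err - eps), -alpha[i]), c - alpha[i])
            alpha_star[i] += d_star
            alpha[i] += d
            max_change = max(max_change, abs(d_star), abs(d))
        if max_change < eps:                       # ... or when the changes fall below epsilon
            break
    return alpha, alpha_star

def predict(x_train, x_new, alpha, alpha_star, sigma, lam):
    """Forecast f(x) = sum_i (alpha_i* - alpha_i) (K(x_i, x) + lambda^2)."""
    k = rbf_kernel(x_new, x_train, sigma) + lam ** 2
    return k @ (alpha_star - alpha)

# Hypothetical smooth toy data in [0, 1], standing in for normalized prices.
x_demo = np.linspace(0.0, 1.0, 30).reshape(-1, 1)
y_demo = 0.5 + 0.3 * np.sin(2.0 * np.pi * x_demo[:, 0])
alpha_demo, alpha_star_demo = train_sequential_svr(x_demo, y_demo)
fit_demo = predict(x_demo, x_demo, alpha_demo, alpha_star_demo, sigma=0.1, lam=0.1)
print("mean absolute training error:", float(np.mean(np.abs(fit_demo - y_demo))))
```

The multipliers are updated one training point at a time, which matches the statement that the stage is repeated for each training data point. A smaller cLR makes each update more cautious at the cost of more iterations.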

E. Research Approach
This study aims to obtain an overview of the closing price data of the Polkadot cryptocurrency and to predict that closing price using Support Vector Regression (SVR) analysis. The researchers used RBF kernels with cost parameters C = 10, 100, 1000 and gamma = 0.1, 0.01, 0.001, 0.0001, and linear kernels with cost parameters C = 10, 100, 1000. The tools used in this study are the Python 3.9 programming language and LibreOffice.
The stages of data analysis carried out by the researchers are shown in the following flowchart:

Fig. 2 Flowchart Analysis Support Vector Regression
Analytical steps:
• Prepare Polkadot daily data for the period August 20, 2020, to December 31, 2021, downloaded from Yahoo Finance.
• Perform descriptive analysis on the Polkadot daily data to obtain an overview of the data.
• Preprocess the data, which includes defining the dependent variable (Y) and the independent variable (X) and then transforming both.
• Divide the data into training data and testing data.
• Determine the kernel to be used and the cost (C) and gamma parameters for the Support Vector Regression analysis.
• Perform the Support Vector Regression analysis with the parameters and kernels determined from the literature study.
• Tune the parameters to obtain optimal accuracy and minimal error.
• Postprocess by denormalizing the data to obtain the predictions.
• Predict future Polkadot prices.
• Interpret the results of the Support Vector Regression analysis with the best parameters and kernel.

F. Data Collection
The data used in this study are secondary data obtained from two websites: www.coinmarketcap.com for information about the Polkadot blockchain and www.finance.yahoo.com for Polkadot daily price data. The data cover the daily price of Polkadot from August 20, 2020, to December 31, 2021, a total of 499 records. The variable used in this study is the Polkadot closing price (Close). The following table lists the research variables and their operational definitions:

A. Preprocessing Data
Data preprocessing cleans the raw data so that it is more readily accepted by the Support Vector Regression (SVR) algorithm. In this study, the preprocessing stage consists of determining the input (independent) and output (dependent) variables, normalizing the data, and splitting the data into training data and testing data.

1) Variable Determination: The data used for this study are Polkadot daily closing prices for the period August 20, 2020, to December 31, 2021, consisting of 499 records. This study applies supervised learning, which requires input variables and output variables to be learned by the algorithm. The input is the daily closing price of Polkadot in the previous period, which is used to predict the price one day later. In other words, today's Polkadot price is assumed to be influenced by the price in the previous period.
2) Data Normalization: At this stage, the input and output data were normalized to the range 0 to 1 using a min-max normalization module.
3) Data Training and Data Testing: The data were divided into training data and testing data for the Support Vector Regression analysis. The split is made to improve the performance of Support Vector Regression on the testing data when determining the best parameters for the model. The split is shown in the following table:

TABLE IV
SHARING DATA TRAINING AND DATA TESTING

Information       Training   Testing   Total
Amount of Data    399        100       499
Percentage        80%        20%       100%

Based on Table 4, 80% of the total data is used as training data and the remaining 20% as testing data. The training data has the larger share so that the model has more data to learn from. The model formed from the training data is then evaluated on the testing data, so that forecasts for the testing data are as good as possible. The split into training and testing data is done randomly using the Python programming language.
Next, the training data were used to train the Support Vector Regression method so that a model was formed for each combination of parameters, and the testing data were then used to test the resulting model.
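The three preprocessing steps can be illustrated with a short NumPy sketch. It is only an approximation of the procedure described above: the function names, the toy prices, the random seed, and the choice to take the minimum and maximum from the whole series are assumptions, since the text does not specify them.

```python
import numpy as np

def make_lag1_pairs(close):
    """Input = closing price of the previous period, output = price one day later."""
    close = np.asarray(close, dtype=float)
    return close[:-1], close[1:]

def minmax_normalize(values, lo, hi):
    """Min-max normalization to the range 0 to 1."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def random_split(x, y, test_fraction=0.2, seed=0):
    """Random split into training and testing data (80% / 20% by default)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(round(test_fraction * len(x)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

close = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0]   # hypothetical closing prices
x_raw, y_raw = make_lag1_pairs(close)
lo, hi = min(close), max(close)                # assumed: range of the whole series
x_scaled = minmax_normalize(x_raw, lo, hi)
y_scaled = minmax_normalize(y_raw, lo, hi)
x_train, x_test, y_train, y_test = random_split(x_scaled, y_scaled)
print(len(x_train), "training pairs,", len(x_test), "testing pairs")
```

The values `lo` and `hi` need to be kept, because the same pair is required later to denormalize the predictions back to prices.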

B. Support Vector Regression Analysis
In theory, Support Vector Regression (SVR) is an adaptation to regression cases of the Support Vector Machine (SVM), a machine learning method previously used for classification problems. SVR modeling works in the same way as SVM: it determines the optimal hyperplane through parameters to form a model. Whereas SVM uses support vectors to separate two classes, SVR determines parameters so that the support vectors fall within the hyperplane region, forming an optimal regression model. The kernels used to form the model in this study are the linear kernel and the radial basis function (RBF) kernel.
This study focuses on the linear and radial basis function kernels with cost parameter C of 10, 100, and 1000, which sets the tolerance of support vectors relative to the hyperplane, and gamma of 0.1, 0.01, 0.001, and 0.0001 for the radial basis function kernel. The model's performance is measured using R-square accuracy and MAPE. The closer the R-square value is to 1, the better the model, provided the model neither overfits nor underfits. Overfitting occurs when the model fits the training data so closely that accuracy drops when it is tested on different data, while underfitting occurs when the trained model does not represent the data as a whole, which causes poor performance.
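As a small worked illustration of the two evaluation measures, the sketch below computes MAPE from actual and predicted values and R-square as the usual coefficient of determination. Taking R-square in that standard form is an assumption, and the function names and numbers are hypothetical.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual = [100.0, 200.0, 400.0]       # hypothetical actual prices
predicted = [110.0, 190.0, 400.0]    # hypothetical predictions
print("MAPE:", mape(actual, predicted))
print("R-square:", r_squared(actual, predicted))
```

Here the three percentage errors are 10%, 5%, and 0%, so the MAPE is their mean, 5%. A lower MAPE and an R-square closer to 1 both indicate a better fit.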

C. Evaluation of Support Vector Regression Model
The model for the Polkadot daily data is evaluated using the R-square accuracy and MAPE of each kernel. The following table compares the accuracy values of each kernel. On the testing data, the linear kernel with cost parameter C = 10 obtained an accuracy of 87.68% and a MAPE of 6.10, while the RBF kernel with C = 10 and gamma = 0.1 obtained an accuracy of 80.33% and a MAPE of 7.05. Of the two kernels, the linear kernel therefore has the higher accuracy and the lower MAPE at this stage.
The researchers then tuned the parameters using the Grid Search method to improve the model's performance. Tuning is the process of determining the parameters that give the best model. Table 6 shows the accuracy and MAPE values of each kernel after parameter tuning. After parameter tuning with 10-fold cross-validation, the R-square values on the testing data were 87.68% for the linear kernel and 90.00% for the RBF kernel, with MAPE values of 6.10 and 5.28, respectively. The RBF kernel thus improved from its initial 80.33%, while the linear kernel remained at 87.68%, so the best model was obtained with the RBF kernel. The optimal parameters for predicting the daily price of Polkadot are C = 10 for the linear kernel and C = 1000 with gamma = 0.001 for the RBF kernel.
Figure 4 plots the best-performing RBF kernel model, comparing the actual Polkadot daily closing prices with the predictions on the training and testing data. The X-axis (Date) is the sequence of periods for the actual and predicted data. The Y-axis (Polkadot price) is Polkadot's daily closing price for the actual and predicted data. The actual data are plotted as a blue line, the training predictions as an orange line, and the testing predictions as a green line. The plot shows that, with the Support Vector Regression method and the radial basis function (RBF) kernel, the predicted values closely follow the actual values, which indicates that the predicted daily closing prices are not much different from the actual daily closing prices.
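The tuning procedure described in this section could look roughly like the sketch below. The text names Python 3.9 but not a library, so the use of scikit-learn's `SVR` and `GridSearchCV` is an assumption, as are the function name, the synthetic data, and the default `epsilon` of `SVR`. Only the kernels, the C and gamma values, and the 10 folds come from the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Candidate kernels and parameter values as listed in the text.
PARAM_GRID = [
    {"kernel": ["linear"], "C": [10, 100, 1000]},
    {"kernel": ["rbf"], "C": [10, 100, 1000], "gamma": [0.1, 0.01, 0.001, 0.0001]},
]

def tune_svr(x_train, y_train, folds=10):
    """Grid search with k-fold cross-validation, scored by R-square."""
    search = GridSearchCV(SVR(), PARAM_GRID, cv=folds, scoring="r2")
    search.fit(x_train, y_train)
    return search.best_estimator_, search.best_params_

# Hypothetical normalized lag-1 data standing in for the Polkadot series.
rng = np.random.default_rng(0)
x_demo = rng.random((80, 1))
y_demo = 0.8 * x_demo[:, 0] + 0.1 + rng.normal(0.0, 0.01, 80)
best_model, best_params = tune_svr(x_demo, y_demo)
print(best_params)
```

On real data the selected kernel and parameters depend on the series and the split, so this sketch is not expected to reproduce the values reported in Table 6.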

D. Polkadot Daily Closing Price Prediction
After the parameter experiments to form the Support Vector Regression (SVR) model, the next stage is to predict the daily closing price of Polkadot using the best model. The model generated from the Polkadot daily closing price data performs well: in the line graph, the plot of predicted data follows the plot of actual data, which means the predictions are not much different from the actual data. Table 7 shows the predicted and actual daily closing prices of Polkadot on the testing data using the best model. Table 8 shows the forecast of Polkadot's daily closing price for the next 10 periods using the Support Vector Regression model with the RBF kernel and parameters C = 1000 and gamma = 0.001. According to the forecast, the daily closing price of Polkadot moves sideways in the range of US$26 to US$28 per coin.
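The text does not state how the 10-period forecast was generated. One plausible reading, given that the model takes the previous day's close as its only input, is a recursive forecast in which each prediction becomes the next input, followed by denormalization. The sketch below illustrates that assumption only; `toy_model`, the starting value, and the price range are hypothetical.

```python
def denormalize(value, lo, hi):
    """Invert min-max normalization: map a value in [0, 1] back to a price."""
    return value * (hi - lo) + lo

def forecast_recursive(predict_next, last_scaled_close, steps=10):
    """Feed each one-step prediction back in as the next input."""
    forecasts = []
    current = last_scaled_close
    for _ in range(steps):
        current = predict_next(current)
        forecasts.append(current)
    return forecasts

def toy_model(scaled_close):
    """Hypothetical stand-in for a fitted lag-1 SVR model."""
    return 0.9 * scaled_close + 0.05

scaled = forecast_recursive(toy_model, last_scaled_close=0.6, steps=10)
prices = [denormalize(v, lo=10.0, hi=50.0) for v in scaled]
print([round(p, 2) for p in prices])
```

Because every step builds on the previous prediction, errors can accumulate over the horizon, and a recursive forecast of this kind often flattens toward a stable level.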

IV. CONCLUSION
Based on the results described in the previous sections, the following conclusions can be drawn. The descriptive analysis shows that Polkadot's daily closing price fluctuated over the period from August 20, 2020, to December 31, 2021. The Support Vector Regression (SVR) method can be applied to predict the daily closing price of Polkadot. For Polkadot's daily closing price data, the SVR model with a radial basis function (RBF) kernel and parameters C = 1000 and gamma = 0.001 obtained a model accuracy of 90.00% and a MAPE of 5.28, while the linear kernel with C = 10 obtained an accuracy of 87.68% and a MAPE of 6.10. After parameter tuning, the best accuracy and MAPE are therefore obtained with the RBF kernel with C = 1000 and gamma = 0.001. The forecast of Polkadot's daily closing price for the next 10 periods tends to move sideways in the range of US$26 to US$28 per coin.