BibTeX Citation Data:
@article{JOIV928,
  author = {Ilma Nur Hidayati and Tien Fabrianti Kusumasari and Faqih Hamami},
  title = {Comparison of Apache SparkSQL and Oracle Performance: Case Study of Data Cleansing Process},
  journal = {JOIV : International Journal on Informatics Visualization},
  volume = {6},
  number = {1-2},
  year = {2022},
  keywords = {Spark; Oracle; cleansing; processing time; comparison.},
  abstract = {A dataset with good quality is a valuable asset for a company. The data can be processed into information to help companies improve decision-making. However, the data increased more and more over time to decrease data quality. Thus, good data management is important to keep data quality meeting company standards. One of the efforts that can be done is conducting data cleansing to clean data from errors, inaccuracies, duplication, format discrepancies, etc. Apache Spark is an engine that can analyze large amounts of data. Oracle Database is a database management system used to manage databases. Both have their own reliability and can be used to analyze SQL-shaped data. This study compared Spark and Oracle performance based on query processing time. Both were tested on queries used to perform data cleansing of millions of rows of the dataset. The research focuses on finding out Spark and Oracle's performance through quantitative analysis. The results of this study showed that there were differences in query processing times on both tools. Apache Spark is rated better because it has a relatively faster query processing time than Oracle Database. It can be concluded that Oracle is more reliable in storing complex data models than in analyzing large data. For future research, it is suggested to add other comparison aspects such as memory and CPU usage. The researchers can also consider using query optimization techniques to enrich query experiments.},
  issn = {2549-9904},
  pages = {208--213},
  doi = {10.30630/joiv.6.1-2.928},
  url = {https://joiv.org/index.php/joiv/article/view/928}
}

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
__________________________________________________________________________
JOIV : International Journal on Informatics Visualization
ISSN 2549-9610 (print) | 2549-9904 (online)
Organized by the Department of Information Technology - Politeknik Negeri Padang, the Institute of Visual Informatics - UKM, and the Soft Computing and Data Mining Centre - UTHM
W : http://joiv.org
E : joiv@pnp.ac.id, hidra@pnp.ac.id, rahmat@pnp.ac.id