Improving Data Reliability Assessment in ETL Processes through Quality Scoring Technique in Data Analytics
DOI: http://dx.doi.org/10.62527/joiv.8.4.3632
References
L. Cai and Y. Zhu, “The Challenges of Data Quality and Data Quality Assessment in the Big Data Era,” Data Science Journal, vol. 14, no. 0, p. 2, May 2015, doi: 10.5334/dsj-2015-002.
S. Loetpipatwanich and P. Vichitthamaros, “Sakdas: A Python Package for Data Profiling and Data Quality Auditing,” 2020 1st International Conference on Big Data Analytics and Practices (IBDAP), pp. 1–4, Sep. 2020, doi: 10.1109/ibdap50342.2020.9245455.
I. El Alaoui, Y. Gahi, and R. Messoussi, “Big Data Quality Metrics for Sentiment Analysis Approaches,” Proceedings of the 2019 International Conference on Big Data Engineering, Jun. 2019, doi: 10.1145/3341620.3341629.
W. Elouataoui, I. El Alaoui, S. El Mendili, and Y. Gahi, “An Advanced Big Data Quality Framework Based on Weighted Metrics,” Big Data and Cognitive Computing, vol. 6, no. 4, p. 153, Dec. 2022, doi: 10.3390/bdcc6040153.
V. Azzolini et al., “The Data Quality Monitoring Software for the CMS experiment at the LHC: past, present and future,” EPJ Web of Conferences, vol. 214, p. 02003, 2019, doi: 10.1051/epjconf/201921402003.
S. F. Kristyanti, T. F. Kusumasari, and E. N. Alam, “Operational Dashboard Development as A Data Quality Monitoring Tools Using Data Deduplication Profiling Result,” 2020 6th International Conference on Science and Technology (ICST), pp. 1–6, Sep. 2020, doi: 10.1109/icst50505.2020.9732870.
E. Widad, E. Saida, and Y. Gahi, “Quality Anomaly Detection Using Predictive Techniques: An Extensive Big Data Quality Framework for Reliable Data Analysis,” IEEE Access, vol. 11, pp. 103306–103318, 2023, doi: 10.1109/access.2023.3317354.
N. West, J. Gries, C. Brockmeier, J. C. Gobel, and J. Deuse, “Towards integrated Data Analysis Quality: Criteria for the application of Industrial Data Science,” 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI), pp. 131–138, Aug. 2021, doi: 10.1109/iri51335.2021.00024.
A. Kohli and N. Gupta, “Big Data Analytics: An Overview,” 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), pp. 1–5, Sep. 2021, doi: 10.1109/icrito51393.2021.9596417.
Munawar, “Extract Transform Loading (ETL) Based Data Quality for Data Warehouse Development,” 2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI), pp. 373–378, Oct. 2021, doi: 10.1109/iccsai53272.2021.9609770.
B. Singhal and A. Aggarwal, “ETL, ELT and Reverse ETL: A business case Study,” 2022 Second International Conference on Advanced Technologies in Intelligent Control, Environment, Computing & Communication Engineering (ICATIECE), pp. 1–4, Dec. 2022, doi: 10.1109/icatiece56365.2022.10046997.
A. P. Pereira, B. P. Cardoso, and R. M. S. Laureano, “Business intelligence: Performance and sustainability measures in an ETL process,” 2018 13th Iberian Conference on Information Systems and Technologies (CISTI), pp. 1–7, Jun. 2018, doi: 10.23919/cisti.2018.8399473.
W. Han and M. Jochum, “A Machine Learning Approach for Data Quality Control of Earth Observation Data Management System,” IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium, pp. 3101–3103, Sep. 2020, doi: 10.1109/igarss39084.2020.9323615.
A. Qaiser, M. U. Farooq, S. M. Nabeel Mustafa, and N. Abrar, “Comparative Analysis of ETL Tools in Big Data Analytics,” Pakistan Journal of Engineering and Technology, vol. 6, no. 1, pp. 7–12, Jan. 2023, doi: 10.51846/vol6iss1pp7-12.
R. Ji, H. Hou, G. Sheng, and X. Jiang, “Data Quality Assessment for Electrical Equipment Condition Monitoring,” 2022 9th International Conference on Condition Monitoring and Diagnosis (CMD), pp. 1–4, Nov. 2022, doi: 10.23919/cmd54214.2022.9991385.
M. Al Amin, MD. Jawad-Al-Mursalin Hoque, Z. Nazzum, M. A. Sayed, S. Tanveer Ahmed Rumee, and M. I. Zaber, “Data Quality Assessment of Substation Data in Bangladesh: Insights from Handwritten Data Digitization,” 2023 10th IEEE International Conference on Power Systems (ICPS), pp. 1–6, Dec. 2023, doi: 10.1109/icps60393.2023.10428984.
V. Pattana-Anake, F. J. J. Joseph, and P. Pachaivannan, “Data Wrangling for IoT Based Aquarium Water Quality Management System,” 2022 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI), pp. 1–5, Dec. 2022, doi: 10.1109/icdsaai55433.2022.10028891.
X. Zuo, “Research on Data Quality Improvement Program Based on Big Data Application,” 2023 IEEE 3rd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), pp. 1742–1745, May 2023, doi: 10.1109/iciba56860.2023.10165495.
L. Davidson, “What is data quality and why does it matter?” Springboard, 2019. [Online]. Available: https://www.springboard.com/blog/data-analytics/data-quality/.
P. Zhang, F. Xiong, J. Gao, and J. Wang, “Data quality in big data processing: Issues, solutions and open problems,” 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp. 1–7, Aug. 2017, doi: 10.1109/uic-atc.2017.8397554.
C. Batini, A. Rula, M. Scannapieco, and G. Viscusi, “From Data Quality to Big Data Quality,” Journal of Database Management, vol. 26, no. 1, pp. 60–82, Jan. 2015, doi: 10.4018/jdm.2015010103.
H. A. Sulistyo, T. F. Kusumasari, and E. N. Alam, “Implementation of Data Cleansing Null Method for Data Quality Management Dashboard using Pentaho Data Integration,” 2020 3rd International Conference on Information and Communications Technology (ICOIACT), pp. 12–16, Nov. 2020, doi: 10.1109/icoiact50329.2020.9332030.
H. Homayouni, “Testing Extract-Transform-Load Process in Data Warehouse Systems,” 2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 158–161, Oct. 2018, doi: 10.1109/issrew.2018.000-6.
R. Vaidyambath, J. Debattista, N. Srivatsa, and R. Brennan, “An intelligent linked data quality dashboard,” in AICS 27th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, Galway, Ireland, 2019, pp. 5–6.
T. Samakit, C. Anutariya, and M. Buranarach, “QUALYST: Data Quality Assessment System for Thailand Open Government Data,” 2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE), pp. 196–201, Jun. 2023, doi: 10.1109/jcsse58229.2023.10202060.
Talend, “Data quality Looker Block released for Talend Studio,” Talend Community, Nov. 6, 2020. [Online]. Available: https://community.talend.com/s/article/Data-quality-Looker-Block-released-for-Talend-Studio?language=en_US.
Talend, “What is data health? Definition and how to measure,” Talend. [Online]. Available: https://www.talend.com/resources/what-is-data-health/.
Ataccama, “Data quality management,” Ataccama. [Online]. Available: https://www.ataccama.com/dictionary/data-quality-management.
J. Byabazaire, G. M. P. O’Hare, and D. T. Delaney, “End-to-End Data Quality Assessment Using Trust for Data Shared IoT Deployments,” IEEE Sensors Journal, vol. 22, no. 20, pp. 19995–20009, Oct. 2022, doi: 10.1109/jsen.2022.3203853.
S. McCarthy, A. McCarren, and M. Roantree, “A Method for Automated Transformation and Validation of Online Datasets,” 2019 IEEE 23rd International Enterprise Distributed Object Computing Conference (EDOC), pp. 183–189, Oct. 2019, doi: 10.1109/edoc.2019.00030.
R. Likert, “A technique for the measurement of attitudes,” Archives of Psychology, vol. 22, no. 140, pp. 1–55, 1932.