A Data Pipeline Concept for Digitizing Services in Small and Medium-Sized Companies

Akshay Chikhalkar - Institute Industrial IT (inIT), TH OWL, Lemgo, Germany
Marc Brünninghaus - System Technologies and Image Exploitation IOSB, Fraunhofer Institute of Optronics, Lemgo, Germany
Sahar Deppe - System Technologies and Image Exploitation IOSB, Fraunhofer Institute of Optronics, Lemgo, Germany
Eckard Bicker - delta3 GmbH, Lemgo, Germany
Carsten Röcker - Institute Industrial IT (inIT), TH OWL, Lemgo, Germany


DOI: http://dx.doi.org/10.62527/joiv.9.1.3796

Abstract


Small and medium-sized enterprises (SMEs) face significant challenges in their digital transformation because they have fewer resources than larger companies. To address these challenges, this study proposes a data pipeline concept that is affordable and accessible for SMEs. The concept outlines an Extract, Transform, and Load (ETL) process, a widely used approach in data engineering, built from open-source technologies. A case study of a mobile assistance system illustrates the data flow and highlights its advantages and practical uses. SMEs can use this pipeline as a starting point for a cost-effective, efficient, and scalable data infrastructure. Because the pipeline’s components are modular and independent of one another, they can be extended, modified, or used individually to meet specific business needs. A basic dashboard prototype, which can be adapted to different applications, demonstrates the concept’s viability. Although the concept provides the pipeline design, implementing it successfully requires technical expertise. The research highlights the need for standardized procedures and careful tool selection to handle resource constraints and data anomalies. The pipeline’s output may eventually be used for advanced analytics, giving SMEs data-driven solutions and a competitive edge in the digital era.
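
As a rough illustration of the modular ETL idea described in the abstract, the following minimal sketch separates extraction, transformation, and loading into independent functions. It is not the authors' implementation: the sample data, the column names, the `task_log` table, and the use of Python's standard `csv` and `sqlite3` modules as a stand-in for a data source and a warehouse are all assumptions made for this example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a mobile assistance system (invented sample data).
RAW_CSV = """device_id,task,duration_s
A1,inspection,120
A1,repair,
B2,inspection,95
"""

def extract(raw_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: drop rows with a missing duration and cast types."""
    cleaned = []
    for row in rows:
        if not row["duration_s"]:
            continue  # skip anomalous records instead of loading them
        cleaned.append((row["device_id"], row["task"], int(row["duration_s"])))
    return cleaned

def load(records, conn):
    """Load: write cleaned records into a table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_log "
        "(device_id TEXT, task TEXT, duration_s INTEGER)"
    )
    conn.executemany("INSERT INTO task_log VALUES (?, ?, ?)", records)
    conn.commit()

def run_pipeline(raw_text, conn):
    """Chain the three independent stages into one pipeline run."""
    load(transform(extract(raw_text)), conn)

# In-memory SQLite stands in for a real warehouse in this sketch.
conn = sqlite3.connect(":memory:")
run_pipeline(RAW_CSV, conn)
total = conn.execute(
    "SELECT COUNT(*), SUM(duration_s) FROM task_log"
).fetchone()
print(total)  # (2, 215)
```

Because each stage only depends on the shape of its input, any one of them could be swapped out (for example, a different source or a different storage backend) without touching the others, which is the kind of modularity the abstract refers to.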

Keywords


Small and medium-sized enterprises (SMEs); data pipelines; extract, transform, load (ETL); open-source tools
