Implementation of Big Data Information System Using Open-Source Metabase for Civil Registration and Vital Statistics Data Visualization in Surabaya

Abstract: Civil registration involves the mandatory and continuous documentation of important life events of a country's population under local legal requirements. In many countries, these documents are required to access government services such as education, healthcare, social services, formal employment, insurance benefits, and inheritance rights. Indonesia should prioritize building a comprehensive Civil Registration and Vital Statistics (CRVS) system based on a big data information system to ensure that every individual has a legal identity and can access government services, and to collect accurate and reliable statistics on vital events through geospatial maps. Surabaya, a city in Indonesia, still lacks a comprehensive CRVS system. We produce many informative visualizations from the query and modeling processes in Metabase. Based on the PIECES framework, the application's importance level is 4.56 (91.25%), meaning the application is considered important, and its satisfaction level is 4.29 (85.76%), meaning respondents are satisfied with the application. This research provides a brief overview of how Metabase works and how it can be used to generate visualizations of job-type data. It demonstrates the ease with which visualizations can be changed and customized. Metabase is also affordable, which makes its implementation easier and more beneficial. The research also emphasizes the importance of having a powerful tool like Metabase for data analysis and decision-making, especially for Dispendukcapil, the civil registration agency.


I. INTRODUCTION
Civil registration involves the mandatory and continuous documentation of significant life events of a country's population under local legal requirements. The United Nations suggests recording ten vital events such as births, deaths, marriages, divorces, adoptions, and legitimations. The information collected from civil registration forms vital statistics, including data on the events and the individuals involved [1]-[17]. Apart from civil registration, vital statistics can be obtained from population surveys and censuses.
The CRVS system is an essential responsibility of the national government, and it can be divided into three main functions: legal, administrative, and statistical. Its primary purpose is to establish an individual's legal identity, often demonstrated through legal documents like birth certificates. In many countries, these documents are required to access government services such as education, healthcare, social services, formal employment, insurance benefits, and inheritance rights [18]-[23].
Indonesia should prioritize building a comprehensive Civil Registration and Vital Statistics (CRVS) system to ensure that every individual has a legal identity and can access government services, and to collect accurate and reliable statistics on vital events [24]-[28]. In Surabaya, a city in Indonesia, spatial data analytics is especially important for supporting smart city schemes. Because Surabaya does not yet have a comprehensive CRVS system, particularly one that can analyze spatial data, it is hard to make timely, evidence-based decisions. Metabase, an open-source application, can facilitate data exploration, visualization, and sharing across various databases and platforms. In this paper, we propose using Metabase to support the civil registration and vital statistics system within Surabaya's population information system. The study also includes the visualization of CRVS-related data using the open-source Metabase application. The spatial map visualization with Metabase presented in this research is therefore intended to strengthen Surabaya's population information system.
This paper is structured as follows: Section I gives the introduction, Section II reviews related work and describes the system design, Section III presents the experimental results and discussion, and Section IV concludes the paper.

II. MATERIALS AND METHOD
Suthar et al. [1] systematically reviewed and synthesized policies implemented in 25 countries to strengthen their civil registration and vital statistics systems. The study evaluates various approaches and strategies to improve CRVS systems, including legal and regulatory frameworks, technological innovations, and capacity-building programs. The paper highlights the importance of collaboration between stakeholders, community engagement, and the need for sustained political commitment to achieve effective CRVS systems.
Mills et al. [2] provide an overview of civil registration and vital statistics (CRVS) systems and their applications in low- and middle-income countries (LMICs). The study discusses the importance of CRVS systems for legal identity, public health, and social welfare and identifies challenges faced by many LMICs in establishing and maintaining effective CRVS systems. The paper also highlights the potential for technological innovations and collaborative efforts between stakeholders to improve CRVS systems and enhance their impact.
Wahyuni et al. [3] examine the registration policy for interfaith marriages involving Indonesian citizens who marry overseas. The study analyzes the legal requirements and procedures for registering these marriages, including the documentation needed to prove the marriage's validity and the steps required to obtain legal recognition in Indonesia. The paper highlights the importance of adhering to the registration policy to ensure these marriages are recognized and recorded in Indonesia's civil registration and vital statistics system.
Usman et al. [4] discuss Indonesia's Sample Registration System (SRS) 2018 and identify its strengths and weaknesses. The study finds that the SRS can be improved by increasing its sample size and developing a comprehensive national sampling frame. The paper suggests that the SRS is a work in progress and requires ongoing evaluation and improvement.
Al Hasri et al. [5] discuss developing a web-based information system for population administration services in Banaran Village. The study describes the system's design and implementation, including features such as birth registration, death registration, marriage registration, and identity card management. The paper highlights the system's benefits in improving the accuracy and efficiency of population administration services, particularly in rural areas.
Putri et al. [6] evaluate the effectiveness of the E-Lampid system as a public service innovation in the field of population administration in Surabaya City. The study analyzes the system's features and benefits, including online application and processing of population administration documents, real-time data management, and improved user access and transparency. The paper also highlights the challenges faced in implementing the system, including the need for technical support and user education. Overall, the study suggests that E-Lampid is a promising innovation in public service delivery and can potentially enhance the efficiency and effectiveness of population administration services in Surabaya.
Santos et al. [7] compare and evaluate two open-source business intelligence tools, Metabase and Redash. The study analyzes the features and capabilities of both tools, including data visualization, dashboard creation, and data source connectivity. The paper also discusses the advantages and limitations of each tool and its potential applications in various industries and settings. The study suggests that Metabase and Redash are both viable options for organizations seeking open-source business intelligence solutions, with each tool offering unique features and benefits depending on specific needs and requirements.

A. Research Method
This research consists of two parts: the system design, which gives an overview of the system, and the technical work required to realize it.

B. System Design
The system has two main parts: the Civil Registration Agency of Surabaya, which provides the primary data, and the university, which processes the spatial data. As shown in Fig. 1, the Metabase visualization on the university side has various features such as a data importer, GeoJSON support, an SQL editor, and a graph visualizer.

C. Big Data Information System and Analytics
The study aims to identify the factors that influence the implementation and utilization of big data information systems for managing the population data of Surabaya. The study also examines the challenges, benefits, and opportunities that emerge from integrating big data technologies into the management of civil population data in Surabaya.
The determinants of big data analytics for civil population data in Surabaya are the factors that influence the use of big data analytics to analyze data related to the population of Surabaya city. These determinants include the availability and accessibility of data, technological infrastructure, analytical skills, and data privacy and security.
In simpler terms, these determinants are the factors that affect how and why data related to the population of Surabaya is analyzed using big data analytics. They range from the quality and quantity of available data to the technological tools and expertise required to analyze the data effectively. Other important considerations include privacy and security concerns, which must be addressed to ensure that the use of data is ethical and legally compliant.
Overall, these determinants are crucial in shaping how data is collected, analyzed, and used to inform decision-making processes. Understanding them is vital to using big data analytics effectively to improve the quality of life for Surabaya's residents and drive positive social and economic outcomes for the city.

D. Improvement of Big Data Analytics Layout
Big data analytics can be applied to civil population data in Surabaya in different ways. These include using data to understand population demographics, predict healthcare needs, improve transportation systems, and more. By analyzing large amounts of data, patterns and trends can be identified to inform decisions and policies that benefit the community. In simpler terms, big data analytics can help Surabaya's government and businesses better understand the city's population and needs, leading to improvements in areas such as healthcare, transportation, and urban planning. Both internal and external factors affect big data analytics in Surabaya.
These factors include the availability of technology, access to reliable and accurate data, and the expertise of data analysts. For example, if the technology infrastructure in Surabaya is outdated or inadequate, it can limit the ability to collect and analyze data. Additionally, if the data is incomplete or inaccurate, the accuracy of the insights gained from the analysis suffers. Steps can be taken to improve the effectiveness and efficiency of big data systems in Indonesia.
These improvements can include system design changes, such as improving data quality, enhancing data security and privacy, and optimizing data storage and processing. One potential solution is to use open-source big data technology such as Metabase, which allows large amounts of data to be managed and analyzed in a cost-effective and scalable way. Open-source technology is freely available and can be customized to meet the specific needs of a business or organization. Additionally, it can be used to integrate data from various sources, making it easier to analyze and gain insights from large amounts of data.

E. Metabase
Metabase's data importer feature allows it to collect data from various databases, including PostgreSQL. The data is then queried for various attributes of Surabaya's population, such as birth date, occupation, religion distribution, gender distribution, education level distribution, marital status, Indonesian and foreign citizen distribution, and productive and nonproductive age distribution.
Additionally, geospatial data for Surabaya has been developed and categorized by province, city, district, and village. The output of the Metabase system includes geospatial maps that display the data distribution based on these predefined categories.
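As an illustration of one of the distributions listed above, the following minimal Python sketch classifies citizens into productive and nonproductive age groups and counts each group. The 15 to 64 year boundary is an assumption based on a common demographic convention, not a definition taken from this study, and the birth dates are made-up sample values.

```python
from collections import Counter
from datetime import date


def age_on(birth: date, as_of: date) -> int:
    """Return the age in completed years on the reference date."""
    before_birthday = (as_of.month, as_of.day) < (birth.month, birth.day)
    return as_of.year - birth.year - before_birthday


def age_group(age: int, low: int = 15, high: int = 64) -> str:
    """Classify an age; the 15-64 'productive' range is an assumed convention."""
    return "productive" if low <= age <= high else "nonproductive"


# Made-up sample birth dates, not real civil registration records.
births = [date(1990, 6, 15), date(2015, 1, 1), date(1950, 3, 3), date(2000, 12, 31)]
as_of = date(2022, 12, 31)

distribution = Counter(age_group(age_on(b, as_of)) for b in births)
print(dict(distribution))
```

In a deployment such as the one described here, the same grouping would more likely be expressed in the SQL query itself, so that Metabase receives the aggregated counts directly.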

F. Query
The SQL query counts the records in the main citizen table, joined with the sub-district code table on several conditions. The query also filters the data to a specified date range and a specific age range. The result is grouped by sub-district name and presented with the total count for each sub-district.
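The query described above can be sketched as follows. This is a minimal, self-contained illustration rather than the query used in the study: the table names (`citizen_main`, `subdistrict_code`), the column names, and the sample rows are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL only so that the example runs without a server.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not the actual KLAMPID tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subdistrict_code (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE citizen_main (
        id INTEGER PRIMARY KEY,
        subdistrict_code TEXT,
        birth_date TEXT,       -- ISO 8601 date
        registered_date TEXT   -- ISO 8601 date
    );
    """
)
conn.executemany(
    "INSERT INTO subdistrict_code VALUES (?, ?)",
    [("A", "Area A"), ("B", "Area B")],
)
# Made-up sample rows, not real civil registration data.
conn.executemany(
    "INSERT INTO citizen_main VALUES (?, ?, ?, ?)",
    [
        (1, "A", "1990-06-15", "2019-03-01"),
        (2, "A", "2010-01-01", "2020-05-05"),  # too young for the age filter
        (3, "B", "1985-12-31", "2021-07-07"),
        (4, "B", "1995-02-02", "2017-01-01"),  # outside the date range
        (5, "A", "2000-01-01", "2022-12-31"),
    ],
)

# Join, filter by date range and age range, then group by sub-district name.
# Age is computed in completed years as of the end of the date range.
QUERY = """
SELECT s.name, COUNT(*) AS total
FROM citizen_main AS c
JOIN subdistrict_code AS s ON s.code = c.subdistrict_code
WHERE c.registered_date BETWEEN :start_date AND :end_date
  AND (
        CAST(strftime('%Y', :end_date) AS INTEGER)
      - CAST(strftime('%Y', c.birth_date) AS INTEGER)
      - (strftime('%m-%d', :end_date) < strftime('%m-%d', c.birth_date))
      ) BETWEEN :min_age AND :max_age
GROUP BY s.name
ORDER BY s.name
"""


def count_by_subdistrict(start_date, end_date, min_age, max_age):
    params = {
        "start_date": start_date,
        "end_date": end_date,
        "min_age": min_age,
        "max_age": max_age,
    }
    return conn.execute(QUERY, params).fetchall()


rows = count_by_subdistrict("2018-01-01", "2022-12-31", 17, 40)
print(rows)
```

Passing the date and age bounds as parameters keeps the query reusable for different periods, which mirrors how the filters are changed interactively in the dashboard.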

G. GeoJSON
In this paper, we utilize Metabase to visualize data using maps. The maps of the Surabaya region used throughout were created as custom maps in GeoJSON format. The GeoJSON data of Surabaya city used in this project includes districts and sub-districts. When combined with data from the civil registration office, it can display geospatial information.
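To make the combination of map and registration data concrete, the sketch below builds a small GeoJSON FeatureCollection and attaches a per-region count to each feature by matching on a region code. The property names (`code`, `name`, `total`), the region codes, and the square polygons are placeholders chosen for illustration; they are not the actual Surabaya boundaries or the keys used in this project. In Metabase the join is performed by the custom map feature, which, as we understand it, matches a column in the query result against an identifier property of each GeoJSON feature.

```python
import copy


def make_feature(code, name, ring):
    """Build a GeoJSON Feature whose geometry is a single-ring Polygon."""
    return {
        "type": "Feature",
        "properties": {"code": code, "name": name},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Placeholder squares in [longitude, latitude] order; the first and last
# positions are identical because GeoJSON rings must be closed.
sample_map = {
    "type": "FeatureCollection",
    "features": [
        make_feature("A", "Area A",
                     [[112.70, -7.30], [112.75, -7.30], [112.75, -7.25],
                      [112.70, -7.25], [112.70, -7.30]]),
        make_feature("B", "Area B",
                     [[112.75, -7.30], [112.80, -7.30], [112.80, -7.25],
                      [112.75, -7.25], [112.75, -7.30]]),
    ],
}


def attach_counts(collection, counts):
    """Return a copy of the collection with a 'total' property per region."""
    merged = copy.deepcopy(collection)
    for feature in merged["features"]:
        code = feature["properties"]["code"]
        # Regions without any matching record get a count of zero.
        feature["properties"]["total"] = counts.get(code, 0)
    return merged


merged = attach_counts(sample_map, {"A": 2})
print([(f["properties"]["name"], f["properties"]["total"])
       for f in merged["features"]])
```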

H. Civil Registration Agency of Surabaya
The Civil Registration Agency of Surabaya has a population information system called KLAMPID. The KLAMPID database system uses PostgreSQL, and its table structure is shown in Fig. 2. While KLAMPID has visualization features, it does not yet support geospatial visualization. To visualize the data in a geospatial format, the database was replicated to the university cloud, which hosts a virtual machine for Metabase. Metabase then displays the replicated data geospatially.
The civil data covers about 2 million detailed citizen records. Each record shows the citizen's identity, education, job, religion, and marital status. The data can be aggregated to gain new insight into social phenomena.

III. RESULTS AND DISCUSSION
The subsequent sections comprehensively explain each application, including the datasets used, software development specifics, and features. Whenever the background or literature differs from what was presented in the first two sections, it is presented briefly.

A. Environment
Experiments were conducted using two virtual machines: one for PostgreSQL and one for Metabase. Detailed specifications for each virtual machine are listed in Table I.

B. Data Visualization
Several visualization experiments were conducted. The visualizations are based on the database available in the PostgreSQL virtual machine. The aspects visualized are live births, marriage and divorce, and employment.

C. Livebirths
The visualization in Fig. 3 displays data on the number of births in each neighborhood. The data can be queried for a specific period, including several years in the past. The data is calculated and visualized geospatially. Fig. 3 shows the number of births by neighborhood from January 1, 2018, to December 31, 2022.

D. Marriage and Divorce
Metabase can also display marriage and divorce data. The data is retrieved by querying the date of birth and death certificates. The visualized data can be modified by changing the start and end dates. The data displayed is categorized by district. Fig. 4 shows the visualization of marriage data in Surabaya, while Fig. 5 shows the visualization of divorce data in Surabaya.

E. Employment
There are 89 job types in the database, and Metabase makes it easy to change the type of visualization. The available visualization types include line, bar, combo, area, row, waterfall, scatter, pie, table, and map. To change the type of visualization, select one from the menu on the left-hand side. Fig. 6 shows the menu of available visualization options, and Fig. 7 shows the visualization in bar mode for each area.

F. PIECES Framework
The PIECES framework is a framework for user acceptance tests consisting of performance, information and data, economics, control and security, efficiency, and service. The aspects of each parameter are listed in Tables II through VII.

TABLE II PIECES FRAMEWORK PERFORMANCE ASPECT LIST
Index Aspect
1. The application is easy for users to use.
2. Tables and graphs can be easily accessed.
3. Applications can easily select the desired data source.
4. Apps can easily combine data sources.
5. Applications can easily perform data aggregation quickly.
7. The application can visualize many display modes smoothly.
8. Applications can respond to several jobs quickly.
9. The time required to respond to changes from the user is relatively fast.

TABLE III PIECES FRAMEWORK INFORMATION AND DATA ASPECT LIST
Index Aspect
1. Data can be stored and processed by the application.
2. Applications can display total data.
3. Applications can present data aggregation.
4. Applications can present segmented data.
5. Various types of visualization can be used as needed.
6. Applications can display multiple charts at once in one view.
7. Information can be accessed according to the role level of each user.
8. The information displayed by the application is beneficial.
9. The information presented by the application can be easily learned and understood.

TABLE IV PIECES FRAMEWORK ECONOMICS ASPECT LIST
Index Aspect
1. Applications can facilitate decision-making to save the budget for field survey needs.
2. Applications do not require high specifications to see the data.
3. The application does not require additional devices to operate the application.
4. The application can display strategic data with high economic value for regional policymakers.

TABLE V PIECES FRAMEWORK CONTROL AND SECURITY ASPECT LIST
Index Aspect
1. The application's authentication system (user login) is running well.
2. The database control is separate from the data graphic viewer application so that the data is safer.
3. There are data modification access restrictions for the observer role.
4. The method for logging in to an account is easy to understand.
5. Centralized database management.

TABLE VI PIECES FRAMEWORK EFFICIENCY ASPECT LIST
Index Aspect
1. Applications can improve the efficiency of regular data checking.
2. Data parameters can be easily changed according to needs for a more specific data description.
4. Applications can increase the efficiency of accessing data anytime and anywhere.
5. Setting user accounts and access rights improves efficiency when there are many authorized parties.

TABLE VII PIECES FRAMEWORK SERVICE ASPECT LIST
Index Aspect
1. The application is easy to learn and understand.
2. Applications can make it easier for users to access the specific data parameters needed.
3. Applications can present an attractive and concise appearance.
4. Apps can display many types of visualizations.
5. Applications can perform various data processing.
There are 38 questions for the satisfaction aspects and 38 questions for the importance aspects. The results are then processed to obtain the average value of both the importance and the satisfaction ratings. Table VIII shows the range of values used as a reference in this valuation method. There are five categories of conclusions for the level of importance and satisfaction.

The assessment was based on the answers of 16 respondents. The PIECES framework results show that, on average, all PIECES aspects (performance, information and data, economics, control and security, efficiency, and service) are rated as important and satisfactory by the respondents.
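The averaging step can be illustrated with a short Python sketch. It assumes that each answer is a rating on a five-point scale and that the percentage is the average divided by the scale maximum; both are assumptions on our part, although they agree with the reported pairs of values (for example, an average of 4.5625 on a five-point scale corresponds to 91.25%). The answer list is invented for illustration and is not the survey data.

```python
def average_score(scores):
    """Mean rating across all answers (respondents x questions)."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


def as_percentage(average, scale_max=5):
    """Express an average rating as a percentage of the scale maximum.

    The five-point scale is an assumption, not stated in the study.
    """
    return average / scale_max * 100


# Invented answers for illustration only.
answers = [5, 4, 5, 4]
avg = average_score(answers)
print(avg, as_percentage(avg))
```

The resulting average would then be compared against the value ranges of Table VIII to select one of the five conclusion categories.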

I. Comparison
We compare the position and contribution of this research with other research. Each study used a different tool and had its own case requiring visualization. Most of these studies address population issues, and one addresses environmental issues. This shows that spatial data visualization is broadly used across various real-world domains. The study in [29] used an open-source library named Folium to visualize criminal cases in Nigeria on a pin map, but an open GeoJSON map is unavailable. Sempé et al. [23] used INEI Projection, a visualization system for Peru, which has no open GeoJSON. Atta-ur-Rahman et al. [30] used Tableau, which is not open source, to visualize disease data in New York. Petrova-Antonova et al. [31] used open-source Kibana to visualize air quality in Europe. Finally, Arif et al. [20] used open-source Apache Superset to visualize Indonesia's earthquake data; it also has open GeoJSON, but the research does have a region heatmap to cluster the data and get the value of each data item. The comparison shows that Metabase, as used in this research, is affordable, which makes it easier and more beneficial to implement.

IV. CONCLUSION
Metabase is a useful tool for visualizing data related to job types in a database. This paper highlights the various visualization options available through Metabase, including line, bar, combo, area, row, waterfall, scatter, pie, table, and map. With these options, it is easy to analyze and interpret data on the various job types in the database. Overall, the paper provides a brief overview of how Metabase works and how it can generate visualizations of job-type data. It demonstrates the ease with which visualizations can be changed and customized. Beyond its attractive display, the Metabase interface is also easy to understand. The comparison with other tools shows that Metabase is also affordable. Based on the PIECES framework, the application's importance level is 4.56 (91.25%), meaning the application is considered important, and its satisfaction level is 4.29 (85.76%), meaning respondents are satisfied with the application. The research also emphasizes the importance of having a powerful tool like Metabase for data analysis and decision-making, especially for the civil registration agency.
