Cluster Mapping of Waste Exposure Using DBSCAN Approach: Study of Spatial Patterns and Potential Distribution in Bantul Regency

Abstract: High plastic waste production is a common problem in many places, including the Special Region of Yogyakarta province. This is evidenced by the Piyungan Integrated Waste Disposal Site (TPST), or Final Disposal Site (TPA), in Bantul Regency, which has been closed several times because its capacity was exceeded and because residents around the TPA blockaded it. Microplastic contamination from discarded plastic waste warrants study, considering its long-term impact on human health. This research aims to map the distribution of locations with the potential for waste accumulation in order to reduce the negative consequences of microplastic contamination. The population comprised waste collection points (TPS) and markets in Bantul Regency, and the sample was the distribution of TPS and market points in Bantul Regency in 2023. Checking the point distribution pattern with the quadrant and nearest-neighbor methods showed that the waste accumulation points had a clustered pattern; the analysis could therefore proceed with cluster analysis using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method. The number of clusters was determined iteratively from combinations of the DBSCAN parameters (MinPts and Epsilon). The best combination was selected based on the Silhouette Coefficient (SC), which in this case was 0.78 (MinPts 7 and Eps 1500) and fell in the strong category. Subsequently, exploration was carried out by reducing the MinPts value and the lowest limit value of strong SC


I. INTRODUCTION
Increasing population and industrialization have led to several environmental problems in recent decades, including plastic and microplastic contamination. The waste problem has persisted for generations and has long been a subject of social polemic: it has existed from the past until now without being appropriately resolved, and it continually causes unrest in everyday life. The Piyungan Final Disposal Site (TPA), or Integrated Waste Disposal Site (TPST), is the largest landfill in the Special Region of Yogyakarta (DIY) province. Located in Ngablak and Watugender Hamlets, Sitimulyo Village, Piyungan District, Bantul Regency, it is the final processing site for waste originating from Yogyakarta City, Bantul Regency, and Sleman Regency. It is still in operation, and the volume of waste it receives has increased every year [1]. Unfortunately, while the volume of waste and the problems it causes grow yearly, public perception of the waste problem has not kept pace, and many people do not regard the waste they produce as a matter of community concern.
The waste content itself can cause various problems, especially for health. Plastic waste, for example, is generated continuously. Plastic is a material that cannot be decomposed naturally (non-biodegradable). After use, products made from plastic become waste that soil microbes can hardly decompose, polluting the environment [2]. Plastic is now widely recognized as a significant source of environmental pollution [3], especially in aquatic environments, where it causes prolonged biophysical damage [4][5][6][7], has detrimental effects on wildlife [8][9][10][11][12], and offers limited removal options [13][14][15][16]. Plastic waste also affects human health, since it can be a source of hazardous microplastic contamination. Microplastics are tiny plastic particles no more than 5 millimeters long that can accumulate in various environments, including sea, air, and soil. Microplastic pollution can harm the environment, human health, and biodiversity [17], [18]. Previous research reported that microplastic contamination can bioaccumulate and biomagnify in aquatic food, impacting the environment and human health [19]. In addition, the level of microplastic pollution in terrestrial ecosystems is 20 times higher than in the ocean.
Based on the problems above, this research focuses on clustering the distribution of waste collection centers (TPS/TPA) to identify how these sites group spatially. Plastic waste mapping is critical for anticipating microplastic pollution in the environment. It identifies locations that are primary sources of microplastic pollution, such as illegal waste dumps or densely populated urban areas. Mapping can also help in planning more effective and efficient waste management. Density-based spatial clustering of applications with noise (DBSCAN), an algorithm that forms clusters based on the density of objects in the dataset, measured through the distances between them, is used for the spatial cluster analysis. It has the advantage of being able to detect noise, that is, objects that do not belong to any cluster [20].

A. Research Population and Sample
The population of this study was the distribution of TPS points and other locations with the potential to accumulate large amounts of waste, such as markets, in Bantul Regency. The sample comprised the TPS and markets in Bantul Regency in 2023. This sample was used because these sites are potential centers of waste accumulation. Sample data were obtained using Google Earth Pro, Google Earth Engine, and Botsol Google Maps Crawler in 2023.

B. Research Variable
The variables in this research were the coordinates of the TPS and market points in Bantul Regency. The coordinate system used was the Universal Transverse Mercator (UTM) projected coordinate system. The data used are illustrated in Table 1.

C. Research Stage
The initial stage was the search for point distribution data, followed by data input. The next step was data preprocessing, which included eliminating points outside Bantul Regency and eliminating redundant data. The third stage was Exploratory Data Analysis (EDA), in which the point distribution data were explored and an initial visualization of point density was produced. The fourth stage checked the distribution of the points, that is, whether the points were spread in groups. The quadrant and nearest-neighbor methods were used to check the spatial distribution of the points. If the points were spread in groups, the analysis could continue with the DBSCAN clustering method. If they were not, other methods, such as zoning mapping for each sub-district, could be used instead. The fifth stage was to calculate the DBSCAN parameters, i.e., MinPts and Epsilon, which determine the number of clusters formed. The MinPts and Epsilon values were obtained from the second iteration and evaluated using the silhouette coefficient according to Equation 8. The sixth stage explored the parameter values and interpreted the results through visualization. Overall, the research stages are presented in Fig. 1.
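The preprocessing step described above (dropping duplicates and points outside the regency) can be sketched as follows. This is only an illustration under stated assumptions: the function names `point_in_polygon` and `preprocess`, the square boundary, and the sample coordinates are hypothetical, and a real boundary would come from an administrative polygon of Bantul Regency in projected coordinates.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly joined to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def preprocess(points, boundary):
    """Drop duplicate coordinates, then keep only points inside boundary."""
    unique = list(dict.fromkeys(points))  # keeps first occurrence, in order
    return [p for p in unique if point_in_polygon(p[0], p[1], boundary)]


# Hypothetical boundary and points (metres, projected coordinates).
boundary = [(0, 0), (10_000, 0), (10_000, 10_000), (0, 10_000)]
raw = [(2_000, 3_000), (2_000, 3_000), (5_000, 5_000), (12_000, 4_000)]
clean = preprocess(raw, boundary)
```

In this toy input, one duplicate and one point east of the boundary are removed, leaving two points.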

D. Toxicity of Microplastics
Microplastic contamination has been reported in various biophysical environments, such as water, sediment, soil, and air. Microplastics have toxic impacts on both the environment and human health. Their toxic effects on the environment include: (1) Accumulation in organisms: microplastics can be consumed by marine organisms, such as plankton and fish, and accumulate in their tissues. This affects the health of the organisms and can damage the food chain. (2) Environmental pollution: microplastics released into the environment can cause water and soil pollution. This can threaten the survival of organisms and disrupt ecosystems. (3) Increased exposure to toxic chemicals: microplastics can absorb toxic chemicals found in the environment, such as pesticides and heavy metals. Organisms that consume microplastics contaminated with these chemicals are exposed to substances that affect their health and metabolic systems. Inflammation of organs, cancer, and other metabolic diseases can result from the accumulation of these substances in the body. (4) Disturbances to the development and reproduction of organisms: microplastics can affect the growth and reproduction of marine organisms. Several studies in fish showed that exposure to microplastics can disrupt embryo development and reduce fertility [21]. Moreover, microplastic contamination in the environment can be toxic to human health through the food chain. Mapping plastic waste, one of the primary sources of microplastic contamination, is therefore an essential step in reducing the effects of microplastics.

E. Configuration of Points in Space
The basic idea behind point pattern investigations is to compare theoretical assumptions about the behavior of points with empirical evidence from data. Spatial point pattern methods were developed in biology and ecology as early as the mid-20th century, and they have since become popular in economics. Ripley's K is a popular approach for determining whether a point distribution is clustered. Studies that used the Ripley's K approach include Vidanapathirana [22], Self [23], [24], and Tzai-Hung [25].
A Poisson point process (also known as complete spatial randomness [CSR], or complete spatial randomness and independence [CSRI] for multitype point patterns) is the classic example of a point pattern that exhibits random behavior, stationarity, and isotropy (that is, it retains its statistical properties under any shift or rotation). Such patterns are uncommon, but they serve as a useful benchmark, particularly when early analysis does not indicate CSR as a starting point. Point pattern analysis is beneficial for determining whether an underlying data generation process is random or non-random and for investigating probable links between points of one or more categories. Verifying a CSR/CSRI hypothesis can result in either non-rejection, in which case the studied pattern is Poisson, or rejection, in which case one of three possibilities applies. Fig. 2 depicts all of them.
Fig. 2 Algorithm of univariate pattern analysis [26]
To assess the point distribution pattern, the quadrant method, the distance-based CSR test, and the nearest-neighbor test were used, as shown in Fig. 2. Nearest-neighbor analysis is a method for measuring patterns in a set of data points in two or three dimensions. With this approach, the average distance between each point and its nearest neighbor is calculated [27]. The nearest-neighbor index is calculated by dividing the observed distance by the expected distance. The distance used is the Euclidean distance in Equation (1). R represents the nearest-neighbor index, and its calculation is described in Equation (2) [28].
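$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{1}$$

$$R = \frac{\bar{d}_{obs}}{\bar{d}_{exp}} \tag{2}$$

Here $\bar{d}_{obs}$ is the observed mean nearest-neighbor distance, i.e., the average distance between each point and its nearest neighbor, and $\bar{d}_{exp}$ is the expected mean nearest-neighbor distance for the same number of points placed at random. Equation (3) shows the calculation of $\bar{d}_{obs}$ and $\bar{d}_{exp}$.

$$\bar{d}_{obs} = \frac{\sum_{i=1}^{n} d_i}{n}, \qquad \bar{d}_{exp} = \frac{0.5}{\sqrt{n/A}} \tag{3}$$

The distance between point $i$ and its nearest neighbor is denoted by $d_i$, the number of points is denoted by $n$, and $A$ is the area of the minimal rectangle enclosing all points, or a set area value. $z$ is the test statistic, and $SE$ is the standard error of the mean nearest-neighbor distance, calculated as specified in Equations (4) and (5).

$$z = \frac{\bar{d}_{obs} - \bar{d}_{exp}}{SE} \tag{4}$$

$$SE = \frac{0.26136}{\sqrt{n^2/A}} \tag{5}$$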
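The nearest-neighbor index and its z statistic can be sketched in a few lines. This is a minimal illustration, assuming the standard constants of the average nearest-neighbor test under complete spatial randomness (0.5 for the expected distance and 0.26136 for the standard error); the function name `nearest_neighbor_stats` and the sample points are hypothetical and are not taken from the study data.

```python
import math


def nearest_neighbor_stats(points, area):
    """Return (R, z) for the average nearest-neighbor test.

    points : list of (x, y) in a projected system (e.g. metres)
    area   : study-area size A in squared coordinate units
    """
    n = len(points)
    # Observed mean distance from each point to its nearest neighbour.
    nearest = []
    for i, (xi, yi) in enumerate(points):
        d_min = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nearest.append(d_min)
    d_obs = sum(nearest) / n
    # Expected mean distance and its standard error under randomness.
    d_exp = 0.5 / math.sqrt(n / area)
    se = 0.26136 / math.sqrt(n * n / area)
    return d_obs / d_exp, (d_obs - d_exp) / se


# Hypothetical data: two tight groups inside a 100 x 100 window.
pts = [(10, 10), (10, 11), (11, 10), (11, 11),
       (90, 90), (90, 91), (91, 90), (91, 91)]
R, z = nearest_neighbor_stats(pts, area=100 * 100)
```

For this toy input R is well below 1 and z is below -1.96, which is the clustered case.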
The nearest-neighbor index ranges from 0 to 2.15. An index of 0 implies that the pattern is completely clustered, an index of 1 indicates a random pattern, and an index of 2.15 indicates a fully dispersed pattern. The point configuration is categorized as clustered if the calculated $z_{calc} < -1.96$ (significance level $\alpha = 5\%$) [29]. In addition to the nearest-neighbor approach, the quadrant method is used to analyze the spatial structure of the point distribution. The quadrant approach begins by dividing the area into $m$ cells of roughly equal size. Then, for each cell $i$, the number of events $n_i$ in that cell is counted. The next step is to compute the average number of events per cell ($\bar{n}$), the variance of the number of events per cell ($s^2$), and the Variance-Mean Ratio (VMR), as shown in Equation (6).

$$\bar{n} = \frac{n}{m}, \qquad s^2 = \frac{\sum_{i=1}^{m} (n_i - \bar{n})^2}{m - 1}, \qquad VMR = \frac{s^2}{\bar{n}} \tag{6}$$

Here $m$ is the number of cells in the grid, $n$ is the total number of events, and $n_i$ is the number of events in cell $i$. If VMR = 0, the point arrangement in space is uniform, or perfectly regular. If VMR = 1, the arrangement of points in space is random. A value of VMR < 1 (variance smaller than the mean) implies that the arrangement tends toward a regular pattern, whereas VMR > 1 indicates that the points are more clustered than random. If the number of quadrants is less than 30, then $(m - 1)\,VMR$ follows the Chi-Square distribution with $m - 1$ degrees of freedom. Equation (7) displays the Chi-Square test statistic.

$$\chi^2 = (m - 1)\,VMR = \frac{\sum_{i=1}^{m} (n_i - \bar{n})^2}{\bar{n}} \tag{7}$$

The null hypothesis ($H_0$) is that the arrangement of points in space is random, with test statistic $(m - 1)\,VMR$. If $\chi^2 > \chi^2_{table}$, $H_0$ is rejected. Meanwhile, if $m$ is more than 30, $(m - 1)\,VMR$ approximately follows the normal distribution $N(m - 1,\ 2(m - 1))$ [30].
If the distribution pattern of these points is clustered, the DBSCAN algorithm is used to continue with spatial clustering. DBSCAN can detect outliers/noise and does not require the number of clusters (k) as an input, unlike the K-Means or K-Medoids methods. This approach also has the advantage of being able to detect clusters of irregular shape [30], [31].
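As an illustration of the quadrant method, the sketch below counts points in a square grid and computes the VMR and the statistic (m - 1) VMR. It is a sketch under assumptions: the function name `quadrat_vmr`, the rectangular study window, and the sample points are hypothetical; `k` is the number of cells per side, so the number of cells is m = k * k; and the variance uses the m - 1 denominator so that (m - 1) VMR equals the usual chi-square form.

```python
def quadrat_vmr(points, x_min, y_min, width, height, k):
    """Variance-mean ratio for a k x k grid laid over the study window.

    Returns (vmr, chi2, m), where m = k * k is the number of cells and
    chi2 = (m - 1) * vmr.
    """
    m = k * k
    counts = [0] * m
    for x, y in points:
        # Clamp so that points on the upper edge fall in the last cell.
        col = min(int((x - x_min) / width * k), k - 1)
        row = min(int((y - y_min) / height * k), k - 1)
        counts[row * k + col] += 1
    mean = sum(counts) / m
    var = sum((c - mean) ** 2 for c in counts) / (m - 1)
    vmr = var / mean
    return vmr, (m - 1) * vmr, m


# Hypothetical data: two tight groups inside a 100 x 100 window.
pts = [(10, 10), (10, 11), (11, 10), (11, 11),
       (90, 90), (90, 91), (91, 90), (91, 91)]
vmr, chi2, m = quadrat_vmr(pts, 0, 0, 100, 100, k=2)
```

With this toy input the cell counts are 4, 0, 0, 4, so the VMR is about 2.67 and the statistic is about 8.0 on 3 degrees of freedom, above the 5% critical value of 7.815, which points to clustering.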

F. Density-based Spatial Clustering of Applications with Noise (DBSCAN)
Density-based spatial clustering of applications with noise (DBSCAN) requires two parameters, minimum points (MinPts) and epsilon (Eps), whose values are specified by the researcher. Eps is the distance (radius) that defines the neighborhood of a data item, and MinPts is the minimum number of items that this neighborhood must contain for the item to seed a cluster. The neighborhood within radius ε of a data item is referred to as its ε-neighborhood [33]. An item is considered a core object if its ε-neighborhood contains at least MinPts objects. Core objects are the pillars of dense regions. In DBSCAN clustering, a cluster contains two sorts of points, core points and border points; the neighborhood of a border point contains far fewer items than the neighborhood of a core point [34]. A border point may be associated with more than one cluster. Given a collection of objects D and the parameters ε and MinPts, it is possible to identify all core objects. The clustering task then reduces to using the core objects and their neighborhoods to build dense regions, and these dense regions are the clusters [35].
DBSCAN searches for clusters by examining the ε-neighborhood (Eps-neighborhood) of each point in the database. If the ε-neighborhood of a point p contains at least MinPts points, a new cluster is created with p as a core object. DBSCAN then iteratively collects the objects that are directly density-reachable from these core objects, which may involve merging several density-reachable clusters. The DBSCAN algorithm works as follows: (1) an initial point p is chosen at random; (2) given Eps and MinPts, all points that are density-reachable from p are retrieved; (3) if p is a core point, a cluster is formed; (4) if p is a border point, no points are density-reachable from p, and DBSCAN visits the next point in the database; (5) the process continues until all points have been processed. The results are essentially independent of the order in which the points are processed [33]; only a border point that is reachable from more than one cluster may be assigned differently.
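The steps above can be sketched as a compact, brute-force DBSCAN. This is an illustrative implementation, not the software used in the study: the function name `dbscan` and the sample points are hypothetical, points are visited in index order rather than at random, and each point counts as a member of its own ε-neighborhood.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(points)

    def neighbors(i):
        # Indices within eps of point i (the point itself included).
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(xi - points[j][0], yi - points[j][1]) <= eps]

    labels = [None] * n          # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE    # may later be relabelled as a border point
            continue
        cluster += 1             # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # noise becomes a border point
            if labels[j] is not None:
                continue                 # already assigned
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:       # j is a core point: keep expanding
                queue.extend(nb)
    return labels


# Hypothetical data: two tight groups and one isolated point.
pts = [(10, 10), (10, 11), (11, 10), (11, 11),
       (90, 90), (90, 91), (91, 90), (91, 91), (50, 50)]
labels = dbscan(pts, eps=2, min_pts=3)
# labels -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point has only itself in its neighborhood, so it is reported as noise, which mirrors how sparsely located sites are treated in the analysis.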
The values of MinPts and Eps are chosen iteratively. The best Eps is determined from the distances around the elbow of the k-nearest-neighbor distance plot. Once a suitable Eps value is obtained, DBSCAN clustering is performed so that clusters are formed from sites according to their density, subject to MinPts and Eps, and the resulting clusters are then displayed.
The first phase involves calculating the distances between TPS once the spatial data containing their geographic information have been acquired. Next, both parameters (MinPts and Eps) of the DBSCAN technique are identified, after which the results can be used to identify TPS clusters or groups based on geographic density. For instance, consider a situation in Bantul Regency where TPS are dispersed over the urban area but at varying densities. In this case, DBSCAN identifies densely aggregated TPS clusters that are close to urban centers or areas with denser populations. On the other hand, TPS located in rural areas with lower population density may be treated as noise, that is, as not belonging to any major cluster.
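The distances used to locate the elbow can be computed as below. This is a brute-force sketch; the function name `k_distances` and the sample points are hypothetical. Plotting the returned values in ascending order gives the curve whose bend suggests a range for Eps.

```python
import math


def k_distances(points, k):
    """Sorted distance from every point to its k-th nearest neighbour."""
    result = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        result.append(dists[k - 1])
    return sorted(result)


# Hypothetical data: two tight groups and one isolated point.
pts = [(10, 10), (10, 11), (11, 10), (11, 11),
       (90, 90), (90, 91), (91, 90), (91, 91), (50, 50)]
kd = k_distances(pts, k=3)
```

Here the first eight values are small (about 1.41) and the last jumps above 50; that jump is the kind of elbow that separates dense points from noise.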

G. Evaluation of Cluster Model
Clusters were evaluated using the average silhouette width. Equation (8) gives the silhouette calculation [35]-[37]. For a data point $i \in C_I$ (data point $i$ in cluster $C_I$):

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\, j \neq i} d(i, j), \qquad b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j) \tag{8}$$

The silhouette value at point $i$ is denoted by $s(i)$. $b(i)$ is the average distance between point $i$ and the data not in the same cluster as point $i$, taken over the nearest such cluster. $a(i)$ is the average distance between point $i$ and all other data in its own cluster. $|C_I|$ denotes the number of points in cluster $I$. The distance between data objects $i$ and $j$ is given by $d(i, j)$. After the average silhouette width has been calculated, the cluster structure is considered strong if the value is higher than 0.71 [39].
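A minimal sketch of the average silhouette width follows. Two points are assumptions rather than details confirmed by the text: noise points (label -1) are excluded from the average, and a point in a singleton cluster scores 0. The function name `average_silhouette` and the sample data are hypothetical, and the function expects at least two clusters.

```python
import math


def average_silhouette(points, labels, noise=-1):
    """Mean silhouette over clustered points; noise points are skipped."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != noise:
            clusters.setdefault(lab, []).append(p)

    def mean_dist(p, members, exclude_self=False):
        # The distance from p to itself is 0, so only the count changes.
        total = sum(math.hypot(p[0] - q[0], p[1] - q[1]) for q in members)
        count = len(members) - (1 if exclude_self else 0)
        return total / count

    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)       # convention for singleton clusters
                continue
            a = mean_dist(p, members, exclude_self=True)
            b = min(mean_dist(p, other)
                    for other_lab, other in clusters.items()
                    if other_lab != lab)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data: two tight groups and one noise point.
pts = [(10, 10), (10, 11), (11, 10), (11, 11),
       (90, 90), (90, 91), (91, 90), (91, 91), (50, 50)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, -1]
score = average_silhouette(pts, labels)
```

For these well-separated groups the score is close to 1, far above the 0.71 threshold for a strong structure.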

A. Exploratory Data Analysis
The first step was to prepare the data. It started with data profiling, in which the data used (in this research, the distribution of points with the potential for plastic waste accumulation) were defined and transformed. Data preprocessing was then carried out by removing points outside the study area, Bantul Regency, and removing duplicated data. The results of this first stage were visualized as a density map, presented in Fig. 3. As shown in Fig. 3, the black dots mark the observed objects (markets and landfills) in Bantul Regency. The area of each site was not included in this study because of limited data. From the distribution of these points, density was graded into five levels. Green shows the lowest density of points (and hence the least potential for microplastic contamination), while red marks the most densely distributed areas (20 to 25 points, as seen in the legend). On closer inspection, the green areas lie on the edges of Bantul Regency, while the dense areas lie in its center. At first glance, the points appear grouped around the center of Bantul Regency. However, this still needed to be confirmed with statistical tests, namely the quadrant method and nearest-neighbor analysis, as in Equations (5) and (7).

B. Spatial Point Pattern
The location variable was used to determine the point distribution pattern. A projected coordinate system was used in this calculation; coordinates that are still in a geographic coordinate system must be converted first. Software such as Quantum GIS (QGIS) [40] or ArcGIS [41] can perform this conversion. This research used Universal Transverse Mercator zone 49S, the zone in which Bantul Regency lies. In the quadrant analysis, simulations were carried out by dividing the area into grids ranging from m = 3 (3x3) to m = 20 (20x20). Table 2 presents the results of the quadrant method. The table shows that the calculated chi-square value was higher than the chi-square table value for m between 3 and 5. For m greater than 5, the calculated z value, based on the test statistic in Equation 7, was also higher than the z table value. Based on these results, it can be concluded that the research object points had a clustered pattern. This conclusion is strengthened by the simulation of the VMR value in Fig. 4. The VMR value calculated with Equation 6 was consistently greater than one, even as the number of quadrants was increased (reducing the size of each cell). On this basis, the points could be declared to spread in groups according to the quadrant method.
Furthermore, a nearest-neighbor approach was also used to determine the point distribution pattern. The nearest-neighbor index (R) from Equation 2 was 0.04, close to 0, which points to a clustered distribution pattern. Based on Equation 4, the calculated z value was -26.95 < -1.96, so the points spread in groups. Both methods therefore indicate that the point distribution pattern in this study was clustered, so the analysis could proceed to DBSCAN clustering.

C. DBSCAN Clustering and Parameter Exploration
The DBSCAN method has two primary parameters: MinPts and Eps. Before the best values of these two parameters are determined, the k-Nearest Neighbor Distance (kNNdist), a fast computation of the distance from each point to its k-th nearest neighbor in the point matrix, is calculated. The kNNdist results are used to select suitable ranges for Eps and MinPts. The kNNdist plot forms a knee (elbow) that can be used to estimate suitable MinPts and Eps values. The elbow is illustrated in Fig. 5, where the blue dotted line marks the iteration limit used. Iteration was performed over the region around the elbow, with MinPts from 3 to 7 and Eps between 1500 and 2500 m (in increments of 10 m). The optimal clustering was chosen as the combination of these two parameters with the highest silhouette coefficient. Table 3 displays the results for the combinations of the two parameters.
The maximum silhouette coefficient was 0.782, at MinPts = 7 and Eps = 1500, with 3 clusters formed, 81 noise points, and 33 points in clusters (label "a" in Fig. 6). A silhouette coefficient of 0.782 means that the clusters had a strong structure [38]. The average silhouette values for the simulated MinPts and Eps values are presented in Fig. 6. The figure shows that, mainly beyond an Eps value of 1700, the average silhouette value tended to decrease. This means that as the radius grew, the strength of the clusters formed tended to weaken. In addition to clustering based on the highest silhouette value, data exploration was carried out with the aim of disaster mitigation (especially for waste pollution). Disaster mitigation seeks to minimize the losses caused by disaster events, both material and moral [42]. The first two explorations reduced the MinPts value while keeping the optimum Eps; they are shown as labels "b" and "c" in Fig. 6. A further exploration used the parameter values whose average silhouette approached the lower threshold of the strong category (0.71).
Fig. 5 The iteration limit for Eps parameter
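The search space described above can be enumerated explicitly. The sketch below only builds the grid and picks the best pair for a caller-supplied scoring function; the names `parameter_grid` and `best_combination` are hypothetical, and the scoring function (for example, the average silhouette of the DBSCAN result for each pair) is left to the caller.

```python
def parameter_grid(min_pts_values, eps_start, eps_stop, eps_step):
    """All (MinPts, Eps) pairs, with Eps running inclusively to eps_stop."""
    eps_values = range(eps_start, eps_stop + 1, eps_step)
    return [(m, e) for m in min_pts_values for e in eps_values]


def best_combination(grid, score):
    """Return the (MinPts, Eps) pair with the highest score.

    score is a caller-supplied function (min_pts, eps) -> float, for
    example the average silhouette of the resulting DBSCAN partition.
    """
    return max(grid, key=lambda pair: score(*pair))


# MinPts from 3 to 7 and Eps from 1500 m to 2500 m in 10 m steps.
grid = parameter_grid(range(3, 8), 1500, 2500, 10)
```

With these ranges the grid holds 5 x 101 = 505 combinations, each of which requires one clustering run and one silhouette evaluation.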

D. Clustering by Lowering the MinPts Value
As shown in Fig. 6, using the criterion of the average silhouette value closest to the strong category, two parameter combinations were obtained. The first was MinPts = 5 and Eps = 1500, with an average silhouette value of 0.641 (label "c"). The second was MinPts = 6 and Eps = 1500, with an average silhouette value of 0.678 (label "b"). The first exploration yielded 4 clusters with 72 noise points (and 42 points in clusters). Compared with the initial clustering, this increased the number of clusters and decreased the noise. Likewise, the second exploration yielded 5 clusters with 61 noise points and 53 points in clusters. The results of the first and second explorations are illustrated in Fig. 7.

E. Clusterization with a Value Close to the Minimum Average Silhouette Limit
In the third exploration, the MinPts and Eps values whose average silhouette was closest to the lower threshold of the strong category were identified. The simulation gave a MinPts value of 7 with an Eps of 1760 m, and the average silhouette value was 0.778. This third exploration produced 3 clusters and 78 noise locations. However, the noise points outnumbered the clustered points (78 noise points against 36 clustered locations).
Generally, the spatial distribution of Waste Collection Points (TPS) within Bantul Regency exhibits a pronounced clustering pattern, mainly concentrated in urban cores.This spatial phenomenon is attributed to the accessibility factor influencing community waste disposal practices, indicating a notable association between TPS locations and the public service demands pertinent to waste management.This observation assumes significance within the backdrop of escalating urbanization trends in Bantul Regency, where providing sanitation services and waste management infrastructure is paramount for urban dwellers.Nevertheless, the aggregation of TPS within urban settings also engenders adverse repercussions, including the potential for environmental degradation stemming from waste accumulation and the heightened risk of disease transmission, underscoring the imperative for meticulous consideration within sustainable urban development frameworks.
Conversely, spatial analysis shows that specific locales within the southeastern and southern territories of Bantul Regency, notably the districts of Dlingo, Pundong, and Kretek, lack TPS infrastructure. These areas, characterized by lower population densities and distinct geographical features, are prospective candidates for future centralized TPS deployments, subject to appropriate environmental impact mitigation measures. Furthermore, their geographical positioning, such as proximity to elevated terrain or coastal regions, presents opportunities to adopt innovative waste management strategies, including waste-to-energy conversion technologies or alternative waste disposal methods. However, determining TPS locations is only the first phase of a holistic waste management effort. Effective and sustainable waste management requires comprehensive planning that integrates processing facilities and recycling initiatives.
Consequently, development strategies that combine judicious TPS siting with effective waste management frameworks are indispensable for sustainable, eco-centric urban development. Hence, strengthening collaborative partnerships among governmental bodies, academic institutions, and local stakeholders is increasingly important for devising tailored solutions to waste management challenges in Bantul Regency, thereby advancing the broader sustainable development agenda.
Based on the point density exploration, most of the potential waste centers were concentrated around the government center of Bantul Regency. The quadrant and nearest-neighbor methods then showed that the distribution pattern of potential waste contamination areas was clustered. A simulation was conducted using combinations of MinPts from 3 to 7 and Eps from 1500 to 2500, based on the area around the elbow. From the simulation, DBSCAN clustering obtained an optimum average silhouette value of 0.782, which is in the strong category, with 3 clusters at MinPts = 7 and Eps = 1500. Then, to support mitigation of waste pollution, exploration was carried out with three scenarios: two with a decreased MinPts value (5 and 6) and one running DBSCAN with the Eps and MinPts values closest to the lower limit of the strong silhouette category. The exploration showed that the groups formed remained in the center of government.

Fig. 6 Average Silhouette Width of MinPts and Eps simulations

TABLE III SILHOUETTE COEFFICIENT EVALUATION RESULTS