BibTeX Citation Data:
@article{JOIV1525,
  author = {Irfan Darmawan and Muhamad Maulana and Rohmat Gunawan and Nur Widiyasono},
  title = {Evaluating Web Scraping Performance Using XPath, CSS Selector, Regular Expression, and HTML DOM With Multiprocessing Technical Applications},
  journal = {JOIV : International Journal on Informatics Visualization},
  volume = {6},
  number = {4},
  year = {2022},
  keywords = {Multiprocessing; scraping; website; HTML DOM.},
  abstract = {Data collection has become a necessity today, especially since many sources of data on the internet can be used for various needs. The main activity in data collection is collecting quality information that can be analyzed and used to support decisions or provide evidence. The process of retrieving data from the internet is also known as web scraping. There are various methods of web scraping that are commonly used. The amount of data scattered on the internet will be quite time-consuming if the web scraping is done on a large scale. By applying the parallel concept, the multi-processing approach can help complete a job. This study aimed to determine the performance of the web scraping method with the application of multi-processing. Testing is done by doing the process of scraping data from a predetermined target web. Four web scraping methods: CSS Selector, HTML DOM, Regex, and XPath, were selected to be used in the experiment measured based on the parameters of CPU usage, memory usage, execution time, and bandwidth usage. Based on experimental data, the Regex method has the least CPU and memory usage compared to other methods. While XPath requires the least time compared to other methods. The CSS Selector method is the smallest in terms of bandwidth usage compared to other methods. The application of multi-processing techniques to each web scraping method is proven to save memory usage, reduce execution time and reduce bandwidth usage compared to only using single processing.},
  issn = {2549-9904},
  pages = {904--910},
  doi = {10.30630/joiv.6.4.1525},
  url = {https://joiv.org/index.php/joiv/article/view/1525}
}
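The abstract describes running different extraction methods over target pages, with and without multiprocessing. A minimal Python sketch of that general idea follows. It is not the authors' implementation: the sample pages, the `h2.title` markup, and every function name here are hypothetical, and the standard-library `html.parser` is used only as a stand-in for a parser-based method (the study itself compared CSS Selector, HTML DOM, Regex, and XPath). No network requests are made.

```python
import re
from html.parser import HTMLParser
from multiprocessing import Pool

# Hypothetical in-memory pages standing in for downloaded HTML (assumption).
PAGES = [
    "<html><body><h2 class='title'>Alpha</h2><h2 class='title'>Beta</h2></body></html>",
    "<html><body><h2 class='title'>Gamma</h2></body></html>",
]

# Regex-based extraction: match the text between the title tags.
TITLE_RE = re.compile(r"<h2 class='title'>(.*?)</h2>")


def extract_regex(html):
    """Return all title strings found with a regular expression."""
    return TITLE_RE.findall(html)


class TitleParser(HTMLParser):
    """Event-based parser that collects text inside <h2 class='title'>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "h2" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles.append(data)


def extract_parser(html):
    """Return all title strings found by walking the parsed markup."""
    parser = TitleParser()
    parser.feed(html)
    return parser.titles


def scrape_parallel(extract, pages, workers=2):
    """Apply one extraction function to every page using a process pool."""
    with Pool(processes=workers) as pool:
        return pool.map(extract, pages)


if __name__ == "__main__":
    print(scrape_parallel(extract_regex, PAGES))
    print(scrape_parallel(extract_parser, PAGES))
```

Measuring CPU, memory, execution time, and bandwidth as the study did would need additional instrumentation around `scrape_parallel`, which this sketch leaves out.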

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
__________________________________________________________________________
JOIV : International Journal on Informatics Visualization
ISSN 2549-9610 (print) | 2549-9904 (online)
Organized by Department of Information Technology - Politeknik Negeri Padang, and Institute of Visual Informatics - UKM and Soft Computing and Data Mining Centre - UTHM
W : http://joiv.org
E : joiv@pnp.ac.id, hidra@pnp.ac.id, rahmat@pnp.ac.id