BibTeX Citation Data:
@article{JOIV1259,
  author = {Gyeongmin Kim and Minseok Kim and Jaechoon Jo},
  title = {Enhancing Code Similarity with Augmented Data Filtering and Ensemble Strategies},
  journal = {JOIV : International Journal on Informatics Visualization},
  volume = {6},
  number = {3},
  year = {2022},
  keywords = {Code similarity; language model; software productivity; CodeBERT; cross-validated ensemble},
  abstract = {Although COVID-19 has severely affected the global economy, information technology (IT) employees managed to perform most of their work from home. Telecommuting and remote work have promoted a demand for IT services in various market sectors, including retail, entertainment, education, and healthcare. Consequently, computer and information experts are also in demand. However, producing IT experts is difficult during a pandemic owing to limitations such as the reduced enrollment of international students. Therefore, research on increasing software productivity is essential; this study proposes a code similarity determination model that utilizes augmented data filtering and ensemble strategies. This algorithm is the first automated development system for increasing software productivity that addresses the current situation: a worldwide shortage of software dramatically improves performance in various downstream natural language processing (NLP) tasks. Unlike general-purpose pre-trained language models (PLMs), CodeBERT and GraphCodeBERT are PLMs that have learned both natural and programming languages. Hence, they are suitable as code similarity determination models. The data filtering process consists of three steps: (1) deduplication of data, (2) deletion of intersection, and (3) an exhaustive search. The Best Matching 25 (BM25) and BM25 with length normalization (BM25L) algorithms were used to construct positive and negative pairs. The performance of the model was evaluated using the 5-fold cross-validation ensemble technique. Experiments demonstrate the effectiveness of the proposed method quantitatively. Moreover, we expect this method to be optimal for increasing software productivity in various NLP tasks.},
  issn = {2549-9904},
  pages = {676--680},
  doi = {10.30630/joiv.6.3.1259},
  url = {https://joiv.org/index.php/joiv/article/view/1259}
}
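The abstract states that BM25 was used to construct positive and negative pairs but does not give the procedure. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes each code sample carries a problem identifier, that a positive is the highest-scoring sample from the same problem, and that a negative is the highest-scoring sample from a different problem (a "hard" negative). The names `tokenize`, `bm25_scores`, and `build_pairs`, the whitespace tokenizer, and the parameter values `k1=1.5` and `b=0.75` are all illustrative assumptions. The IDF term uses the common always-positive variant `log(1 + (N - df + 0.5) / (df + 0.5))`.

```python
import math
from collections import Counter


def tokenize(code):
    # Assumption: plain whitespace tokenization; a real system would
    # likely use a code-aware tokenizer.
    return code.split()


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every document in corpus_tokens against query_tokens with Okapi BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def build_pairs(samples):
    """samples: list of (problem_id, code). Returns (anchor, other, label) index triples."""
    tokens = [tokenize(code) for _, code in samples]
    pairs = []
    for i, (problem_i, _) in enumerate(samples):
        scores = bm25_scores(tokens[i], tokens)
        # Rank all other samples by BM25 similarity to the anchor, best first.
        ranked = sorted(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: scores[j],
            reverse=True,
        )
        positive = next((j for j in ranked if samples[j][0] == problem_i), None)
        negative = next((j for j in ranked if samples[j][0] != problem_i), None)
        if positive is not None:
            pairs.append((i, positive, 1))  # same problem: similar
        if negative is not None:
            pairs.append((i, negative, 0))  # different problem: dissimilar
    return pairs


# Hypothetical toy data: two problems, two solutions each.
samples = [
    ("sum", "a = int(input()) b = int(input()) print(a + b)"),
    ("sum", "x = int(input()) y = int(input()) print(x + y)"),
    ("max", "a = int(input()) b = int(input()) print(max(a, b))"),
    ("max", "nums = list(map(int, input().split())) print(max(nums))"),
]
pairs = build_pairs(samples)
```

Choosing the highest-scoring sample from a different problem as the negative yields pairs that look alike lexically but differ in purpose, which is one common motivation for retrieval-based negative mining. Whether the paper selects negatives this way is not stated in the abstract.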

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
__________________________________________________________________________
JOIV : International Journal on Informatics Visualization
ISSN 2549-9610 (print) | 2549-9904 (online)
Organized by the Department of Information Technology, Politeknik Negeri Padang; the Institute of Visual Informatics, UKM; and the Soft Computing and Data Mining Centre, UTHM
W : http://joiv.org
E : joiv@pnp.ac.id, hidra@pnp.ac.id, rahmat@pnp.ac.id