A Classification Algorithm Inspired by the Chromatographic Separation Mechanism Dedicated to the Classification of Variable-length and Multi-class Vectors
DOI: http://dx.doi.org/10.62527/joiv.8.1.2324
Abstract
Processing large data sets is one of the critical problems in data mining today. This article presents an algorithm for classifying large-volume data sets. The motivation for defining it is that classical classification methods are subject to several significant limitations when applied to such data. The first is that they require input data of constant size. The second is related to the data dimension. The third arises when a single input vector contains data belonging to several classes simultaneously, a so-called multi-class vector. The presented algorithm is inspired by the chromatographic separation of chemical substances, a method widely and successfully used in analytical chemistry. Chromatographic separation deals with a class of problems similar to those that arise when processing large data sets: for example, the molecules of different chemical substances differ in size, i.e., they have different lengths, which corresponds to input vectors of variable length. The algorithm defined in this work follows the mechanism of resolution chromatography. The article presents the results of calculations for sample data sets and discusses the properties of the defined algorithm, in particular its training process and the classification of single-class and multi-class data.