Efficient processing of GRU based on word embedding for text classification

Muhammad Zulqarnain - Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
Rozaida Ghazali - Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
Muhammad Ghouse - Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
Muhammad Mushtaq - Universiti Tun Hussein Onn Malaysia, Johor, Malaysia


DOI: http://dx.doi.org/10.30630/joiv.3.4.289

Abstract


Text classification has become a serious problem for large organizations that must manage huge amounts of online data, and it has been extensively applied in Natural Language Processing (NLP) tasks. Text classification helps users effectively manage and exploit meaningful information that needs to be organized into various categories for further use. To classify texts well, our research develops a deep learning approach that achieves better text classification performance than other RNN approaches. The main challenges in text classification are improving classification accuracy and coping with the sparsity and context sensitivity of data semantics, which often hinder classification performance. To overcome these weaknesses, this paper proposes a unified structure that investigates the effects of word embedding and the Gated Recurrent Unit (GRU) for text classification on two benchmark datasets, Google snippets and TREC. The GRU is a well-known type of recurrent neural network (RNN) that can process sequential data through its recurrent architecture. Empirically, semantically related words tend to lie near each other in the embedding space. First, the words in each post are converted into vectors via a word embedding technique. Then, the word sequences of the sentences are fed to the GRU to extract the contextual semantics between words. The experimental results show that the proposed GRU model can effectively learn word usage in context from the provided training data; the quantity and quality of the training data significantly affect performance. We compared the proposed approach with traditional recurrent approaches, RNN, MV-RNN, and LSTM; the proposed approach obtains better results on both benchmark datasets in terms of accuracy and error rate.
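
The pipeline described in the abstract (word embedding followed by a GRU whose final state feeds a classification layer) can be sketched as below. This is a minimal illustrative example in PyTorch, not the authors' implementation; the vocabulary size, embedding dimension, hidden size, and the six output classes are assumptions made for illustration only.

# Minimal sketch of an embedding + GRU text classifier, assuming illustrative
# layer sizes (not values reported in the paper).
import torch
import torch.nn as nn

class GRUTextClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=128, num_classes=6):
        super().__init__()
        # Word embedding: maps token ids to dense vectors; in practice this
        # layer could be initialised from pre-trained vectors.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # GRU reads the embedded word sequence and summarises its contextual semantics.
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Linear layer maps the final GRU state to class scores.
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) tensor of word indices
        embedded = self.embedding(token_ids)            # (batch, seq_len, embed_dim)
        _, last_hidden = self.gru(embedded)             # last_hidden: (1, batch, hidden_dim)
        return self.classifier(last_hidden.squeeze(0))  # (batch, num_classes) logits

# Toy usage: a batch of two padded "sentences" of random word indices.
model = GRUTextClassifier()
batch = torch.randint(1, 10000, (2, 12))
logits = model(batch)
print(logits.shape)  # torch.Size([2, 6])

In a full experiment along the lines of the paper, the embedding layer would be initialised with a pre-trained word embedding technique and the network trained with a cross-entropy loss on the labelled Google snippets and TREC data; those training details are not shown in this sketch.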

Keywords


RNN; GRU; LSTM; Word embedding; Text classification; Natural language processing
