Multi-Pipeline Sentiment Analysis for West-African Pidgin
Abstract
This paper introduces a multi-pipeline approach to sentiment analysis that aims to improve both the accuracy and the relevance of results. Existing approaches to sentiment analysis of West-African pidgin have been fragmented, often training a new model on pidgin data and focusing only on general sentiment polarity. This work takes a holistic approach, developing a multi-pipeline system for the exhaustive analysis of polarity in pidgin text. A subject classifier is developed with the Logistic Regression algorithm to predict the relevance of a body of text to the subject matter. Data are collected from Twitter and processed into tokens for training and evaluation. For sentiment analysis, a cross-lingual RoBERTa model (XLM-R) is integrated after being fine-tuned and expanded by means of transfer learning in the AfriBERTa model; it achieved an F1-score of 74.5, averaged over five (5) runs. The subject classifier achieved an accuracy of 0.81, proving effective at identifying text relevant to the subject.
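The two-stage structure described above (a subject-relevance filter followed by sentiment labelling) can be illustrated with a minimal Python sketch. This is not the system evaluated in this paper: the toy training sentences, the subject ("fuel price"), the class SubjectClassifier, and the function toy_sentiment are all hypothetical, and toy_sentiment merely stands in for the fine-tuned XLM-R model. The logistic regression is written in plain Python so that the sketch has no external dependencies.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text, drop URLs and @mentions, and return word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


class SubjectClassifier:
    """Binary logistic regression over bag-of-words counts (illustrative only)."""

    def __init__(self, lr=0.5, epochs=200):
        self.lr, self.epochs = lr, epochs
        self.weights, self.bias = {}, 0.0

    def _score(self, counts):
        z = self.bias + sum(self.weights.get(t, 0.0) * c for t, c in counts.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() numerically safe
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, texts, labels):
        data = [(Counter(tokenize(t)), y) for t, y in zip(texts, labels)]
        for _ in range(self.epochs):
            for counts, y in data:
                err = self._score(counts) - y  # gradient of the log loss w.r.t. z
                self.bias -= self.lr * err
                for tok, c in counts.items():
                    self.weights[tok] = self.weights.get(tok, 0.0) - self.lr * err * c
        return self

    def predict_proba(self, text):
        """Estimated probability that the text is relevant to the subject."""
        return self._score(Counter(tokenize(text)))


def toy_sentiment(text):
    """Hypothetical stand-in for the fine-tuned XLM-R sentiment model."""
    negative_cues = {"high", "scarcity", "worry"}
    return "negative" if negative_cues & set(tokenize(text)) else "neutral"


def analyse(texts, subject_clf, sentiment_fn, threshold=0.5):
    """Stage 1 keeps subject-relevant texts; stage 2 labels their sentiment."""
    return [(t, sentiment_fn(t)) for t in texts
            if subject_clf.predict_proba(t) >= threshold]


# Hypothetical toy data: label 1 = relevant to the subject, 0 = not relevant.
train_texts = [
    "fuel price don go up again",
    "di football match sweet well well",
    "dem don increase fuel price again",
    "i wan chop rice tomorrow",
    "fuel scarcity dey worry everybody",
    "rain fall for lagos today",
]
train_labels = [1, 0, 1, 0, 1, 0]

clf = SubjectClassifier().fit(train_texts, train_labels)
results = analyse(["fuel price too high abeg", "football match tonight"],
                  clf, toy_sentiment)
print(results)
```

Running the subject filter first means the sentiment stage only sees text that concerns the subject of interest, which is the sense in which the pipeline targets relevance as well as accuracy. In practice the second stage would call the fine-tuned transformer model rather than a keyword lookup.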
References
Pang, B. and Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
Feldman, R. and Sanger, J. (2007). The text mining handbook: Advanced approaches in analyzing unstructured data, 1-3. Cambridge: Cambridge University Press.
Ogueji K., Zhu Y., and Lin J. (2021). Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages. In Proceedings of the 1st Workshop on Multilingual Representation Learning, 116–126, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Olagunju, Tolulope, Oyebode, Oladapo and Orji, Rita. (2020). Exploring Key Issues Affecting African Mobile eCommerce Applications Using Sentiment and Thematic Analysis. IEEE Access. 8. 114475-114486. 10.1109/ACCESS.2020.3000093.
Berry, M. W. and Kogan, J. (2010). Text mining: Applications and theory, i-xiv. John Wiley and Sons.
Limboi, S. and Diosan, L. (2022). An unsupervised approach for Twitter Sentiment Analysis of USA 2020 Presidential Election, 1-6. 10.1109/INISTA55318.2022.9894264.
Dang, N.C., Moreno-García, M.N. and De la Prieta, F. (2020). Sentiment Analysis Based on Deep Learning: A Comparative Study. Electronics, 9(3), 483.
Jiang L., Yu M., Zhou M., and Liu X. (2011). Target-dependent Twitter sentiment classification, 151-160. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.
Muhammad S. H., Adelani D. I., Ruder S., Ahmad I. S., Abdulmumin I., Bello B. S., Choudhury M., Emezue C. C., Abdullahi S. S., Aremu A., Jorge A. and Brazdil P. (2022). NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, 590–602, Marseille, France. European Language Resources Association.
Devika, M.D., Sunutha, C., and Ganesh A. (2016). Sentiment Analysis: A Comparative Study on Different Approaches.
Conneau A., Khandelwal K., Goyal N., Chaudhary V., Wenzek G., Guzmán F., Grave E., Ott M., Zettlemoyer L. and Stoyanov V. (2020). Unsupervised Cross-lingual Representation Learning at Scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 8440–8451, Online. Association for Computational Linguistics.
Ihemere, Kelechukwu Uchechukwu. (2006). An Integrated Approach to the Study of Language Attitudes and Change in Nigeria: The Case of the Ikwerre of Port Harcourt City. In Olaoba F. Aransanyi & Michael A. Pemberton (eds.), Proceedings of the 36th Annual Conference on African Linguistics, 194–207.
Wolf T., Debut L., Sanh V., Chaumond J., Delangue C., Moi A., Cistac P., Rault T., Louf R., Funtowicz M., Davison J., Shleifer S., von Platen P., Ma C., Jernite Y., Plu J., Xu C., Le Scao T. and Gugger S. (2020). Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 38–45, Online. Association for Computational Linguistics.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. and Polosukhin, I., (2017). Attention is all you need. Advances in neural information processing systems, 30.
Shah, K., Patel, H., Sanghvi, D. and Shah M. (2020). A Comparative Analysis of Logistic Regression, Random Forest and KNN Models for Text Classification. Augment Hum Res 5, 12.
Cox, D.R. (1958). The Regression Analysis of Binary Sequences. Journal of the Royal Statistical Society: Series B, 20, 215-242.