Named Entity Recognition: Extended Feature Analysis for Improved Recognition for Yorùbá

Authors

  • Franklin Oladiipo Asahiah, Department of Computer Science and Engineering, Obafemi Awolowo University, Ile-Ife, Nigeria
  • Abayomi E. Adegunlehin, University of Texas Health Science Center

Keywords:

Named Entity, Information Extraction, Conditional Random Fields, Features, Context, POS

Abstract

Named Entity Recognition (NER) is an information extraction task that involves a two-step process: identifying named entities in a text and classifying them into predefined classes. NER systems exist for many languages, and different optimal sets of features have been identified to improve their accuracy. However, not much attention has been given to investigating how each feature impacts the identification and classification steps. In this work, we seek to identify and explore the relevance of each feature that can be used in performing NER on Yorùbá texts. Parts of the Yorùbá language derive from Arabic, which establishes similarities in the possible set of features for Yorùbá NER. These features were selected, and a comparative analysis was performed to show their impact on the two steps of the process, the identification and the classification of named entities.
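
As a rough illustration of the kind of per-token features the abstract refers to, the sketch below builds a feature dictionary from the word itself, its affixes, its POS tag, and its immediate context, and splits a BIO label into an identification part and a classification part. The feature names, the example sentence, the POS tags, and the labels are all assumptions made for illustration; they are not the feature set or data used in the paper.

```python
from typing import Dict, List, Optional, Tuple


def token_features(tokens: List[str], pos_tags: List[str], i: int) -> Dict[str, object]:
    """Build an illustrative feature dict for the token at position i."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "word.lower": word.lower(),
        "is_capitalized": word[0].isupper(),  # orthographic cue
        "prefix2": word[:2],                  # affix features
        "suffix2": word[-2:],
        "pos": pos_tags[i],                   # part-of-speech feature
        "BOS": i == 0,                        # beginning of sentence
        "EOS": i == len(tokens) - 1,          # end of sentence
    }
    # Context features: one token to the left and one to the right.
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
        feats["prev.pos"] = pos_tags[i - 1]
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
        feats["next.pos"] = pos_tags[i + 1]
    return feats


def split_label(label: str) -> Tuple[bool, Optional[str]]:
    """Split a BIO label into (is_entity, entity_type).

    The first element corresponds to the identification step,
    the second to the classification step.
    """
    if label == "O":
        return False, None
    _, entity_type = label.split("-", 1)
    return True, entity_type


# Hypothetical example sentence with assumed POS tags and BIO labels.
tokens = ["Adé", "lọ", "sí", "Ìbàdàn"]
pos_tags = ["NOUN", "VERB", "ADP", "NOUN"]
labels = ["B-PER", "O", "O", "B-LOC"]

features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
steps = [split_label(label) for label in labels]
```

Feature dictionaries of this shape are a common input format for CRF toolkits. Measuring the impact of a single feature would then amount to removing its key from every dictionary and comparing results on the identification and classification steps separately.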

Published

2024-11-30

How to Cite

Asahiah, F., & Adegunlehin, A. (2024). Named Entity Recognition: Extended Feature Analysis for Improved Recognition for Yorùbá. Ife Journal of Technology, 29(2), 20–25. Retrieved from https://ijt.oauife.edu.ng/index.php/ijt/article/view/267

Section

III. Electrical and Computing Technologies