Phoneme-Based English-Yorùbá Machine Transliteration

Franklin Asahiah; Victor Isebaga Akioyamen

Authors

Franklin Asahiah, Department of Computer Science and Engineering, Obafemi Awolowo University, Ile-Ife, Nigeria
Victor Isebaga Akioyamen, Obafemi Awolowo University, Ile-Ife

Keywords:

source language, target language, orthography, pronouncing, phone.

Abstract

One of the challenges in translating English into Yorùbá in Nigeria is handling the foreign names and technical terms found in news articles and scientific documents. Many of these names and terms contain letters that are not used in Yorùbá orthography, such as c, q, v, x and z. We present a rule-based model for transliterating English nouns into Yorùbá such that the output respects the morphology and phonology of the target language. The model is phoneme-based and relies on the CMU Pronouncing Dictionary to obtain the phoneme sequence of each word. When implemented and tested on a standardized set of 55 words, the model achieved an accuracy of 72.7%, a recall of 0.98, a precision of 0.965 and an F-score of 0.972. A second test against non-standardized references gave an accuracy of 40.7%, a recall of 0.91, a precision of 0.925 and an F-score of 0.912. Among the challenges identified is the model's inability to render some vowels correctly, as required by the phonology of the target language.
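The abstract gives the pipeline only in outline: obtain each English word's phonemes from the CMU Pronouncing Dictionary, then rewrite them as Yorùbá letters by rule. The Python sketch below illustrates that idea under assumptions that do not come from the paper. A three-word inline lexicon stands in for CMUdict, using its ARPAbet symbols and stress digits. The tables `CONSONANTS` and `VOWELS` are a hypothetical phoneme-to-letter mapping, and the epenthesis rule (insert u after a labial consonant and i after any other consonant that is not followed by a vowel) is only an illustrative way of keeping Yorùbá's open syllables. Tone marks and syllabic nasals are not modelled.

```python
"""Hypothetical sketch of phoneme-based English-to-Yoruba transliteration.

Assumptions (not from the paper): LEXICON stands in for CMUdict, and the
mapping tables and epenthesis rule are illustrative. Tones are not generated.
"""

# Letters of Yoruba orthography (gb is spelled with g + b); no c, q, v, x, z.
YORUBA_LETTERS = set("abdeẹfghijklmnoọprsṣtuwy")

# Stand-in for CMUdict lookups: ARPAbet phones, digits mark vowel stress.
LEXICON = {
    "bread": ["B", "R", "EH1", "D"],
    "steve": ["S", "T", "IY1", "V"],
    "victor": ["V", "IH1", "K", "T", "ER0"],
}

# Hypothetical phoneme-to-letter rules. Sounds Yoruba lacks (v, z, th, ...)
# are replaced by a nearby native consonant.
CONSONANTS = {
    "B": "b", "CH": "ṣ", "D": "d", "DH": "d", "F": "f", "G": "g",
    "HH": "h", "JH": "j", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "n", "P": "p", "R": "r", "S": "s", "SH": "ṣ", "T": "t",
    "TH": "t", "V": "f", "W": "w", "Y": "y", "Z": "s", "ZH": "ṣ",
}
VOWELS = {
    "AA": "a", "AE": "a", "AH": "a", "AO": "ọ", "AW": "ao", "AY": "ai",
    "EH": "ẹ", "ER": "a", "EY": "e", "IH": "i", "IY": "i", "OW": "o",
    "OY": "ọi", "UH": "u", "UW": "u",
}
LABIALS = {"b", "f", "m", "p", "w"}


def foreign_letters(word: str) -> set[str]:
    """Letters of an English spelling that Yoruba orthography does not use."""
    return {ch for ch in word.lower() if ch.isalpha() and ch not in YORUBA_LETTERS}


def transliterate(word: str, lexicon: dict[str, list[str]] = LEXICON) -> str:
    """Map a word's phonemes to Yoruba letters, breaking consonant clusters."""
    phones = [p.rstrip("012") for p in lexicon[word.lower()]]  # drop stress digits
    out = []
    for i, phone in enumerate(phones):
        if phone in VOWELS:
            out.append(VOWELS[phone])
            continue
        letter = CONSONANTS[phone]
        out.append(letter)
        next_phone = phones[i + 1] if i + 1 < len(phones) else None
        # Yoruba syllables are open, so a consonant not followed by a vowel
        # gets an epenthetic vowel: u after labials, i elsewhere (assumed rule).
        if next_phone not in VOWELS:
            out.append("u" if letter in LABIALS else "i")
    return "".join(out)


if __name__ == "__main__":
    for w in LEXICON:
        print(w, transliterate(w), sorted(foreign_letters(w)))
```

With these rules, `transliterate("bread")` yields `burẹdi`, the segmental form of the established loanword búrẹ́dì, and `foreign_letters("Victor")` flags c and v. A single fixed vowel per ARPAbet symbol is the crudest part of such a table, which is also where the abstract locates the model's difficulties.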
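The abstract reports word accuracy together with precision, recall and F-score but does not define the latter three. Their much higher values are consistent with character-level scoring, and a widely used convention for that is the NEWS shared-task metric (Li et al., 2009): for each word, precision and recall are the length of the longest common subsequence (LCS) between the output and a reference, divided by the length of the output and of the reference respectively, and the per-word F-scores are then averaged. The Python sketch below implements that convention as an assumption; the paper may define its metrics differently, and choosing the reference with the highest F-score is a simplification of the NEWS rule of scoring against the closest reference. The sample words are made up for illustration.

```python
"""Sketch of NEWS-style transliteration metrics (assumed, not confirmed by the paper)."""
import unicodedata
from statistics import mean


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def word_scores(candidate: str, references: list[str]) -> tuple[float, float, float]:
    """Character-level (P, R, F) against the best-matching reference.

    P = LCS/|candidate|, R = LCS/|reference|, F = harmonic mean of P and R.
    Returns zeros if no reference shares a character with the candidate.
    """
    best = (0.0, 0.0, 0.0)
    for ref in references:
        lcs = lcs_length(candidate, ref)
        p = lcs / len(candidate) if candidate else 0.0
        r = lcs / len(ref) if ref else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        if f > best[2]:
            best = (p, r, f)
    return best


def evaluate(pairs: list[tuple[str, list[str]]]) -> tuple[float, float, float, float]:
    """Return (word accuracy, mean P, mean R, mean F) over (candidate, references) pairs."""
    acc, ps, rs, fs = [], [], [], []
    for candidate, refs in pairs:
        # NFC composes letters such as ẹ where possible; combining tone marks
        # with no precomposed form still count as separate characters.
        candidate = unicodedata.normalize("NFC", candidate)
        refs = [unicodedata.normalize("NFC", ref) for ref in refs]
        acc.append(1.0 if candidate in refs else 0.0)
        p, r, f = word_scores(candidate, refs)
        ps.append(p)
        rs.append(r)
        fs.append(f)
    return mean(acc), mean(ps), mean(rs), mean(fs)


if __name__ == "__main__":
    acc, p, r, f = evaluate([("sitifu", ["sitifu"]), ("kirisi", ["kiris"])])
    print(f"ACC={acc:.3f} P={p:.3f} R={r:.3f} F={f:.3f}")
```

Run as a script, the toy example prints `ACC=0.500 P=0.917 R=1.000 F=0.955`. Because F is computed per word before averaging, it need not equal the harmonic mean of the averaged precision and recall (about 0.957 here).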

References

Adedun, E., & Shodipe, M. (2011). Yoruba-English Bilingualism in Central Lagos–Nigeria. Journal of African Cultural Studies, 23(2), 121-132.
Adegbija, E. (1989). Lexico-semantic variation in Nigerian English. World Englishes, 8(2), 165-177.
Adegbite, W., Udofot, I., & Ayoola, K. A. (2014). A Dictionary of Nigerian English. Obafemi Awolowo University Press.
Ahmadi, S. (2019). A rule-based Kurdish text transliteration system. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(2), 1-8.
Ali, A. R., & Ijaz, M. (2009). English to Urdu transliteration system. Proceedings of Language and Technology, 15-23.
Balakrishna, S. V., & Venkatesan, S. M. (2013, May). On the utility of a syllable-like segmentation for learning a transliteration from English to an Indic language. In CS & IT Conference Proceedings (Vol. 3, No. 5).
Bamgboṣe, A. (1992). Standard Nigerian English: Issues of identification. In The other tongue: English across cultures (pp. 148-161).
Chen, N., Banchs, R. E., Zhang, M., Duan, X., & Li, H. (2018, July). Report of NEWS 2018 named entity transliteration shared task. In Proceedings of the Seventh Named Entities Workshop (pp. 55-73).
CMU Pronouncing Dictionary. (n.d.). (Version 0.7b). Carnegie Mellon University. Retrieved April 22, 2024, from CMUdict
Deep, K., & Goyal, V. (2011). Development of a Punjabi to English transliteration system. International Journal of Computer Science and Communication, 2(2), 521-526.
Finch, A., & Sumita, E. (2009, August). Transliteration by bidirectional statistical machine translation. In Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009) (pp. 52-56).
Grundkiewicz, R., & Heafield, K. (2018, July). Neural machine translation techniques for named entity transliteration. In Proceedings of the Seventh Named Entities Workshop (pp. 89-94).
Gut, U. (2004). Nigerian English: Phonology. A handbook of varieties of English, 1, 992-1002.
Harrington, J., & Cox, F. (2009). The syllable and phonotactic constraints. http://clas.mq.edu.au/speech/phonetics/phonology/syllable/syll_phonotactic.html. Accessed 7 Sep. 2017.
Hermjakob, U., Knight, K., & Daumé III, H. (2008, June). Name translation in statistical machine translation: Learning when to transliterate. In Proceedings of ACL-08: HLT (pp. 389-397).
Jiang, X., Sun, L., & Zhang, D. (2009, August). A syllable-based name transliteration system. In Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009) (pp. 96-99).
Jowitt, D. (2018). Nigerian English. De Gruyter Mouton.
Karimi, S., Scholer, F., & Turpin, A. (2011). Machine transliteration survey. ACM Computing Surveys (CSUR), 43(3), 1-46.
Kasparaitis, P. (2023). Automatic Transliteration of Polish and English Proper Nouns into Lithuanian. Information Technology and Control, 52(1), 128-139. https://doi.org/10.5755/j01.itc.52.1.32353
Kenstowicz, M. (2006). Tone loans: the adaptation of English loanwords into Yoruba. In Selected proceedings of the 35th annual conference on African Linguistics (Vol. 13). Somerville, MA: Cascadilla Proceedings Project.
Kirschenbaum, A., & Wintner, S. (2009, March). Lightly Supervised Transliteration for Machine Translation. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009) (pp. 433-441).
Knight, K. (2009, August). Automata for Transliteration and Machine Translation. In Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009).
Komolafe, O. E. (2014). Borrowing devices in Yoruba terminography. International Journal of Humanities and Social Sciences, 4(8), 48-55.
Kundu, S., Paul, S., & Pal, S. (2018, July). A deep learning-based approach to transliteration. In Proceedings of the seventh named entities workshop (pp. 79-83).
Le, N. T., & Sadat, F. (2018, July). Low-resource machine transliteration using recurrent neural networks of Asian languages. In Proceedings of the Seventh Named Entities Workshop (pp. 95-100).
Li, H., Kumaran, A., Pervouchine, V., & Zhang, M. (2009, August). Report of NEWS 2009 machine transliteration shared task. In Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009) (pp. 1-18).
Najafi, S., Hauer, B., Riyadh, R. R., Yu, L., & Kondrak, G. (2018, July). Comparison of assorted models for transliteration. In Proceedings of the Seventh Named Entities Workshop (pp. 84-88).
Noeman, S., & Madkour, A. (2010, July). Language independent transliteration mining system using finite state automata framework. In Proceedings of the 2010 Named Entities Workshop (pp. 57-61).
Oh, J. H., & Choi, K. S. (2002). An English-Korean transliteration model using pronunciation and contextual rules. In COLING 2002: The 19th International Conference on Computational Linguistics.
Oh, J. H., Choi, K. S., & Isahara, H. (2006). A machine transliteration model based on correspondence between graphemes and phonemes. ACM Transactions on Asian Language Information Processing (TALIP), 5(3), 185-208.
Oh, J. H., & Isahara, H. (2007). Machine transliteration using multiple transliteration engines and hypothesis re-ranking. In Proceedings of Machine Translation Summit XI: Papers.
Pingali, P., Ganesh, S., Yella, S., & Varma, V. (2008). Statistical transliteration for cross language information retrieval using HMM alignment model and CRF. In Proceedings of the 2nd Workshop on Cross Lingual Information Access (CLIA): Addressing the Information Need of Multilingual Societies.
Rama, T., & Gali, K. (2009, August). Modeling machine transliteration as a phrase based statistical machine translation problem. In Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009) (pp. 124-127).
Singhania, S., Nguyen, M., Ngo, G. H., & Chen, N. (2018, July). Statistical machine transliteration baselines for NEWS 2018. In Proceedings of the Seventh Named Entities Workshop (pp. 74-78).
Sunday, A. B., & Oyemade, O. O. (2021). Features of tone in Nigerian English stress pattern. Covenant Journal of Language Studies.
Tijani, M. (2015). A morphological analysis of loan words among Yoruba speakers of English language in Kaduna Metropolis (Doctoral dissertation, Department of English and Literary Studies, Faculty of Arts, Ahmadu Bello University, Zaria).
Ufomata, T. (1991). Englishization of Yoruba phonology. World Englishes, 10(1), 33-51.
Wei, P., & Bo, X. (2008, July). Chinese-English transliteration using weighted finite-state transducers. In 2008 International Conference on Audio, Language and Image Processing (pp. 1328-1333). IEEE.
Weide, R. L. (1993, September 16). The CMU Pronouncing Dictionary. http://www.speech.cs.cmu.edu/cgi-bin/cmudict
Wutiwiwatchai, C., & Thangthai, A. (2010, July). Syllable-based Thai-English machine transliteration. In Proceedings of the 2010 Named Entities Workshop (pp. 66-70).
Yadav, M., Kumar, I., & Kumar, A. (2023, March). Different Models of Transliteration-A Comprehensive Review. In 2023 International Conference on Innovative Data Communication Technologies and Application (ICIDCA) (pp. 356-363). IEEE.
Yang, D., Dixon, P., Pan, Y. C., Oonishi, T., Nakamura, M., & Furui, S. (2009, August). Combining a two-step conditional random field model and a joint source channel model for machine transliteration. In Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009) (pp. 72-75).
Zhang, C., Li, T., & Zhao, T. (2012, July). Syllable-based machine transliteration with extra phrase features. In Proceedings of the 4th Named Entity Workshop (NEWS) 2012 (pp. 52-56).
Zhang, M., Li, H., Kumaran, A., & Liu, M. (2011, November). Report of NEWS 2011 machine transliteration shared task. In Proceedings of the 3rd Named Entities Workshop (NEWS 2011) (pp. 1-13).
