Language Modelling for a Low-Resource Language in Sarawak, Malaysia

Sarah Flora, Samson Juan and Muhamad Fikr, Bin Che Ismail and Hamimah, Binti Ujir and Irwandi Hipni, Bin Mohamad Hipiny (2019) Language Modelling for a Low-Resource Language in Sarawak, Malaysia. In: Advances in Electronics Engineering. Lecture Notes in Electrical Engineering book series (LNEE, volume 619). Springer, Singapore, pp. 147-158. ISBN 978-981-15-1288-9

Official URL: https://link.springer.com/chapter/10.1007/978-981-...

Abstract

This paper explores state-of-the-art techniques for creating language models in a low-resource setting. It is known that building a good statistical language model requires a large amount of data. Therefore, models that are trained on low-resource languages suffer from poor performance. We conducted a study on current language modelling techniques such as n-gram and recurrent neural network (RNN) models to observe their outcomes on data from a language in Sarawak, Malaysia. The target language is Iban, a widely spoken language in this region. We collected news data from an online source to build an Iban text corpus. After normalising the data, we trained trigram and RNN language models and tested them on automatic speech recognition data. Based on our results, we observed that the RNN language models did not significantly outperform the trigram language models. A slight improvement in the RNN model was seen after the size of the training data was increased. We also experimented with merging n-gram and RNN language models, and we obtained a 32.33% improvement after using a trigram-RNN language model.
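The abstract does not say how the trigram and RNN models were merged. One common way to combine two language models is linear interpolation of their per-word probabilities, and the sketch below illustrates that idea only as an assumption, not as the authors' method. The function names, the weight `lam`, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math


def interpolate(p_ngram, p_rnn, lam=0.5):
    """Linearly mix two per-word probabilities.

    lam is the weight on the n-gram model (an assumed, tunable value);
    the RNN model receives the remaining weight 1 - lam.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_ngram + (1.0 - lam) * p_rnn


def perplexity(word_probs):
    """Perplexity: exp of the average negative log-probability per word."""
    if not word_probs:
        raise ValueError("need at least one probability")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Toy per-word probabilities, invented for illustration (not from the paper).
trigram_probs = [0.10, 0.02, 0.30]
rnn_probs = [0.05, 0.08, 0.20]

# Mix the two models word by word with equal weight.
mixed_probs = [
    interpolate(a, b, lam=0.5) for a, b in zip(trigram_probs, rnn_probs)
]

print("trigram perplexity:", perplexity(trigram_probs))
print("RNN perplexity:    ", perplexity(rnn_probs))
print("mixed perplexity:  ", perplexity(mixed_probs))
```

In practice the weight would typically be tuned on held-out text, and lower perplexity on that text indicates a better fit; whether the chapter follows this procedure is not stated in the abstract.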

Item Type: Book Chapter
Uncontrolled Keywords: Low-resource language, n-gram language model, Recurrent neural network language model, unimas, university, universiti, Borneo, Malaysia, Sarawak, Kuching, Samarahan, ipta, education, research, Universiti Malaysia Sarawak.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Depositing User: Samson Juan
Date Deposited: 09 Jan 2020 08:30
Last Modified: 09 Jan 2020 08:30
URI: http://ir.unimas.my/id/eprint/28716
