Malay Language Stemmer

Khan Ullah, Rehman and Fitri Suraya, Mohamad and Muh Inam, Ulhaq and Shahren Ahmad, Zaidi Adruce and Philip Nuli, Anding and Sajjad, Nawaz Khan and Abdulrazak Yahya, Saleh Al-Hababi (2017) Malay Language Stemmer. International Journal for Research in Emerging Science and Technology, 4 (12). pp. 1-9. ISSN 2349-7610

Official URL: https://ijrest.net/vol-4-issue-12.html

Abstract

A stemmer is a language processing tool, widely used in artificial intelligence applications, that removes affixes (prefixes, infixes, and suffixes) from a word to produce its root word. This study designs an algorithm for, and develops, a Malay language stemmer. Most existing Malay language stemmers have stemming problems because they depend on online dictionaries, which return false results during stemming. Moreover, affixation in Malay is more complex than in English. Therefore, this study introduces an offline dictionary of 9,512 words to handle ambiguity when stemming Malay words. At each step, the algorithm first checks whether the word is a root word in the local dictionary; if it is not, the word is processed further. The five steps are stem-extra-suffix, stem-plural, stem-infix, stem-prefix, and stem-suffix. The affix rules are extracted from Kamus Tatabahasa, and Kamus Dewan (4th Ed.) is used to confirm the accuracy of the stemmed words. The results show that the proposed stemmer can stem prefixes, suffixes, and infixes with high accuracy, and the study concludes that it can handle the complexity of Malay words. The stemmer could be further enhanced with a look-up table or dictionary of overlapping words to address its current limitation with overlapping prefixes and suffixes.
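The abstract describes the pipeline only at a high level: a dictionary lookup before each of five stripping steps, applied in the order stem-extra-suffix, stem-plural, stem-infix, stem-prefix, stem-suffix. The Python sketch below illustrates that control flow under stated assumptions. The twelve-word ROOT_WORDS set is a hypothetical stand-in for the 9,512-word offline dictionary. The affix lists, the MIN_STEM threshold, the NASAL_RESTORE table, and the choice to keep several candidate stems per step are illustrative choices. They are not rules taken from Kamus Tatabahasa or from the paper's implementation.

```python
"""Minimal sketch of a dictionary-checked, five-step Malay stemmer.

ASSUMPTIONS (not taken from the paper): the toy ROOT_WORDS set stands in for the
9,512-word offline dictionary, and the affix lists below are illustrative only.
"""

# Hypothetical stand-in for the paper's offline root-word dictionary.
ROOT_WORDS = {"makan", "buku", "ajar", "main", "tulis", "pukul", "sapu",
              "kira", "jalan", "tunjuk", "gigi", "guruh"}

MIN_STEM = 2  # hypothetical: never strip an affix that leaves fewer letters

EXTRA_SUFFIXES = ("lah", "kah", "pun", "nya", "ku", "mu")  # particles, possessives
INFIXES = ("el", "em", "er")  # inserted after the first consonant: t-el-unjuk
PREFIXES = ("memper", "meng", "meny", "peng", "peny", "mem", "men", "pem", "pen",
            "ber", "ter", "per", "me", "pe", "di", "ke", "se")
SUFFIXES = ("kan", "an", "i")
# meN-/peN- replace the first consonant of some roots (tulis -> menulis),
# so the dropped consonant is tried again after stripping the prefix.
NASAL_RESTORE = {"meng": "k", "meny": "s", "mem": "p", "men": "t",
                 "peng": "k", "peny": "s", "pem": "p", "pen": "t"}


def stem_extra_suffix(word):
    return [word[:-len(s)] for s in EXTRA_SUFFIXES
            if word.endswith(s) and len(word) - len(s) >= MIN_STEM]


def stem_plural(word):
    # Full reduplication marks plurality: buku-buku -> buku.
    left, sep, right = word.partition("-")
    return [left] if sep and left == right else []


def stem_infix(word):
    if len(word) - 2 >= MIN_STEM and word[1:3] in INFIXES:
        return [word[0] + word[3:]]
    return []


def stem_prefix(word):
    out = []
    for p in PREFIXES:
        rest = word[len(p):]
        if word.startswith(p) and len(rest) >= MIN_STEM:
            out.append(rest)
            if p in NASAL_RESTORE:
                out.append(NASAL_RESTORE[p] + rest)
    return out


def stem_suffix(word):
    return [word[:-len(s)] for s in SUFFIXES
            if word.endswith(s) and len(word) - len(s) >= MIN_STEM]


STEPS = (stem_extra_suffix, stem_plural, stem_infix, stem_prefix, stem_suffix)


def stem(word, roots=ROOT_WORDS):
    candidates = [word.lower()]
    for step in STEPS:
        # Before every step, accept any candidate already in the dictionary.
        for cand in candidates:
            if cand in roots:
                return cand
        # Keep each unchanged form as well, so a step that misfires cannot
        # discard the path that leads to the real root.
        expanded = []
        for cand in candidates:
            for new in [cand] + step(cand):
                if new not in expanded:
                    expanded.append(new)
        candidates = expanded
    # Final dictionary check; fall back to the (lowercased) input word.
    return next((c for c in candidates if c in roots), word.lower())
```

With this toy dictionary, `stem("menulis")` yields `tulis` and `stem("buku-bukunya")` yields `buku`. A word whose candidates never reach the dictionary is returned unchanged (lowercased). That fallback is also an assumption of this sketch. Keeping all candidates per step trades some extra dictionary lookups for robustness against the prefix and suffix overlaps that the abstract names as a limitation.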

Item Type: Article
Uncontrolled Keywords: Stemming, Stemmer, Natural language processing, Algorithm and Morphology, unimas, university, universiti, Borneo, Malaysia, Sarawak, Kuching, Samarahan, ipta, education, research, Universiti Malaysia Sarawak.
Subjects: P Language and Literature > P Philology. Linguistics
Divisions: Academic Faculties, Institutes and Centres > Faculty of Cognitive Sciences and Human Development
Depositing User: Mohamad Hapni Joblie
Date Deposited: 11 Sep 2020 08:02
Last Modified: 11 Sep 2020 08:02
URI: http://ir.unimas.my/id/eprint/31722
