Author identification for under-resourced language Kadazandusun

Nursyahirah, Tarmizi and Suhaila, Saee and Dayang Hanani, Abang Ibrahim (2020) Author identification for under-resourced language Kadazandusun. Indonesian Journal of Electrical Engineering and Computer Science, 17 (1). pp. 248-255. ISSN 2502-4760

Official URL: https://ijeecs.iaescore.com/index.php/IJEECS/artic...

Abstract

This paper presents Author Identification for the KadazanDusun language, which is considered one of the under-resourced languages in Malaysia, using tweets as the data source. The aim is to demonstrate Author Identification of short text in KadazanDusun. The paper also examines the performance of two machine learning algorithms on the KadazanDusun data set by analyzing stylometric features. Stylometric features, which include character n-grams and word n-grams, are used to quantify the writing styles of the authors. The Author Identification workflow applies a machine learning approach to solve the single-labelled multi-class problem of predicting the author of a given message in KadazanDusun. Two classifiers, Naïve Bayes and Support Vector Machine (SVM), are compared on accuracy. The results show that the combination of word-level unigrams and {1-5}-grams with character 3-grams gives the most relevant stylometric features for identifying the author of a KadazanDusun message, with an accuracy of 80.17%. The results also show that the SVM classifier outperformed Naïve Bayes in this task, with an accuracy of 80.17%.
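The feature pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It uses only the Python standard library, combines word unigrams with character 3-grams, and trains a small multinomial Naïve Bayes classifier; the SVM from the paper is not shown. The function and class names (`char_ngrams`, `word_ngrams`, `extract_features`, `NaiveBayes`) and the toy English messages are hypothetical placeholders and do not come from the paper's KadazanDusun data set.

```python
from collections import Counter, defaultdict
import math


def char_ngrams(text, n):
    """Return overlapping character n-grams of a message."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return overlapping word n-grams, using whitespace tokenization."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Combine word unigrams and character 3-grams into one bag of features."""
    text = text.lower()
    feats = Counter("w:" + g for g in word_ngrams(text, 1))
    feats.update("c:" + g for g in char_ngrams(text, 3))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, authors):
        self.counts = defaultdict(Counter)  # per-author feature counts
        self.docs = Counter(authors)        # per-author message counts
        self.vocab = set()
        for text, author in zip(texts, authors):
            feats = extract_features(text)
            self.counts[author].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = extract_features(text)
        total_docs = sum(self.docs.values())
        best_author, best_score = None, -math.inf
        for author, n_docs in self.docs.items():
            total = sum(self.counts[author].values())
            score = math.log(n_docs / total_docs)  # log prior
            for feat, freq in feats.items():
                p = (self.counts[author][feat] + 1) / (total + len(self.vocab))
                score += freq * math.log(p)        # log likelihood
            if score > best_score:
                best_author, best_score = author, score
        return best_author


# Toy, made-up messages (assumption: placeholders, not real tweets).
train_texts = [
    "lol see you soon lol",
    "lol that was fun lol",
    "Regards, meeting at noon.",
    "Regards, report attached.",
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

model = NaiveBayes().fit(train_texts, train_authors)
print(model.predict("lol soon"))
print(model.predict("Regards, noon."))
```

Each message is mapped to a single label out of several candidate authors, which matches the single-labelled multi-class setting the abstract describes. Prefixing features with `w:` and `c:` keeps word-level and character-level n-grams from colliding in the shared feature bag.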

Item Type: Article
Uncontrolled Keywords: Author identification, Kadazan dusun, Machine learning, Stylometry, Under-resourced language.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Faculties, Institutes, Centres > Faculty of Computer Science and Information Technology
Depositing User: Gani
Date Deposited: 12 Mar 2025 02:29
Last Modified: 12 Mar 2025 02:29
URI: http://ir.unimas.my/id/eprint/47757
