Nursyahirah, Tarmizi and Suhaila, Saee and Dayang Hanani, Abang Ibrahim (2020) Author Identification : Performance Comparison using English and Under-Resourced Languages. Journal of Physics: Conference Series, 1529 (052057). pp. 1-12. ISSN 1742-6596
Abstract
This paper presents the Author Identification (AI) task across different languages: English and two Under-Resourced Languages (U-RL), KadazanDusun and Iban. The performance of the AI task is analysed on the English and U-RL datasets in terms of accuracy. Different stylometric features and emerging machine learning algorithms (i.e. SVM and Random Forest) are examined to obtain optimal results in the AI task. The approach is based on supervised machine learning, and cross-validation is used to evaluate performance. The findings include a performance comparison of the different stylometric features and classifiers across the three datasets, based on their accuracy values. The combination of word n-grams with character 3-grams achieved the highest accuracy, almost 75%, on the English dataset. Among the classifiers, SVM achieved better results than Random Forest on all three datasets.
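As an illustration of the feature combination named in the abstract, the following is a minimal sketch of extracting word n-grams and character 3-grams from a text and merging them into one feature set. It is not the authors' implementation: the function names, the whitespace tokenisation, the lowercasing, and the choice of word bigrams (`word_n=2`) are all assumptions, since the abstract does not state the word n-gram order or any preprocessing.

```python
from collections import Counter


def word_ngrams(text, n=2):
    """Count word n-grams over lowercased, whitespace-split tokens (assumed tokenisation)."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text, n=3):
    """Count character n-grams over the lowercased text, spaces included."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def combined_features(text, word_n=2, char_n=3):
    """Merge both feature families into one dict; prefixes keep the two key spaces apart."""
    features = {}
    for gram, count in word_ngrams(text, word_n).items():
        features["w:" + gram] = count
    for gram, count in char_ngrams(text, char_n).items():
        features["c:" + gram] = count
    return features


# Hypothetical sample text, used only to show the shape of the output.
sample = "the cat sat on the mat"
features = combined_features(sample)
```

Feature dictionaries of this kind would then be vectorised and passed to a supervised classifier such as an SVM or a Random Forest, with accuracy estimated by cross-validation, as the abstract describes.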
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Author Identification (AI), Under-Resourced Languages (U-RL), machine learning, English. |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology; Faculties, Institutes, Centres > Faculty of Computer Science and Information Technology |
Depositing User: | Gani |
Date Deposited: | 12 Mar 2025 02:00 |
Last Modified: | 12 Mar 2025 02:00 |
URI: | http://ir.unimas.my/id/eprint/47756 |