Author Identification : Performance Comparison using English and Under-Resourced Languages

Nursyahirah, Tarmizi and Suhaila, Saee and Dayang Hanani, Abang Ibrahim (2020) Author Identification : Performance Comparison using English and Under-Resourced Languages. Journal of Physics: Conference Series, 1529 (052057). pp. 1-12. ISSN 1742-6596

Official URL: https://iopscience.iop.org/article/10.1088/1742-65...

Abstract

This paper presents an Author Identification (AI) task across different languages: English and two Under-Resourced Languages (U-RL), KadazanDusun and Iban. The performance of the AI task is analysed on the English and U-RL datasets in terms of accuracy. Different stylometric features and emerging machine learning algorithms (i.e. SVM and Random Forest) are examined to obtain optimal results. The approach is based on supervised machine learning, and cross-validation is used to evaluate performance. The findings include a performance comparison of the different stylometric features and classifiers across the three datasets, based on their accuracy values. The combination of word n-grams with character 3-grams achieved the highest accuracy, almost 75%, on the English dataset. Among the classifiers, SVM achieved better results than Random Forest on all three datasets.
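
As an illustration of the feature combination named in the abstract, the following is a minimal sketch of how word n-gram and character 3-gram counts could be extracted and merged for one text. It is not the authors' implementation: the function names, the whitespace tokenisation, the lowercasing, the key prefixes, and the default word n-gram size of 1 are all assumptions, since the abstract does not specify them. The resulting counts would then be passed to a supervised classifier such as an SVM or Random Forest.

```python
from collections import Counter


def word_ngrams(text, n=1):
    """Count word n-grams. n=1 is an assumed default; the abstract does not state n."""
    tokens = text.lower().split()  # assumed: simple whitespace tokenisation
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (n=3 mirrors the character 3-grams)."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def combined_features(text, word_n=1, char_n=3):
    """Merge both feature sets into one dict, prefixing keys to avoid collisions."""
    features = {}
    for gram, count in word_ngrams(text, word_n).items():
        features["w:" + gram] = count
    for gram, count in char_ngrams(text, char_n).items():
        features["c:" + gram] = count
    return features


# Hypothetical sample text, not taken from any of the paper's datasets.
sample = "the cat sat on the mat"
features = combined_features(sample)
```

Prefixing the keys keeps a word such as "the" distinct from the character 3-gram "the", so both counts survive in the combined feature set.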

Item Type: Article
Uncontrolled Keywords: Author Identification (AI), Under-Resourced Languages (U-RL), machine learning, English.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Faculties, Institutes, Centres > Faculty of Computer Science and Information Technology
Depositing User: Gani
Date Deposited: 12 Mar 2025 02:00
Last Modified: 12 Mar 2025 02:00
URI: http://ir.unimas.my/id/eprint/47756
