Measuring the Feasibility of a Question and Answering System for the Sarawak Gazette Using Chatbot Technology

Yasir Lutfan, Yusuf and Suhaila, Saee (2025) Measuring the Feasibility of a Question and Answering System for the Sarawak Gazette Using Chatbot Technology. Acta Informatica Pragensia, 14 (3). pp. 1-28. ISSN 1805-4951

Official URL: https://aip.vse.cz/corproof.php?tartkey=aip-000000...

Abstract

Background: The Sarawak Gazette is a critical repository of information pertaining to Sarawak’s history. It has received much attention over the last two decades, with prior studies focusing on digitizing the gazette and extracting its ontologies to increase its accessibility. However, the creation of a question answering system for the Sarawak Gazette, another avenue that could improve accessibility, has been overlooked. Objective: This study created a new system that generates answers to user questions about the gazette using chatbot technology. Methods: The system sends user queries to a context retrieval system, then generates an answer from the retrieved contexts using a Large Language Model. A question answering dataset was also created using a Large Language Model to evaluate the system, with dataset quality assessed by 10 annotators. Results: The system achieved 55% higher precision and 42% higher recall than the previous state-of-the-art historical document question answering system, while sacrificing only 11% of cosine similarity. The annotators rated the dataset 2.9 out of 3 overall. Conclusion: The system could answer the general public’s questions about the Sarawak Gazette in a more direct and friendly manner than traditional information retrieval methods. The methods developed in this study are also applicable to other Malaysian historical texts written in English. All code used in this study has been released on GitHub.
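
The retrieve-then-generate flow described under Methods, and the precision and recall figures under Results, can be illustrated with a minimal sketch. This is not the authors' implementation: the passages are made-up placeholder text rather than gazette content, the bag-of-words retriever is a simple stand-in for the study's context retrieval system, `generate_answer` is a hypothetical stub where a Large Language Model call would go, and the token-overlap definition of precision and recall is an assumption, since the abstract does not state how the metrics were computed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query (stand-in retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine_similarity(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def generate_answer(query, contexts):
    """Hypothetical stub for the LLM step: returns the top context verbatim."""
    return contexts[0] if contexts else ""


def token_precision_recall(predicted, reference):
    """Token-overlap precision and recall (an assumed metric definition)."""
    pred = Counter(tokenize(predicted))
    ref = Counter(tokenize(reference))
    overlap = sum((pred & ref).values())
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


# Placeholder passages, invented for illustration only.
passages = [
    "The harbour report lists ships arriving in March.",
    "The weather notes describe heavy rain in December.",
    "The school notice announces a new term.",
]
query = "Which ships arrived in March?"

contexts = retrieve(query, passages, k=2)
answer = generate_answer(query, contexts)
precision, recall = token_precision_recall(answer, "ships arriving in March")
```

In a fuller pipeline the retriever would typically use dense embeddings and the stub would prompt a language model with the retrieved contexts; the sketch only shows how the two stages and the evaluation fit together.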

Item Type: Article
Uncontrolled Keywords: Historical documents; Old newspapers; Accessibility; Question answering; Artificial intelligence; Retrieval augmented generation; LangChain.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Faculties, Institutes, Centres > Faculty of Computer Science and Information Technology
Depositing User: Gani
Date Deposited: 12 Mar 2025 01:20
Last Modified: 12 Mar 2025 01:20
URI: http://ir.unimas.my/id/eprint/47755
