Computational Morphological Resources Management System

Jovianna, Juk (2014) Computational Morphological Resources Management System. [Final Year Project Report] (Unpublished)

Abstract

In Natural Language Processing (NLP), a morphological analyser is one of the most basic processing tools, because it allows the structure of a word to be studied. To analyse word structure, the analyser needs morphological resources as input. Currently, these resources are acquired manually, which consumes a lot of time and effort. We therefore propose the Computational Morphological Resources Management System (CMRMS), a management system that eases the linguist's work during pre-processing and allows the linguist to induce morphological information from the resulting wordlist. CMRMS combines manual pre-processing with an automatic file management system to produce a wordlist and segmented data. It has three main modules: tokenization, conversion and segmentation tools. The tokenization module splits text file data obtained from hardcopy data, softcopy data or existing data into individual words. The conversion module converts two types of softcopy data: PDF files and HTML files. The segmentation tools module provides two segmentation tools, Linguistica and Morfessor, to analyse the tokenized data. To test the functionality of CMRMS, three types of testing were carried out: system, component and integration testing. In each case, the results showed that CMRMS produced the expected output. The system helps linguists manage their time more efficiently, since they no longer have to carry out pre-processing manually and can obtain the wordlist easily. In addition, the wordlist produced can be reused as input for other segmentation processes.
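
As an illustration of the tokenization and conversion steps described in the abstract, the following is a minimal Python sketch and not the CMRMS implementation, which the abstract does not detail. It assumes that HTML conversion means extracting visible text and that a wordlist is a table of lowercased word types with their counts. All function names (html_to_text, tokenize, build_wordlist) are hypothetical, and PDF conversion is omitted because it would need a third-party library.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes from HTML, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Hypothetical conversion step: reduce an HTML string to plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Hypothetical tokenization step: split text into lowercased words.

    A word is a run of letters, optionally joined by internal hyphens or
    apostrophes. Digits and punctuation are dropped (an assumption).
    """
    return re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text.lower())


def build_wordlist(text):
    """Count how often each word type occurs in the text."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    page = "<p>The readers re-read the books.</p><script>var x = 1;</script>"
    wordlist = build_wordlist(html_to_text(page))
    # Print one "count word" pair per line, sorted alphabetically.
    for word, count in sorted(wordlist.items()):
        print(count, word)
```

A wordlist built this way could then be saved to a file and passed to a segmentation tool; the input format each tool expects should be checked against that tool's own documentation.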

Item Type: Final Year Project Report
Additional Information: Project Report (B.Sc.) -- Universiti Malaysia Sarawak, 2014.
Uncontrolled Keywords: Computational Morphological Resources Management System (CMRMS), Natural Language Processing (NLP), morphological analyser, word structure
Subjects: P Language and Literature > P Philology. Linguistics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Depositing User: Unai
Date Deposited: 24 Aug 2022 08:46
Last Modified: 24 Aug 2022 08:46
URI: http://ir.unimas.my/id/eprint/39301
