Semantic event extraction in unstructured text based on prominence and discourse-level dependencies

Siaw, Nyuk Hiong (2015) Semantic event extraction in unstructured text based on prominence and discourse-level dependencies. PhD thesis, Universiti Malaysia Sarawak (UNIMAS).

Abstract

Semantic event extraction has been applied in many natural language processing (NLP) tasks such as summarization and text mining. However, little research has been carried out to automate the extraction and representation of multiple events. As a result, semantically annotated corpora for event extraction are limited to PropBank, FrameNet and VerbNet. These collections can be expanded by adding other semantically annotated event corpora. Many event extraction models such as EVENT, SEM and LODE have been proposed, but this research stopped at the collection of events. Extending research beyond this collection to investigate the interpretation and abstraction of event-based knowledge has not been explored much. Furthermore, there is a lack of research on key event indexing to identify the relative importance of multiple events in a complex sentence. Such indexing can augment successfully extracted event-based knowledge with weights.

The main objective of this research is to propose a framework that can automate the extraction of semantically relevant key events based on thematic hierarchy and discourse-level dependencies to determine their relationships and relative importance. This has led to the exploration and formulation of designs to: i) capture and annotate multiple semantic events in a semantic representation format; ii) define a linguistically injected model (Linguistic Window Model) to interpret multiple events in a complex sentence; iii) define new weights for graph-based text (based on the Linguistic Window Model) for key event indexing.

This research has proposed a new method, EveSem, an NLP tools pipeline to automate the extraction and annotation of semantic events. This tool performed marginally better than TIPSemB-1.0. EveSem is then extended into a Linguistic Window Model, whose linguistic structure is found to enhance the F1-score for event extraction when compared to ACE data. The thematic hierarchy and discourse-level dependency properties of the linguistic structure have also been found to greatly improve the recall over ACE data for "trigger" identification. Based on the thematic hierarchy, new weights are defined to construct weighted graph-based text, which has been shown to improve the indexing of the relative importance of key events in complex sentences.

The results showed that the NLP tools pipeline successfully extracted and represented multiple events in XML tags. The small XML-annotated corpus for semantic events can be added to the collection of event lexical databases. Furthermore, this approach is domain-generic and can be ported to other languages, provided the necessary NLP tools are available for the language. The Linguistic Window Model is able to extract events with an improved F1-score over the ACE task. This model has an advantage over the bag-of-words (BOW) model for key event indexing, since it takes into consideration the context of word co-occurrence and the semantic association between words based on the linguistic structure of the model. In conclusion, the objectives of this research have been successfully achieved. The research has addressed the gaps identified in this thesis by: (a) automatically generating a collection of multiple semantic events using a generic approach through NLP tools as a pipeline; (b) identifying the relative importance of key semantic events based on linguistic properties of the sentence.
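To make the idea of key event indexing over weighted graph-based text more concrete, the following is a minimal sketch only, not the method defined in the thesis. It assumes each event has already been extracted as a trigger word with arguments labelled by thematic role. The role weights, the scoring rule and the name `rank_events` are all hypothetical choices made for illustration: each event is linked to its arguments by an edge weighted by thematic role, and an edge counts for more when its argument is shared with other events.

```python
from collections import Counter

# Hypothetical thematic-role weights (higher = more prominent role).
# Illustrative assumptions only; not the weights defined in the thesis.
ROLE_WEIGHT = {"agent": 3.0, "patient": 2.0, "instrument": 1.0}
DEFAULT_WEIGHT = 1.0  # assumed weight for any role not listed above


def rank_events(events):
    """Rank events by the weighted edges of an event-argument graph.

    `events` maps a trigger word to a dict of {thematic role: argument}.
    Returns (trigger, score) pairs sorted from highest to lowest score.
    """
    # Count how many event-argument edges touch each argument.
    arg_degree = Counter(
        arg for roles in events.values() for arg in roles.values()
    )

    scores = {}
    for trigger, roles in events.items():
        # Each edge contributes its role weight, scaled by how widely
        # the argument is shared across events in the sentence.
        scores[trigger] = sum(
            ROLE_WEIGHT.get(role, DEFAULT_WEIGHT) * arg_degree[arg]
            for role, arg in roles.items()
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Two events from a made-up complex sentence:
# "The company acquired the startup after the startup launched its product."
events = {
    "acquired": {"agent": "company", "patient": "startup"},
    "launched": {"agent": "startup", "patient": "product"},
}
ranked = rank_events(events)
print(ranked)
```

Unlike a bag-of-words count, the score here depends on which role an argument fills and on which events it connects, which is the kind of structural information the abstract attributes to the Linguistic Window Model.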

Item Type: Thesis (PhD)
Additional Information: Thesis (Ph.D.) -- Universiti Malaysia Sarawak, 2015.
Uncontrolled Keywords: Semantic computing, unimas, university, universiti, Borneo, Malaysia, Sarawak, Kuching, Samarahan, ipta, education, Postgraduate, research, Universiti Malaysia Sarawak
Subjects: T Technology > T Technology (General)
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Depositing User: Karen Kornalius
Date Deposited: 03 Mar 2016 04:18
Last Modified: 01 Aug 2023 06:40
URI: http://ir.unimas.my/id/eprint/10767
