Feature Selection Based on Semantics

Chua, Stephanie and Kulathuramaiyer, Narayanan (2008) Feature Selection Based on Semantics. Innovations and Advanced Techniques in Systems, Computing Sciences and Software Engineering. pp. 471-476.

Full text: PDF, feature_selection_based_on_semantics.pdf (375kB), restricted to registered users only.

Abstract

The need for automated text categorization systems is driven by the rapid growth in the number of digital documents. This paper examines feature selection, one of the main processes in text categorization. The proposed feature selection approach is semantics-based and employs WordNet [1]. It uses synonymous nouns and dominant word senses to select terms that reflect a category’s content. Experiments are carried out on the ten most populated categories of the Reuters-21578 dataset. The results show that two statistical feature selection methods, Chi-Square and Information Gain, perform better when combined with the WordNet-based approach. Combining the WordNet-based approach with statistical weighting also yields a set of terms that is more meaningful than the terms chosen by the statistical methods alone. In addition, the WordNet-based method achieves an effective reduction in the dimensionality of the feature space.
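The abstract names Chi-Square and Information Gain as the statistical feature selection methods. The following is a minimal sketch of how these two scores are commonly computed per term and per category from a 2x2 document contingency table, run on invented toy data. It is not the authors' implementation. The WordNet-based step (choosing synonymous nouns and dominant senses) is not reproduced. The optional `candidates` set is a hypothetical stand-in for the terms that step would keep, and the order of combining the two steps is an assumption. All function names are illustrative.

```python
import math
from typing import Callable, List, Optional, Sequence, Set, Tuple


def contingency(docs: Sequence[Set[str]], labels: Sequence[Set[str]],
                term: str, category: str) -> Tuple[int, int, int, int]:
    """Count documents: A = term & in category, B = term & not in category,
    C = no term & in category, D = no term & not in category."""
    a = b = c = d = 0
    for tokens, cats in zip(docs, labels):
        has_term, in_cat = term in tokens, category in cats
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of the 2x2 term/category table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def _entropy(counts: List[int]) -> float:
    """Shannon entropy (bits) of a count distribution; 0 for empty input."""
    total = sum(counts)
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def information_gain(a: int, b: int, c: int, d: int) -> float:
    """H(category) minus H(category | term present/absent), binary case."""
    n = a + b + c + d
    h_class = _entropy([a + c, b + d])
    h_cond = 0.0
    if a + b:
        h_cond += (a + b) / n * _entropy([a, b])
    if c + d:
        h_cond += (c + d) / n * _entropy([c, d])
    return h_class - h_cond


def rank_terms(docs: Sequence[Set[str]], labels: Sequence[Set[str]],
               category: str, score: Callable[[int, int, int, int], float],
               k: int, candidates: Optional[Set[str]] = None) -> List[str]:
    """Return the top-k terms for a category. `candidates` (hypothetical)
    restricts the vocabulary, e.g. to terms kept by a semantic filter."""
    vocab = set().union(*docs)
    if candidates is not None:
        vocab &= candidates
    scored = [(t, score(*contingency(docs, labels, t, category))) for t in vocab]
    scored.sort(key=lambda ts: (-ts[1], ts[0]))  # highest score first, ties by name
    return [t for t, _ in scored[:k]]


# Invented toy corpus: each document is a set of tokens with a set of labels.
docs = [{"oil", "price", "barrel"}, {"oil", "opec", "barrel"},
        {"wheat", "grain", "price"}, {"grain", "corn"}]
labels = [{"crude"}, {"crude"}, {"grain"}, {"grain"}]

if __name__ == "__main__":
    print(rank_terms(docs, labels, "crude", chi_square, k=3))
    print(rank_terms(docs, labels, "crude", chi_square, k=2,
                     candidates={"oil", "barrel", "price", "opec"}))
```

On this toy data, the unrestricted Chi-Square ranking for "crude" includes "grain". This happens because the statistic also rewards terms that are negatively associated with a category. Restricting the vocabulary through a candidate set removes such terms before ranking, which loosely mirrors the abstract's point that a semantic filter can make the selected terms more meaningful.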

Item Type: Article
Uncontrolled Keywords: feature selection, Universiti Malaysia Sarawak, UNIMAS, research, IPTA, education, kuching, samarahan, sarawak, malaysia, universiti, university
Subjects: L Education > LB Theory and practice of education
T Technology > T Technology (General)
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Depositing User: Karen Kornalius
Date Deposited: 13 Jan 2014 03:49
Last Modified: 23 Mar 2015 08:11
URI: http://ir.unimas.my/id/eprint/531
