Information Theoretic-based Feature Selection for Machine Learning

Muhammad Aliyu, Sulaiman (2018) Information Theoretic-based Feature Selection for Machine Learning. PhD thesis, Universiti Malaysia Sarawak (UNIMAS).

Abstract

Three major factors determine the performance of a machine learning model: the choice of a representative set of features, the choice of a suitable machine learning algorithm, and the right selection of training parameters for that algorithm. This thesis tackles the problem of feature selection for supervised machine learning prediction tasks through dependency information. The feature evaluation strategy is formulated on mutual information (MI) so that it handles both classification and regression tasks, and the search strategy is a modified greedy forward strategy designed to manage redundancy between features and to avoid features that are irrelevant to the predicted output. The problem with many existing feature selection methods that evaluate features based on mutual information is that they are designed to handle classification tasks only, and the few that can work for regression tasks were recently found to underestimate the mutual information between two strongly dependent variables. In addition, the search strategy used with many existing feature selection methods, usually a heuristic greedy method, lacks a scientifically sound stopping criterion, and the forward greedy procedure, despite its advantages over the backward procedure, is found to yield suboptimal results. This thesis therefore developed and evaluated a filter-based Information Theoretic-based Feature Selection (IFS) method for machine learning. Various experiments were carried out to assess and test the components of the IFS algorithm. The first test evaluated the formulated IFS Selection Criterion Strategy (the MI estimator) by comparing it with six benchmark MI estimators. The second test evaluated IFS in a controlled study using simulated datasets.
The third test used ten natural domain datasets obtained from the UCI Repository in about fifteen different experiments, with three to four different machine learning algorithms for performance evaluation. Additional experiments on natural domain datasets compared the relative performance of IFS with five related feature selection algorithms. This thesis also developed a hybrid filter method to enhance the performance of IFS, in which IFS serves as the filter and an Ant Colony Optimization System (ACO) serves as the metaheuristic. In this extended IFS method, feature selection was defined and presented as a 0-1 Knapsack Problem (MKP). Specifically, this thesis developed and evaluated the IFS_BACS (Binary Ant Colony System) hybrid method. Further experiments were carried out using the natural domain datasets, and comparisons were made between the IFS and hybrid IFS_BACS methods. In most cases, IFS and its extended IFS_BACS hybrid method significantly reduced the number of features and produced competitive accuracy compared with the full feature set. Comparing the two, the extended version (IFS_BACS) seems more promising for selecting an optimal feature subset from large datasets.
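
The general idea of an MI-driven greedy forward search that penalises redundancy can be illustrated with a minimal sketch. This is not the IFS algorithm from the thesis: it assumes discrete features only, uses a simple plug-in (count-based) MI estimate rather than the thesis's estimator, scores candidates by relevance minus mean redundancy, and stops at a fixed subset size k. The function names and the scoring rule are illustrative assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written in terms of raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def greedy_forward_select(features, target, k):
    """Pick up to k feature names from `features` (dict: name -> list of values).

    Assumed scoring rule (not taken from the thesis):
    score = MI(feature, target) - mean MI(feature, already selected features).
    """
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        # sorted() makes tie-breaking deterministic (alphabetical)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the target is determined by "a" and "c"; "a_dup" duplicates "a".
toy_features = {
    "a":     [0, 0, 1, 1, 0, 0, 1, 1],
    "a_dup": [0, 0, 1, 1, 0, 0, 1, 1],
    "c":     [0, 1, 0, 1, 0, 1, 0, 1],
}
toy_target = [0, 1, 2, 3, 0, 1, 2, 3]
print(greedy_forward_select(toy_features, toy_target, 2))
```

On this toy data all three features are equally relevant to the target, but after "a" is chosen the duplicate is penalised for redundancy, so "c" is selected second. A fixed k is a simplification; the abstract notes that a sound stopping criterion is one of the open problems the thesis addresses.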

Item Type: Thesis (PhD)
Additional Information: Thesis (Ph.D) -- Universiti Malaysia Sarawak, 2018.
Uncontrolled Keywords: Feature Selection, Information Theory, Mutual Information, Entropy, Density Estimation, Optimization, Machine Learning Algorithms, Supervised Learning, Modeling, Ant Colony System.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Depositing User: Gani
Date Deposited: 27 Aug 2019 08:43
Last Modified: 27 Aug 2019 08:50
URI: http://ir.unimas.my/id/eprint/26595
