Ensemble Framework for Motif Discovery Based on Data Partitioning

Choong, Allen Chieng Hoon (2020) Ensemble Framework for Motif Discovery Based on Data Partitioning. PhD thesis, Universiti Malaysia Sarawak (UNIMAS).

Abstract

Computational DNA motif prediction is a challenging problem because motifs are short, degenerate, and associated with ill-defined features. With advances in genome-wide ChIP analysis technology, computational motif discovery tools are needed to search for motifs in large-scale datasets effectively. Ensembles of DNA motif discovery methods are among the most successful approaches to motif discovery. Nevertheless, most existing ensemble methods cannot search for motifs in ChIP datasets because the classical tools they employ accept only limited input sizes. An ensemble approach not only uses the results of the classical motif discovery tools but also combines them to produce better results. The merging algorithm contributes to the prediction accuracy of the discovered motifs.

The primary contribution of this thesis is an ensemble method called ENSPART, whose novelty lies in applying a data partitioning technique to ChIP datasets for DNA motif prediction. The idea is to reduce the search space by partitioning the input dataset into subsets, each of which is tackled separately by an ensemble of classical motif discovery tools. The candidate motifs are then merged, regardless of their differing lengths, using a proposed merging algorithm.

Three experiments were conducted. First, ChIP datasets were downloaded to evaluate the performance of ENSPART using Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) as performance metrics. ENSPART was compared with the genome-wide motif discovery tools MEME-ChIP, ChIPMunk, and RSAT peak-motifs using the partitioning technique. The results demonstrate that ENSPART performed significantly better than MEME-ChIP and RSAT peak-motifs on both metrics. Second, another set of datasets was gathered and sampled without partitioning. ENSPART was compared with the classifiers it employs (AMD, BioProspector, MDscan, MEME-ChIP, MotifSampler, and Weeder 2) and with MEME-ChIP, ChIPMunk, and RSAT peak-motifs without partitioning. The results show that ENSPART produces significantly better results than its individual classifiers and than MEME-ChIP, ChIPMunk, and RSAT peak-motifs. Finally, an experiment on simulated datasets was conducted, in which ENSPART was compared with GimmeMotifs and MotifVoter, both of which are also ensemble-based tools. The results show that ENSPART produces significantly higher precision and recall rates than GimmeMotifs and MotifVoter.

In conclusion, the ensemble technique is effective for DNA motif prediction, and ChIP datasets can be tackled effectively using data partitioning. The merging technique developed in ENSPART allows effective merging of the same motif found in different data partitions. Such methods are generally applicable to any ensemble technique that utilises classical motif discovery tools or, more recently, ChIP analysis tools.
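The partition-then-merge idea described in the abstract can be illustrated with a minimal Python sketch. This is not ENSPART's implementation: the abstract does not specify the partitioning scheme or the merging algorithm, so everything here is an assumption made for illustration. The function names (`partition`, `most_common_kmer`, `merge_candidates`) are hypothetical, the "tool" is a toy stand-in that returns the k-mer present in the most sequences, and the merge rule simply drops any candidate that is a substring of a longer one.

```python
from collections import Counter
from typing import List


def partition(sequences: List[str], size: int) -> List[List[str]]:
    """Split the input into consecutive subsets of at most `size` sequences."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [sequences[i:i + size] for i in range(0, len(sequences), size)]


def most_common_kmer(subset: List[str], k: int) -> str:
    """Toy stand-in for a classical motif discovery tool (assumption).

    Returns the k-mer found in the largest number of sequences in the subset.
    Assumes at least one sequence has length >= k.
    """
    counts: Counter = Counter()
    for seq in subset:
        # Count each k-mer once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    # Break ties alphabetically so the result is deterministic.
    return min(counts, key=lambda kmer: (-counts[kmer], kmer))


def merge_candidates(candidates: List[str]) -> List[str]:
    """Toy merge rule (assumption): drop candidates contained in a longer one."""
    merged: List[str] = []
    for motif in sorted(set(candidates), key=lambda m: (-len(m), m)):
        if not any(motif in kept for kept in merged):
            merged.append(motif)
    return merged


if __name__ == "__main__":
    seqs = ["AAACGTAA", "TTACGTCC", "GGACGTGG", "CCACGTTT"]
    per_partition = [most_common_kmer(p, 4) for p in partition(seqs, 2)]
    print(merge_candidates(per_partition))
```

Each partition is searched independently, which keeps the input to every tool small, and the merge step reconciles candidates of different lengths that describe the same site. A real merging algorithm would compare position weight matrices rather than exact strings.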

Item Type: Thesis (PhD)
Additional Information: Thesis (PhD) - Universiti Malaysia Sarawak, 2020.
Uncontrolled Keywords: DNA motif discovery, ensemble method, data partitioning, unimas, university, universiti, Borneo, Malaysia, Sarawak, Kuching, Samarahan, ipta, education, Postgraduate, research, Universiti Malaysia Sarawak.
Subjects: Q Science > Q Science (General)
Divisions: Academic Faculties, Institutes and Centres > Faculty of Cognitive Sciences and Human Development
Faculties, Institutes, Centres > Faculty of Cognitive Sciences and Human Development
Depositing User: ALLEN CHOONG CHIENG HOON
Date Deposited: 14 Sep 2020 00:06
Last Modified: 18 Apr 2023 02:30
URI: http://ir.unimas.my/id/eprint/31761
