Leveraging Ensemble Strategies in Identity Verification and Feature Optimisation for Phishing Website Detection

Tan, Colin Choon Lin (2021) Leveraging Ensemble Strategies in Identity Verification and Feature Optimisation for Phishing Website Detection. PhD thesis, Universiti Malaysia Sarawak (UNIMAS).

PDF (Colin.pdf, 18MB), restricted to registered users only. The password can be requested by email to repository@unimas.my or by phone (ext. 082-583914/3973/3933).

Abstract

The aim of this thesis is to contribute to ongoing efforts to protect Internet users against phishing attacks. Mainstream phishing detection solutions and technical approaches suffer from inherent problems such as ineffectiveness against newly launched phishing webpages, misclassification of legitimate webpages, reliance on irrelevant features, and susceptibility to intentional manipulation by adversaries. In this study, we explore whether ensemble strategies can be leveraged in website identity verification and feature optimisation to address these limitations of existing techniques. This study also aims to provide a deeper understanding of the evolving state of phishing and to identify the directions on which phishing detection efforts should concentrate. Through an improved website logo extraction technique, we showed that combining visual and textual identities in an ensemble led to a promising detection accuracy of 98.6%. The misclassification rate of legitimate webpages also improved by 3.4%, which is consistent with our aim of achieving robustness across legitimate webpages with the varying properties that users routinely encounter. To facilitate the identification of essential features for phishing detection, we propose a novel ensemble feature selection framework, which achieved a competitive detection accuracy of 94.6% using only 20.8% of the original number of features. Based on the experimental results, we also challenged the use of certain conventional features that are often highly rated and wrongly assumed to be effective. Lastly, we showed that underlying phishing patterns at the level of webpage interconnections can be exploited using ensemble strategies in a graph-theoretic approach, achieving up to 97.8% accuracy while demonstrating robustness and immutability against current and emerging phishing schemes.
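The abstract does not describe the internals of the proposed framework. As a rough, hedged illustration of what ensemble feature selection generally involves, the Python sketch below uses a generic rank-aggregation approach (not the thesis's own framework): it scores every feature with several filter measures, averages the per-measure ranks, and keeps only the best-ranked fraction of features. The three scorers (absolute Pearson correlation, Fisher score, mutual information), all function names, and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical scorer type: maps (feature column, labels) to a relevance score.
Scorer = Callable[[Sequence[float], Sequence[int]], float]


def abs_pearson(x: Sequence[float], y: Sequence[int]) -> float:
    """Absolute Pearson correlation between a feature and the class label."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant feature or constant label: no linear signal
    return abs(cov) / math.sqrt(vx * vy)


def fisher_score(x: Sequence[float], y: Sequence[int]) -> float:
    """Between-class scatter divided by within-class scatter."""
    mu = sum(x) / len(x)
    num = den = 0.0
    for c in set(y):
        group = [a for a, b in zip(x, y) if b == c]
        mc = sum(group) / len(group)
        num += len(group) * (mc - mu) ** 2
        den += sum((a - mc) ** 2 for a in group)
    return num / (den + 1e-12)  # small epsilon avoids division by zero


def mutual_information(x: Sequence[float], y: Sequence[int]) -> float:
    """Mutual information (in nats) between a discrete feature and the label."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def ranks(scores: Sequence[float]) -> List[int]:
    """Rank 1 = highest score; ties are broken by feature index."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    r = [0] * len(scores)
    for pos, j in enumerate(order):
        r[j] = pos + 1
    return r


def ensemble_select(X: List[List[float]], y: Sequence[int],
                    scorers: Sequence[Scorer], keep_fraction: float) -> List[int]:
    """Average the ranks from every scorer and keep the best-ranked fraction."""
    d = len(X[0])
    columns = [[row[j] for row in X] for j in range(d)]
    per_scorer = [ranks([s(col, y) for col in columns]) for s in scorers]
    mean_rank = [sum(r[j] for r in per_scorer) / len(per_scorer) for j in range(d)]
    k = max(1, round(keep_fraction * d))
    best = sorted(range(d), key=lambda j: (mean_rank[j], j))[:k]
    return sorted(best)


if __name__ == "__main__":
    # Invented toy data: feature 0 copies the label, feature 1 is a noisy copy,
    # features 2-4 carry no class information.
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    cols = [
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 1, 1, 1, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 1, 0, 1, 0, 1, 0, 1],
        [1, 0, 1, 0, 1, 0, 1, 0],
    ]
    X = [list(row) for row in zip(*cols)]
    scorers = [abs_pearson, fisher_score, mutual_information]
    print(ensemble_select(X, y, scorers, keep_fraction=0.4))  # [0, 1]
```

Averaging ranks rather than raw scores is one common way to combine measures that live on different scales; other aggregation rules (such as majority voting or intersecting top-k sets) are equally plausible, and the abstract does not state which rule the thesis uses.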

Item Type: Thesis (PhD)
Additional Information: Thesis (PhD) - Universiti Malaysia Sarawak, 2021.
Uncontrolled Keywords: Phishing, web security, ensemble identities, feature selection, graph features
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology
Depositing User: COLIN TAN CHOON LIN
Date Deposited: 15 Dec 2021 00:58
Last Modified: 17 Aug 2023 07:44
URI: http://ir.unimas.my/id/eprint/37167
