Nurellezia, Suleiman (2023) Sentiment Analysis of Sexual Harassment in Malaysia on Twitter Using Machine Learning Algorithms. [Final Year Project Report] (Unpublished)
Full text: NURELLEZIA ft.pdf (PDF, 2MB), restricted to registered users only. The password can be requested by email to repository@unimas.my or by calling ext. 082-583914/3973/3933.
Abstract
Twitter is one of the most compelling social media platforms, fostering meaningful interactions and connections between users from all walks of life. Because the platform supports freedom of speech, it offers a large body of opinions to analyze, including sentiment towards sexual harassment. While various studies have explored sentiment analysis of sexual harassment, none has focused on Malaysia alone. Hence, in this report, tweets regarding sexual harassment in Malaysia are extracted using a list of keywords and labelled through semi-autonomous data annotation to obtain correct labels. The labelling scheme treats tweets that express support for victims as positive, tweets that sexually harass other Twitter users as negative, and unbiased opinions and statements as neutral. The data then undergo data preparation stages, namely data preprocessing and feature extraction, to facilitate further data transformation. The transformed data are then modelled using machine learning algorithms, namely the Naïve Bayes classifier and Support Vector Machine, to predict the overall sentiment of the tweets; the findings show an overall positive sentiment surrounding the issue. The classifiers are validated and evaluated using accuracy, precision, recall, and F1-score with 10-fold cross-validation. In this evaluation, the Naïve Bayes classifier with unigram features and the Laplace (alpha) smoothing parameter achieves the best performance in this study. Lastly, the data are visualized using graphs and charts, and an overall visualization dashboard is generated for data reporting. The dashboard helps to analyze and extract the meaning behind the sentiments by relating the visualizations to real-life events reported in legitimate sources such as news and articles.
However, this study has some limitations. Although the dataset was carefully selected, the difficulty of identifying subtle moods in tweets and the changing nature of language may affect how well the model performs. Additionally, the study focused solely on Malaysia, which limits its generalizability to other regions. Future work could address these limitations to strengthen the study's robustness. To better capture contextual complexity and variations in sentiment expression, possible next steps include constructing additional Malaysian lexicon libraries and a tweet crawler with more trustworthy Malay dictionaries.
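To make the best-performing configuration named in the abstract more concrete, the following is a minimal sketch of a multinomial Naïve Bayes classifier over unigram counts with Laplace (add-alpha) smoothing. It is not the report's implementation, which is not available in this record. The class name `UnigramNaiveBayes`, the whitespace tokenizer, and the toy training tweets are all assumptions made for illustration, and the sketch omits the report's preprocessing, Support Vector Machine comparison, and 10-fold cross-validation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace to get unigram tokens (assumed tokenizer)."""
    return text.lower().split()


class UnigramNaiveBayes:
    """Multinomial Naive Bayes over unigram counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha = 1.0 is classic Laplace smoothing

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for t in tokens:
                count = self.word_counts[label][t]
                # Smoothed likelihood: (count + alpha) / (total + alpha * |V|)
                score += math.log(
                    (count + self.alpha) / (total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data following the abstract's three-way labelling scheme.
train_texts = [
    "we stand with the victims",
    "support survivors and believe them",
    "new harassment bill tabled in parliament",
    "report says cases increased this year",
    "stop lying nobody believes you",
    "you asked for it",
]
train_labels = [
    "positive", "positive",
    "neutral", "neutral",
    "negative", "negative",
]

model = UnigramNaiveBayes(alpha=1.0).fit(train_texts, train_labels)
print(model.predict("we support the victims"))
```

The smoothing term keeps a word that never appeared in a class from driving that class's probability to zero, which matters for short, sparse texts such as tweets.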
Item Type: | Final Year Project Report |
---|---|
Additional Information: | Project report (B.Sc.) -- Universiti Malaysia Sarawak, 2023. |
Uncontrolled Keywords: | Twitter, interactions, Sentiment Analysis of Sexual Harassment, Malaysia |
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
Divisions: | Academic Faculties, Institutes and Centres > Faculty of Computer Science and Information Technology; Faculties, Institutes, Centres > Faculty of Computer Science and Information Technology |
Depositing User: | Unai |
Date Deposited: | 17 Jan 2024 07:37 |
Last Modified: | 17 Jan 2024 07:37 |
URI: | http://ir.unimas.my/id/eprint/44181 |