Ensemble Machine Learning to Detect Sarcasm in English on Twitter Social Media
Abstract
Detecting sarcasm in English tweets is a complex task because sarcasm is subtle and often ambiguous. This study explores an ensemble machine learning approach that draws on Logistic Regression, Naive Bayes, Decision Tree, and Support Vector Machine (SVM) classifiers to identify sarcasm on Twitter. A dataset of sarcastic and non-sarcastic English tweets was collected and preprocessed, and features capturing lexical, syntactic, and semantic information were extracted to train and evaluate the models. Among the techniques employed, SVM performed best, achieving an accuracy of 80% and an F1-score of 80% for sarcasm detection, which indicates that it effectively captures the complex patterns that separate sarcastic from non-sarcastic tweets. By combining the strengths of multiple algorithms, the ensemble approach improves the overall performance of the sarcasm detection system, making detection more robust and accurate and thereby improving the understanding of user sentiments and opinions in online conversations. This research contributes to sentiment analysis and natural language processing by offering insights into sarcasm detection on social media, with practical implications for interpreting user-generated content on platforms such as Twitter and for facilitating more meaningful interactions.
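The abstract does not state how the classifiers' outputs are combined or how the reported metrics are computed. As a minimal sketch, assuming hard majority voting over binary labels (1 = sarcastic, 0 = not sarcastic), the Python below combines per-classifier predictions and scores them with accuracy and the F1-score of the sarcastic class. The function names `majority_vote`, `accuracy`, and `f1_score`, the tie-breaking rule, and the toy labels are hypothetical illustrations, not the authors' implementation or data; the four classifiers are assumed to have been trained elsewhere.

```python
from collections import Counter
from collections.abc import Sequence

SARCASTIC = 1  # assumed label encoding: 1 = sarcastic, 0 = not sarcastic


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hard-voting ensemble: predictions[k][i] is classifier k's label for tweet i.

    Hypothetical tie-break (ties are possible with four voters): the earliest
    classifier in `predictions` whose label is among the most-voted labels wins.
    """
    n_tweets = len(predictions[0])
    if any(len(p) != n_tweets for p in predictions):
        raise ValueError("every classifier must label the same tweets")
    combined = []
    for i in range(n_tweets):
        votes = Counter(p[i] for p in predictions)
        top = max(votes.values())
        winners = {label for label, count in votes.items() if count == top}
        # Walk the classifiers in order; the first one voting for a winner decides.
        combined.append(next(p[i] for p in predictions if p[i] in winners))
    return combined


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of tweets whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = SARCASTIC) -> float:
    """F1 of the positive (sarcastic) class as 2TP / (2TP + FP + FN),
    which equals the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for six hypothetical tweets (NOT the study's data).
    y_true = [1, 0, 1, 0, 1, 0]
    per_classifier = [
        [1, 0, 1, 1, 0, 0],  # SVM (listed first, so it breaks ties)
        [1, 0, 0, 1, 1, 0],  # Logistic Regression
        [1, 1, 1, 0, 0, 0],  # Naive Bayes
        [0, 0, 1, 0, 1, 1],  # Decision Tree
    ]
    y_ens = majority_vote(per_classifier)
    print(y_ens, f"accuracy={accuracy(y_true, y_ens):.3f}", f"f1={f1_score(y_true, y_ens):.3f}")
    # prints: [1, 0, 1, 1, 0, 0] accuracy=0.667 f1=0.667
```

With four voters, two-to-two ties can occur, so the sketch needs an explicit tie-break; placing SVM first is a hypothetical choice motivated by its reported standalone performance. Averaging predicted class probabilities (soft voting) is a common alternative to the hard voting shown here.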
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright Notice
Authors who publish with Journal of Informatics, Information System, Software Engineering and Applications (INISTA) agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.