Clickbait Classification Using Transformers

  • Mori Hovipah Universitas Islam Negeri Sultan Syarif Kasim
  • Elin Hearani Universitas Islam Negeri Sultan Syarif Kasim Riau
  • Jasril Jasril Universitas Islam Negeri Sultan Syarif Kasim Riau
  • Fadhilah Syafria Universitas Islam Negeri Sultan Syarif Kasim Riau
Keywords: News; Clickbait; Headline; Hoax; Transformers

Abstract

Clickbait is a news title written to attract readers so that they never miss a headline. Clickbait headlines are typically quirky, confusing, and use exaggerated sentences to entice readers to click on links. However, the information in an attractive clickbait headline often does not match the content of the news, which can lead to the spread of fake news and hoaxes. This study therefore classifies clickbait news titles using the Transformers method. The dataset consists of 6632 news titles. The classification process includes collecting data, labeling data, preprocessing, exploratory data analysis (EDA), and classification using Transformers. The best accuracy obtained in this study was 63%, with a precision of 0.63 and a recall of 1, using a data split of 10%:90%.
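
To make the reported figures concrete, the following minimal sketch shows how accuracy, precision, and recall are computed from a binary confusion matrix. The counts are hypothetical and chosen only so that the rounded values match those in the abstract; they are not the study's actual confusion matrix, and the function name `classification_metrics` is an illustrative assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) for a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts (assumption, not taken from the paper):
# every test title is predicted as the positive class.
accuracy, precision, recall = classification_metrics(tp=418, fp=245, fn=0, tn=0)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

With these illustrative counts there are no false negatives, so recall is 1, and because there are no true negatives either, accuracy and precision coincide. This is one pattern that would reproduce the rounded figures above; the abstract alone does not confirm that it is what occurred in the study.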

References

[1] B. W. Rauf, S. Raharjo, and H. Sismoro, “Deteksi Clickbait dengan Sentence Scoring Based On Frequency di Detik.Com,” J. Teknol. Inf., vol. 4, no. 2, pp. 247–252, 2020.
[2] A. D. Rendragraha, M. A. Bijaksana, and A. Romadhony, “Pendekatan Metode Transformers untuk Deteksi Bahasa Kasar dalam Komentar Berita Online Indonesia,” e-Proceeding Eng., vol. 8, no. 2, pp. 3385–3395, 2021.
[3] A. F. Yavi, “Klasifikasi Artikel Berbahasa Indonesia untuk Mendeteksi Clickbait menggunakan Metode Naïve Bayes,” J. Chem. Inf. Model., vol. 53, no. 9, pp. 1689–1699, 2018.
[4] Y. D. Hadiyat, “Clickbait on Indonesia Online Media,” J. Pekommas, vol. 4, no. 1, p. 1, 2019.
[5] M. N. Fakhruzzaman, S. Z. Jannah, R. A. Ningrum, and I. Fahmiyah, “Clickbait Headline Detection in Indonesian News Sites using Multilingual Bidirectional Encoder Representations from Transformers (M-BERT),” 2021.
[6] R. Sagita, U. Enri, and A. Primajaya, “Klasifikasi Berita Clickbait Menggunakan K-Nearest Neighbor (KNN),” JOINS (Journal Inf. Syst.), vol. 5, no. 2, pp. 230–239, 2020.
[7] V. Indurthi, B. Syed, M. Gupta, and V. Varma, “Predicting Clickbait Strength in Online Social Media,” pp. 4835–4846, 2021.
[8] S. Ram, S. Prasad, and T. Bahadur, “Detecting Clickbaits on Nepali News using SVM and RF,” vol. 8914, pp. 140–146, 2021.
[9] A. Vaswani et al., “Attention is all you need,” Adv. Neural Inf. Process. Syst., vol. 2017-Decem, no. Nips, pp. 5999–6009, 2017.
[10] S. Cahyawijaya et al., “IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation,” EMNLP 2021 - 2021 Conf. Empir. Methods Nat. Lang. Process. Proc., pp. 8875–8898, 2021.
[11] A. Anand, T. Chakraborty, and N. Park, “We used neural networks to detect clickbaits: You won’t believe what happened next!,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 10193 LNCS, pp. 541–547, 2017.
[12] O. L. Pramesti, “Clickbait Headline in News of Online Prostitution Case,” J. Pekommas, vol. 5, no. 1, p. 59, 2020.
[13] T. Wolf et al., “Transformers: State-of-the-Art Natural Language Processing,” pp. 38–45, 2020.
[14] A.- Arini, L. K. Wardhani, and D.- Octaviano, “Perbandingan Seleksi Fitur Term Frequency & Tri-Gram Character Menggunakan Algoritma Naïve Bayes Classifier (Nbc) Pada Tweet Hashtag #2019gantipresiden,” Kilat, vol. 9, no. 1, pp. 103–114, 2020.
[15] P. Nima, “Automatic Filtration of Misleading Youtube Videos using Data Mining Techniques,” MSc Research Project, Masters in Data Analytics, School of Computing, 2020.
[16] A. William and Y. Sari, “CLICK-ID: A novel dataset for Indonesian clickbait headlines,” Data Br., vol. 32, p. 106231, 2020.
[17] A. Awalina, F. A. Bachtiar, and F. Utaminingrum, “Perbandingan Pretrained Model Transformer Pada Deteksi Ulasan Palsu [Comparison of Pretrained Transformer Models on Spam Review Detection],” vol. 9, no. 3, pp. 597–604, 2022.
[18] M. Radhi, A. Amalia, D. R. H. Sitompul, S. H. Sinurat, and E. Indra, “Analisis Big Data Dengan Metode Exploratory Data Analysis (Eda) Dan Metode Visualisasi Menggunakan Jupyter Notebook,” J. Sist. Inf. dan Ilmu Komput. Prima(JUSIKOM PRIMA), vol. 4, no. 2, pp. 23–27, 2022.
Published
2023-04-30
How to Cite
Mori Hovipah, M. H., Hearani, E., Jasril, J., & Syafria, F. (2023). Clickbait Classification Using Transformers. Jurnal CoSciTech (Computer Science and Information Technology), 4(1), 172-181. https://doi.org/10.37859/coscitech.v4i1.4713