Spam Email Detection with Continuous Bag-of-Words and Random Forest

Michiavelly Rustam; Agung Brotokuncoro; Rusdianto Roestam

doi:10.38035/rrj.v6i4.873

Vol. 6 No. 4 (2024): Ranah Research : Journal Of Multidisciplinary Research and Development (May 2024 - June 2024)


Michiavelly Rustam

Master of Science and Information Technology, President University 17550, Indonesia

Agung Brotokuncoro

Master of Science and Information Technology, President University 17550, Indonesia

Rusdianto Roestam

Doctor of Philosophy, President University 17550, Indonesia

Published

Jun 5, 2024


Abstract

Spam email poses a significant cyber threat: scammers use a range of tactics to trick individuals into divulging sensitive information or downloading harmful content. In June 2023, for instance, Indonesia recorded approximately 6.51 thousand spam attacks, which shows how widespread the problem is. These attacks frequently rely on deceptive strategies, such as impersonation or false promises of rewards, to ensnare unsuspecting victims, and falling for them can lead to financial losses and other serious consequences. This research addresses the problem by classifying email content to detect phishing attempts. The proposed solution runs on a hosted runtime platform (Google Colab) and combines Continuous Bag of Words (CBOW) analysis with the Random Forest method. CBOW is selected for its effectiveness in capturing semantic relationships between words, which allows the model to extract meaningful features from the email content. Random Forest is chosen for its ability to handle the imbalanced datasets commonly encountered in email classification, so that both spam and ham emails are fairly represented during model training. By combining the two techniques, we aim to develop a robust classification model that accurately distinguishes phishing (spam) emails from legitimate (ham) emails and thereby strengthens email security. We apply the approach to classify the SpamAssassin dataset into ham and spam categories, with an anticipated precision of 0.98 in identifying phishing emails.
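The ideas named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the toy email, the two-dimensional word vectors, and the labels are all hypothetical. It shows (1) the context-to-target pairs that CBOW is trained on, (2) how word vectors might be averaged into one fixed-length feature vector per email for a classifier such as Random Forest, and (3) how precision for the spam class is computed. In practice one would more likely train CBOW with a library such as gensim (`Word2Vec` with `sg=0`) and fit scikit-learn's `RandomForestClassifier`, possibly with `class_weight="balanced"` for imbalanced data.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs, the training examples for CBOW."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before the target
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after the target
        context = left + right
        if context:
            pairs.append((context, target))
    return pairs


def average_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; return zeros if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def precision(y_true, y_pred, positive="spam"):
    """Precision = TP / (TP + FP) for the positive (spam) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy data, for illustration only.
email = "claim your free prize now".split()
pairs = cbow_pairs(email, window=1)

# Assumed 2-dimensional embeddings; real CBOW vectors are learned, not hand-written.
toy_vectors = {"free": [1.0, 0.0], "prize": [0.0, 1.0]}
features = average_vector(email, toy_vectors, dim=2)

# Assumed labels and predictions from some classifier.
y_true = ["spam", "ham", "spam", "ham"]
y_pred = ["spam", "spam", "spam", "ham"]
score = precision(y_true, y_pred)
```

Averaging is only one simple way to turn per-word vectors into a per-email feature vector; the abstract does not state which aggregation the authors used.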

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright:

Authors who publish their manuscripts in this journal agree to the following terms:

Copyright in each article remains with the authors.
The authors acknowledge that Ranah Research : Journal of Multidisciplinary Research and Development holds the right of first publication under a Creative Commons Attribution 4.0 International license (CC BY 4.0).
Authors may enter into separate, non-exclusive arrangements for distributing the version of the manuscript published in this journal (for example, depositing it in the authors' institutional repository or publishing it in a book), with an acknowledgement that the manuscript was first published in Ranah Research.

How to Cite

Rustam, M., Brotokuncoro, A. and Roestam, R. (2024) “Deteksi Email Spam dengan Continuous Bag-Of-Words dan Random Forest”, Ranah Research : Journal of Multidisciplinary Research and Development, 6(4), pp. 758-765. doi: 10.38035/rrj.v6i4.873.


e-ISSN 2655-0865
