Improving spam email detection using hybrid feature selection and sequential minimal optimisation

Published on Jul 1, 2020 in Indonesian Journal of Electrical Engineering and Computer Science
DOI: 10.11591/IJEECS.V19.I1.PP535-542
Ahmed Al-Ajeli (University of Babylon; estimated H-index: 2), Eman S. Al-Shamery (University of Babylon; estimated H-index: 1)
Email is a popular channel through which users exchange information, but spammers can abuse it to spread suspicious content to Internet users. An effective way to detect spam emails is therefore needed to keep this information safe from malicious access. Many methods have been developed to address this problem. In this paper, a machine learning technique is applied to detect spam emails: a detection system based on sequential minimal optimization (SMO) is built to classify emails into two categories, spam and non-spam (ham). Each email is represented by a set of features extracted from its textual content. A hybrid feature selection method is developed to choose a subset of these features based on their importance to the detection process. This subset is then fed to the SMO algorithm, which makes the detection decision. The technique provides an efficient protective mechanism for controlling spam. Experimental results show that the performance of the proposed method is promising compared with existing methods.
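The pipeline described in the abstract (text features, hybrid feature selection, SMO-based classification) can be illustrated with a minimal Python sketch. The abstract does not say which selection methods are combined, so this sketch assumes a chi-square filter followed by a greedy forward wrapper. It also assumes binary word-presence features, a linear kernel, and a simplified SMO loop. The toy corpus and all function names (`chi2_scores`, `train_smo`, `hybrid_select`, `classify`) are hypothetical and are not taken from the paper.

```python
# Hypothetical toy corpus: label +1 = spam, -1 = ham.
EMAILS = [("free prize claim now", 1), ("free money offer now", 1),
          ("claim free cash prize", 1), ("meeting agenda monday morning", -1),
          ("project report attached", -1), ("lunch meeting monday", -1)]

def tokens(text):
    return set(text.lower().split())

def vectorize(doc, features):
    """Binary presence vector of a token set over the chosen features."""
    return [1.0 if f in doc else 0.0 for f in features]

def chi2_scores(docs, labels):
    """Chi-square statistic of each term's presence against the class label."""
    n, n_spam = len(docs), labels.count(1)
    scores = {}
    for term in set().union(*docs):
        a = sum(1 for d, t in zip(docs, labels) if term in d and t == 1)   # spam with term
        b = sum(1 for d, t in zip(docs, labels) if term in d and t == -1)  # ham with term
        c, e = n_spam - a, (n - n_spam) - b                                # spam, ham without term
        denom = (a + b) * (c + e) * (a + c) * (b + e)
        scores[term] = n * (a * e - b * c) ** 2 / denom if denom else 0.0
    return scores

def train_smo(X, y, C=10.0, tol=1e-3, max_passes=5, max_iter=1000):
    """Simplified SMO for a linear soft-margin SVM; returns (w, b, alpha)."""
    n = len(X)
    K = [[sum(p * q for p, q in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0

    def err(k):  # decision value minus target for training point k
        return sum(alpha[m] * y[m] * K[m][k] for m in range(n)) + b - y[k]

    def step(i, j, Ei):  # jointly optimise alpha[i] and alpha[j]; True on progress
        nonlocal b
        Ej, ai, aj = err(j), alpha[i], alpha[j]
        if y[i] != y[j]:
            lo, hi = max(0.0, aj - ai), min(C, C + aj - ai)
        else:
            lo, hi = max(0.0, ai + aj - C), min(C, ai + aj)
        eta = 2 * K[i][j] - K[i][i] - K[j][j]
        if lo >= hi or eta >= 0:
            return False
        aj_new = min(hi, max(lo, aj - y[j] * (Ei - Ej) / eta))
        if abs(aj_new - aj) < 1e-7:
            return False
        ai_new = ai + y[i] * y[j] * (aj - aj_new)
        b1 = b - Ei - y[i] * (ai_new - ai) * K[i][i] - y[j] * (aj_new - aj) * K[i][j]
        b2 = b - Ej - y[i] * (ai_new - ai) * K[i][j] - y[j] * (aj_new - aj) * K[j][j]
        b = b1 if 0 < ai_new < C else b2 if 0 < aj_new < C else (b1 + b2) / 2
        alpha[i], alpha[j] = ai_new, aj_new
        return True

    passes = 0
    for _ in range(max_iter):  # hard cap so the loop always terminates
        changed = 0
        for i in range(n):
            Ei = err(i)
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                # Try partners in order of decreasing |Ei - Ej| until one makes progress.
                order = sorted((k for k in range(n) if k != i), key=lambda k: -abs(Ei - err(k)))
                changed += any(step(i, j, Ei) for j in order)
        passes = passes + 1 if changed == 0 else 0
        if passes >= max_passes:
            break
    w = [sum(alpha[k] * y[k] * X[k][d] for k in range(n)) for d in range(len(X[0]))]
    return w, b, alpha

def classify(text, features, w, b):
    x = vectorize(tokens(text), features)
    return "spam" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "ham"

def hybrid_select(docs, labels, k=3):
    """Filter: keep the k best chi-square terms. Wrapper: greedy forward search."""
    scores = chi2_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    chosen, best = [], 0.0
    for term in ranked:
        trial = chosen + [term]
        X = [vectorize(d, trial) for d in docs]
        w, b, _ = train_smo(X, labels)
        hits = sum((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (t == 1)
                   for x, t in zip(X, labels))
        if hits / len(labels) > best:  # keep a term only if training accuracy improves
            chosen, best = trial, hits / len(labels)
    return ranked, chosen

DOCS = [tokens(text) for text, _ in EMAILS]
LABELS = [label for _, label in EMAILS]
FILTERED, SELECTED = hybrid_select(DOCS, LABELS, k=3)
W, B, ALPHA = train_smo([vectorize(d, SELECTED) for d in DOCS], LABELS)
print("after filter:", FILTERED, "| after wrapper:", SELECTED)
print(classify("free cash now", SELECTED, W, B))
```

On this toy corpus the wrapper stage discards terms that do not improve training accuracy beyond what the top-ranked term already achieves. A realistic setup would score candidate subsets on held-out data rather than on the training set, and would likely use a full SMO implementation with a proper working-set heuristic.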