CICIDS-2017 Dataset Feature Analysis With Information Gain for Anomaly Detection

Published on Jul 16, 2020in IEEE Access3.367
· DOI :10.1109/ACCESS.2020.3009843
Estimated H-index: 2
Deris Stiawan10
Estimated H-index: 10
+ 3 AuthorsRahmat Budiarto13
Estimated H-index: 13
Feature selection (FS) is one of the important tasks of data preprocessing in data analytics. The data with a large number of features will affect the computational complexity, increase a huge amount of resource usage and time consumption for data analytics. The objective of this study is to analyze relevant and significant features of huge network traffic to be used to improve the accuracy of traffic anomaly detection and to decrease its execution time. Information Gain is the most feature selection technique used in Intrusion Detection System (IDS) research. This study uses Information Gain, ranking and grouping the features according to the minimum weight values to select relevant and significant features, and then implements Random Forest (RF), Bayes Net (BN), Random Tree (RT), Naive Bayes (NB) and J48 classifier algorithms in experiments on CICIDS-2017 dataset. The experiment results show that the number of relevant and significant features yielded by Information Gain affects significantly the improvement of detection accuracy and execution time. Specifically, the Random Forest algorithm has the highest accuracy of 99.86% using the relevant selected features of 22, whereas the J48 classifier algorithm provides an accuracy of 99.87% using 52 relevant selected features with longer execution time.
Figures & Tables
đź“– Papers frequently viewed together
2019CCNC: Consumer Communications and Networking Conference
Cited By31
#1Hatitye Chindove (Rhodes University)
#2Dane Brown (Rhodes University)H-Index: 2
Network intrusion detection system (NIDS) adoption is essential for mitigating computer network attacks in various scenarios. However, the increasing complexity of computer networks and attacks make it challenging to classify network traffic. Machine learning (ML) techniques in a NIDS can be affected by different scenarios, and thus the recency, size and applicability of datasets are vital factors to consider when selecting and tuning a machine learning classifier. The proposed approach evaluate...
#1Sugandh Seth (G.N.D.U.: Guru Nanak Dev University)
#2Gurvinder Singh (G.N.D.U.: Guru Nanak Dev University)H-Index: 14
Last. Kuljit Kaur Chahal (G.N.D.U.: Guru Nanak Dev University)H-Index: 4
view all 0 authors...
The ever increasing sophistication of intrusion approaches has led to the dire necessity for developing Intrusion Detection Systems with optimal efficacy. However, existing Intrusion Detection Systems have been developed using outdated attack datasets, with more focus on prediction accuracy and less on prediction latency. The smart Intrusion Detection System framework evolution looks forward to designing and deploying security systems that use various parameters for analyzing current and dynamic...
Abstract Nowadays attacks on computer networks continue to advance at a rate outpacing cyber defenders’ ability to write new attack signatures. This paper illustrates a deep learning methodology for the binary classification of the network traffic. The basic idea is to represent network flows as 2D images and use this imagery representation of the network traffic to train a Generative Adversarial Network (GAN) and a Convolutional Neural Network (CNN). The GAN is trained to produce new images of ...
This research attempts to introduce the production methodology of an anomaly detection dataset using ten desirable requirements. Subsequently, the article presents the produced dataset named UGRansome, created with up-to-date and modern network traffic (netflow), which represents cyclostationary patterns of normal and abnormal classes of threatening behaviours. It was discovered that the timestamp of various network attacks is inferior to one minute and this feature pattern was used to record th...
Nowadays, it became difficult to ensure data security because of the rapid development of information technology according to the Vs of Big Data. To secure a network against malicious activities and to ensure data protection, an intrusion detection system played a very important role. The main objective was to obtain a high-performance solution capable of detecting different types of attacks around the system. The main aim of this paper is to study the lacks of traditional and open source Intrus...
#1Michael HeiglH-Index: 2
Last. Martin SchrammH-Index: 4
view all 5 authors...
Future-oriented networking infrastructures are characterized by highly dynamic Streaming Data (SD) whose volume, speed and number of dimensions increased significantly over the past couple of years, energized by trends such as Software-Defined Networking or Artificial Intelligence. As an essential core component of network security, Intrusion Detection Systems (IDS) help to uncover malicious activity. In particular, consecutively applied alert correlation methods can aid in mining attack pattern...
Abstract null null In the last decades, researchers, practitioners and companies struggled in devising mechanisms to detect malicious activities originating security threats. Amongst the many solutions, network intrusion detection emerged as one of the most popular to analyse network traffic and detect ongoing intrusions based on rules or by means of Machine Learners (MLs), which process such traffic and learn a model to suspect intrusions. Supervised MLs are very effective in detecting known th...
#1Adeel Abbas (QAU: Quaid-i-Azam University)H-Index: 1
#2Muazzam A. Khan (Pakistan Academy of Sciences)H-Index: 6
Last. Jawad Ahmad (Edinburgh Napier University)H-Index: 21
view all 6 authors...
The domain of Internet of Things (IoT) has witnessed immense adaptability over the last few years by drastically transforming human lives to automate their ordinary daily tasks. This is achieved by interconnecting heterogeneous physical devices with different functionalities. Consequently, the rate of cyber threats has also been raised with the expansion of IoT networks which puts data integrity and stability on stake. In order to secure data from misuse and unusual attempts, several intrusion d...
#3Wasim A. Al-Hamdani (University of the Cumberlands)H-Index: 6
Data analytics projects span all types of domains and applications. Researchers publish results using certain datasets and classification models. They present results with a summary of the performance metrics of their evaluated classifiers. However, readers and evaluators may not be able to compare results from the different papers for several reasons. One reason is the variations in the classification models and specific settings used in those models; a second reason is the variations of comput...
Last. Shigehiko
view all 8 authors...
The threatening Coronavirus which was assigned as the global pandemic concussed not only the public health but society, economy and every walks of life. Some measurements are taken to stifle the spread and one of the best ways is to carry out some precautions to prevent the contagion of SARS-CoV-2 virus to uninfected populaces. Injecting prevention vaccines is one of the precaution steps under the grandiose blueprint. Among all vaccines, it is found that mRNA vaccine which shows no side effect w...
This website uses cookies.
We use cookies to improve your online experience. By continuing to use our website we assume you agree to the placement of these cookies.
To learn more, you can find in our Privacy Policy.