
Network intrusion detection system for detecting unknown network attacks using machine learning methods

Alzubi, S. M. Y. (2022) Network intrusion detection system for detecting unknown network attacks using machine learning methods. PhD thesis, University of Reading.

Text - Thesis
· Restricted to Repository staff only until 25 November 2024.

Text - Thesis Deposit Form
· Restricted to Repository staff only


It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

To link to this item DOI: 10.48683/1926.00110386


Since the beginning of the internet age, the number of internet users has been rapidly increasing. Accordingly, the number of network attacks and their complexity are likewise rising. This increase has alarmed governments and organisations, which have begun to invest millions in cybersecurity to mitigate the risk of cyberattacks. One effective, practical tool to defend against cyberattacks is the Intrusion Detection System (IDS) [1]. IDSs have attracted the attention of researchers, who have begun incorporating Machine Learning (ML) methods into these systems. For this purpose, different IDSs using supervised and unsupervised ML methods have been proposed. An IDS based on supervised learning methods can detect known network attacks that the system has previously encountered and been trained on. However, such a system often fails to detect network attacks that are unfamiliar to the supervised model. Unsupervised learning methods can overcome this limitation and detect new, unfamiliar attack types that the system has never encountered. Nevertheless, unsupervised learning methods can produce many false positives [2] and yield low precision and recall. For this thesis, four research aims were developed and investigated. The first concerns the possibility of developing a network IDS that offers high detection performance. The second aim considers the ability of the developed system to detect new network attacks introduced to the system. The third aim investigates the possibility of improving the overall results by implementing supervised ML models in the system. The fourth aim focuses on the feasibility of including explainable methods to help domain experts assess the threat level and understand the model’s decisions. To achieve these goals, this thesis presents a novel Network Intrusion Detection System framework that combines unsupervised and supervised learning methods for network intrusion detection.
The proposed framework consists of three components. The first component is a novel heterogeneous unsupervised bagging ensemble called the Unknown Network Attack Detector (UNAD). A set of anomaly detection algorithms was evaluated for potential use as base learners for UNAD. Among these, the Local Outlier Factor (LOF) and Isolation Forest (iForest) algorithms were selected as UNAD’s base learners because they produced the best results. Weighted majority voting is used to combine the outputs of UNAD’s base learners. The second component is a supervised algorithm, trained on the flows that UNAD labels as benign/normal or attack, which improves the overall detection results. The Random Forest (RF) classifier was selected for this component because it produced the strongest empirical results. The third component is the explainable component, which explains the model’s decisions in a human-understandable way. Two types of explainability are implemented and illustrated in this thesis: local and global. For local explainability, Local Interpretable Model-agnostic Explanations (LIME) [3] was used, and for global explainability, a surrogate method based on the Decision Tree (DT) was used. The framework was evaluated using two publicly available datasets: CICIDS2017 [4] and NSL-KDD [5]. Empirical results revealed that UNAD (the first component) can detect completely new attack types with high detection rates for most attack types, and that the RF classifier (the second component) can boost the detection rate for most attack types. The overall F1-scores for the CICIDS2017 and NSL-KDD datasets were 98.31% and 98.25%, respectively. The experimental results also showed that the explainable methods used in the system (the third component) can help domain experts assess threat levels and understand how the model made its decisions.
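The weighted majority voting step that combines the base learners' outputs can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not say how the weights are derived or how ties are resolved, so the weights (0.6 for LOF, 0.4 for iForest), the tie-breaking rule (ties count as benign), the label encoding (1 = attack, 0 = benign), and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def weighted_majority_vote(votes: Sequence[int], weights: Sequence[float]) -> int:
    """Combine binary votes for one flow (1 = attack, 0 = benign).

    Each detector's vote counts in proportion to its weight.
    Assumption: a tie is resolved in favour of benign (0).
    """
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    attack_weight = sum(w for v, w in zip(votes, weights) if v == 1)
    benign_weight = sum(w for v, w in zip(votes, weights) if v == 0)
    return 1 if attack_weight > benign_weight else 0


def combine_detectors(
    votes_by_detector: Dict[str, List[int]],
    weight_by_detector: Dict[str, float],
) -> List[int]:
    """Apply weighted majority voting flow by flow across all detectors."""
    names = list(votes_by_detector)
    weights = [weight_by_detector[name] for name in names]
    # zip(*...) regroups the per-detector lists into per-flow vote tuples.
    per_flow_votes = zip(*(votes_by_detector[name] for name in names))
    return [weighted_majority_vote(flow, weights) for flow in per_flow_votes]


# Hypothetical votes from the two base learners on four network flows.
votes = {
    "lof": [1, 0, 1, 0],
    "iforest": [1, 1, 0, 0],
}
# Hypothetical weights; the abstract does not report the actual values.
weights = {"lof": 0.6, "iforest": 0.4}

combined = combine_detectors(votes, weights)
print(combined)  # [1, 0, 1, 0]
```

With only two base learners, the more heavily weighted detector decides every disagreement. In this sketch the combined labels would then serve as training labels for the supervised second component.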

Item Type: Thesis (PhD)
Thesis Supervisor: Stahl, F.
Thesis/Report Department: School of Mathematical, Physical & Computational Sciences
Identification Number/DOI: 10.48683/1926.00110386
Divisions: Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
ID Code: 110386

