
Advancements in Detecting SQL Injection Attacks

Research shows machine learning significantly improves SQL injection detection rates.



[Figure: Machine learning boosts defense against SQL injection attacks.]

In this section, we discuss three experiments that we conducted to evaluate the effectiveness of our approach to detecting SQL injection (SQLi) attacks. We first looked at how well a basic model performed in detecting these attacks. The basic model combined detection rules using fixed, manually assigned weights, without taking the normal traffic of the protected service into account. As a result, it detected SQL injection attacks at a low rate, even under the most favorable settings.
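
To make the fixed-weight scheme concrete, here is a minimal Python sketch of how such a rule-scoring model operates. The rule IDs, weights, and threshold are illustrative placeholders, not the actual configuration used in the experiments.

```python
# Illustrative sketch of a fixed-weight rule-scoring scheme of the kind the
# basic model uses. Rule IDs, weights, and the threshold are hypothetical.

RULE_WEIGHTS = {
    "942100": 5,  # hypothetical weight for a SQLi pattern rule
    "942190": 4,  # hypothetical weight for an auth-bypass pattern rule
    "942440": 3,  # hypothetical weight for a SQL comment-sequence rule
}
ANOMALY_THRESHOLD = 5  # hypothetical blocking threshold

def classify(triggered_rules: set[str]) -> bool:
    """Block the query if the summed fixed weights reach the threshold."""
    score = sum(RULE_WEIGHTS.get(rule_id, 0) for rule_id in triggered_rules)
    return score >= ANOMALY_THRESHOLD

print(classify({"942190", "942440"}))  # True: 4 + 3 >= 5, query blocked
print(classify({"942440"}))            # False: 3 < 5, query passes
```

Because the weights and threshold are fixed for every deployment, the same configuration is applied regardless of what the monitored service's normal traffic looks like.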

Next, we showed through testing that using machine learning to tune the weights of these rules significantly improved the detection rate, by up to 30%. Our third experiment introduced adversarial training, in which the model is trained specifically on manipulated attacks; this improved robustness against adversarial SQL injections by up to 85% compared to the basic model.
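
To illustrate the idea of learning the weights instead of fixing them by hand, the sketch below trains a linear SVM on rule-activation vectors; its coefficients then play the role that the fixed weights played before. The data and rule IDs are illustrative placeholders, not the paper's actual setup.

```python
# Minimal sketch of learning rule weights instead of fixing them by hand:
# a linear SVM trained on rule-activation vectors yields one weight per rule.
# The data and rule IDs below are illustrative placeholders.
from sklearn.svm import LinearSVC

rule_ids = ["942100", "942140", "942190", "942440"]
# Each row marks which rules fired for one query; label 1 = malicious.
X = [[1, 0, 0, 1], [0, 0, 0, 0], [1, 1, 0, 1], [0, 0, 1, 0]] * 25
y = [1, 0, 1, 0] * 25

svm = LinearSVC(dual=False).fit(X, y)

# The learned coefficients take over the role of the fixed weights above.
for rule_id, weight in zip(rule_ids, svm.coef_[0]):
    print(f"rule {rule_id}: learned weight {weight:+.3f}")
```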

The overall setup for our experiments ran on an Ubuntu server with an Intel processor and ample memory. We used a dataset containing many examples of both malicious and benign SQL queries, chosen because it is one of the most comprehensive resources available for training SQL injection detection models.

To create our training dataset, we randomly selected 20,000 samples from the original dataset. This training set contained an equal number of benign and malicious SQL queries. For testing, we created a separate dataset that included 4,000 randomly chosen samples. This testing set was designed to evaluate the performance of our detection models without overlapping with the training data.
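
A minimal sketch of this sampling procedure follows, with synthetic placeholder corpora standing in for the real dataset.

```python
# Sketch of the dataset split described above; the two corpora are synthetic
# placeholders standing in for the real SQLi dataset.
import random

random.seed(0)  # reproducible sampling

benign = [(f"SELECT name FROM users WHERE id = {i}", 0) for i in range(50_000)]
malicious = [(f"' OR 1=1 -- variant {i}", 1) for i in range(50_000)]

# Balanced 20,000-sample training set: 10,000 benign + 10,000 malicious.
train = random.sample(benign, 10_000) + random.sample(malicious, 10_000)

# 4,000-sample test set drawn only from samples unused in training,
# so the two sets never overlap.
train_lookup = set(train)
remainder = [s for s in benign + malicious if s not in train_lookup]
test = random.sample(remainder, 4_000)

print(len(train), len(test))  # 20000 4000
```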

We also developed a special training set for adversarial training. We took 5,000 malicious SQL queries from our main training set and modified them with attack tools that iteratively rewrite a payload to evade detection. These modified queries were then added back to the main training set. For testing, we built a separate adversarial test set by optimizing previously selected malicious queries in the same way, while keeping benign queries unchanged. This setup let us evaluate how well our methods resist sophisticated, adaptive attacks.
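
The sketch below illustrates the shape of this augmentation step. The `mutate_sqli` function is a simplified stand-in for the real attack tools, which search for evasive rewrites far more systematically; the attack pool is a synthetic placeholder.

```python
# Sketch of the adversarial augmentation step. mutate_sqli is a simplified
# stand-in for the real attack tools, which search for evasive rewrites far
# more systematically; the attack pool below is a synthetic placeholder.
import random

random.seed(0)

def mutate_sqli(payload: str) -> str:
    """Apply one random semantics-preserving rewrite to a SQLi payload."""
    tricks = [
        lambda p: p.replace(" ", "/**/"),  # comment-based whitespace
        lambda p: "".join(random.choice([c.lower(), c.upper()]) for c in p),
        lambda p: p.replace("OR", "||"),   # operator synonym
    ]
    return random.choice(tricks)(payload)

attack_pool = [f"' OR 1=1 -- variant {i}" for i in range(10_000)]

# Perturb 5,000 malicious queries and keep their labels, so the model
# sees evasive variants of known attacks during training.
adversarial = [(mutate_sqli(q), 1) for q in random.sample(attack_pool, 5_000)]
print(adversarial[0])
```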

In our experiments, we implemented a feature extractor on top of the open-source ModSecurity WAF, working with the OWASP Core Rule Set (CRS) rules that target SQL injections. The rules we used for training were specifically designed to catch SQL injection attempts. We configured the detection system to minimize false positives while still processing as many queries as possible, so that the rule activations provided the data needed for training.
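
The following sketch shows the general shape of such a feature extractor, mapping each query to a binary vector of rule activations. Here `run_waf_rules` is a hypothetical toy substitute for the real rule engine evaluating the CRS.

```python
# Sketch of the feature extraction step: each query becomes a binary vector
# recording which SQLi rules fired. run_waf_rules is a hypothetical toy
# substitute for the real rule engine evaluating the CRS.
SQLI_RULE_IDS = ["942100", "942140", "942190", "942440"]  # illustrative subset

def run_waf_rules(query: str) -> set[str]:
    """Toy stand-in: return the IDs of rules 'matched' by the query."""
    fired = set()
    lowered = query.lower()
    if "or 1=1" in lowered:
        fired.add("942100")
    if "--" in query or "/*" in query:
        fired.add("942440")
    return fired

def extract_features(query: str) -> list[int]:
    """Binary feature vector: 1 if the corresponding rule fired, else 0."""
    fired = run_waf_rules(query)
    return [int(rule_id in fired) for rule_id in SQLI_RULE_IDS]

print(extract_features("' OR 1=1 --"))  # [1, 0, 0, 1]
```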

We also employed machine learning models based on support vector machines (SVMs) and random forests (RFs), both known for their effectiveness in classification tasks. We tuned these models for performance by systematically searching over their parameters and validating each configuration, ensuring that we obtained the best possible version of each model for our task.
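
As an illustration of this tuning step, the sketch below runs a grid search with cross-validation over both model types using scikit-learn. The parameter grids and data are placeholders, not the grids reported in the paper.

```python
# Sketch of the tuning step with scikit-learn: a grid search with
# cross-validation over both model types. Grids and data are placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# X: rule-activation vectors, y: 0 = benign, 1 = malicious (toy data).
X = [[1, 0, 0, 1], [0, 0, 0, 0], [1, 1, 0, 1], [0, 0, 1, 0]] * 25
y = [1, 0, 1, 0] * 25

svm_search = GridSearchCV(LinearSVC(dual=False),
                          param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)
rf_search = GridSearchCV(RandomForestClassifier(random_state=0),
                         param_grid={"n_estimators": [100, 300],
                                     "max_depth": [None, 10]}, cv=5)

svm_search.fit(X, y)
rf_search.fit(X, y)
print(svm_search.best_params_)
print(rf_search.best_params_)
```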

Our first goal was to assess the performance of the basic model. We explored its ability to differentiate between benign and malicious SQL queries across various settings. We focused on how well it could detect attacks at specific false positive rates. The results showed that the basic model struggled significantly, often misclassifying benign queries as malicious ones.
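
A common way to measure detection at a fixed false positive rate is to read it off a ROC curve, as in this illustrative sketch with toy labels and scores.

```python
# Sketch of reading the detection rate at a fixed false positive rate off a
# ROC curve; the labels and scores below are toy values.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # 0 = benign, 1 = attack
scores = np.array([0.1, 0.2, 0.3, 0.8, 0.4,         # model scores, benign
                   0.9, 0.7, 0.95, 0.6, 0.85])      # model scores, attacks

fpr, tpr, _ = roc_curve(y_true, scores)

def detection_rate_at(fpr_target: float) -> float:
    """Highest true positive rate achievable without exceeding fpr_target."""
    return float(tpr[fpr <= fpr_target].max())

print(detection_rate_at(0.01))  # detection rate at a 1% false positive rate
print(detection_rate_at(0.10))  # detection rate at a 10% false positive rate
```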

As we moved on to the machine learning models, the results improved markedly. Both the SVM and RF models showed an enhanced ability to detect SQL injection attempts across various settings compared to the basic model. In particular, both models demonstrated substantial increases in detection rate, often outpacing the basic approach.

When we tested the machine learning models against adversarial attacks, they still outperformed the basic model. Although their performance degraded under attack conditions, they remained more effective at identifying malicious queries, and both the SVM and RF models maintained higher detection rates than the basic version.

Recognizing that models trained under challenging conditions could still be beneficial, we decided to retrain the machine learning models with a focus on adversarial robustness. This retraining provided a chance to evaluate how these models performed under both normal and attack conditions. The results revealed that these adversarially trained models not only held their ground against attacks but also maintained performance levels similar to their predecessors without such training.

When we looked closely at the features that the models used for detection, we found that the adversarial training helped spread the importance of various rules, making it harder for attackers to bypass detection. By observing which rules were activated during attacks, we gained insights into how both the basic and improved models handled different attack types.
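
A simple way to inspect which rules a trained model relies on is to examine its feature importances. The sketch below does this for a random forest, with placeholder rule IDs and toy data.

```python
# Sketch of inspecting which rules a trained model relies on: random-forest
# feature importances, one per rule in the feature vector (toy data).
from sklearn.ensemble import RandomForestClassifier

rule_ids = ["942100", "942140", "942190", "942440"]  # illustrative subset
X = [[1, 0, 0, 1], [0, 0, 0, 0], [1, 1, 0, 1], [0, 0, 1, 0]] * 25
y = [1, 0, 1, 0] * 25

rf = RandomForestClassifier(random_state=0).fit(X, y)

# After adversarial training we would expect importance to be spread across
# more rules, leaving attackers no single weak rule to target.
ranked = sorted(zip(rule_ids, rf.feature_importances_), key=lambda t: -t[1])
for rule_id, weight in ranked:
    print(f"rule {rule_id}: importance {weight:.3f}")
```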

Overall, our findings highlighted the importance of utilizing machine learning techniques to enhance the detection capabilities against SQL injection attacks. The basic model, while simple, proved to be inadequate in real-world scenarios. In contrast, the models that incorporated machine learning showed significant improvements, especially when facing sophisticated adversarial tactics.

In summary, through thoughtful experimentation and careful consideration of both benign and malicious samples, we illustrated how advanced techniques can greatly enhance the capability to detect SQL injection attacks. The results of our experiments indicate that training models specifically for adversarial conditions can yield stronger defenses and better handling of various attack types.

The developments in this research emphasize the necessity for ongoing improvement in detection systems and showcase the potential of machine learning approaches in achieving more reliable security measures against SQL injection threats. As attack methods continue to evolve, so too must our defensive strategies, making research in this area paramount for future security solutions.

These outcomes serve as a solid foundation for further studies focused on improving SQL injection detection and can be used to guide the design and implementation of more effective security measures in real-world applications. The importance of adapting to changing attack patterns through continuous refinement of detection systems cannot be overstressed, considering the ever-growing landscape of cyber threats that organizations face today.

Original Source

Title: ModSec-AdvLearn: Countering Adversarial SQL Injections with Robust Machine Learning

Abstract: Many Web Application Firewalls (WAFs) leverage the OWASP Core Rule Set (CRS) to block incoming malicious requests. The CRS consists of different sets of rules designed by domain experts to detect well-known web attack patterns. Both the set of rules to be used and the weights used to combine them are manually defined, yielding four different default configurations of the CRS. In this work, we focus on the detection of SQL injection (SQLi) attacks, and show that the manual configurations of the CRS typically yield a suboptimal trade-off between detection and false alarm rates. Furthermore, we show that these configurations are not robust to adversarial SQLi attacks, i.e., carefully-crafted attacks that iteratively refine the malicious SQLi payload by querying the target WAF to bypass detection. To overcome these limitations, we propose (i) using machine learning to automate the selection of the set of rules to be combined along with their weights, i.e., customizing the CRS configuration based on the monitored web services; and (ii) leveraging adversarial training to significantly improve its robustness to adversarial SQLi manipulations. Our experiments, conducted using the well-known open-source ModSecurity WAF equipped with the CRS rules, show that our approach, named ModSec-AdvLearn, can (i) increase the detection rate up to 30%, while retaining negligible false alarm rates and discarding up to 50% of the CRS rules; and (ii) improve robustness against adversarial SQLi attacks up to 85%, marking a significant stride toward designing more effective and robust WAFs. We release our open-source code at https://github.com/pralab/modsec-advlearn.

Authors: Biagio Montaruli, Giuseppe Floris, Christian Scano, Luca Demetrio, Andrea Valenza, Luca Compagna, Davide Ariu, Luca Piras, Davide Balzarotti, Battista Biggio

Last Update: 2024-11-29 00:00:00

Language: English

Source URL: https://arxiv.org/abs/2308.04964

Source PDF: https://arxiv.org/pdf/2308.04964

Licence: https://creativecommons.org/licenses/by-sa/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
