
Abstract
Recently, cybersecurity attacks have become increasingly complex, with an increase
in automated attacks and in the exploitation of vulnerabilities in web applications. Online
threats, such as bots or Cross-Site Scripting (XSS) attacks, represent new challenges
for data and user protection. Since the OWASP Top 10 was first published in 2003, XSS
attacks have consistently appeared in the report. Starting from 2021,
XSS attacks have been included in a more general category called "Injection", which
includes other threats such as SQL injection. In the 2023 report, the category
dedicated to Injection is in third place. In addition, according to the Imperva 2023
report, 49.6% of Internet traffic is composed of bots. Of these, 32% are bad bots,
which perform automated tasks with malicious intent, such as extracting data from
websites without permission in order to reuse it and gain a competitive advantage.
Improvements in machine learning, particularly in unsupervised and supervised
learning techniques, have opened up new approaches to the detection and
prevention of these cyber threats. Previous research has identified machine learning
models for detecting bot-generated traffic and XSS attacks, already
demonstrating the potential of these tools. However, implementing these technologies
requires a robust and flexible infrastructure, capable of handling large amounts
of data and providing adequate computing capacity.
The aim of this thesis is therefore to implement an architecture on the
Amazon Web Services public cloud, to enable the use of machine learning models
for the detection of automated bots and XSS attacks. The use of cloud computing
offers several advantages, such as scalability, the availability of on-demand resources,
and the ability to integrate different services. This architecture aims to
combine the strengths of unsupervised and supervised machine learning techniques
with the computational capabilities offered by cloud platforms, providing a scalable
solution for web application security.
In this thesis, a cloud architecture will be examined to implement a threat
detection system based on machine learning, including the analysis of each architectural
component, the integration with other related cloud services, and the integration
with a proprietary tool for the defense of web applications. Furthermore, the
effectiveness of this architecture will be evaluated on real use cases, in terms of
both model accuracy and execution time.
The result of this research is an architecture developed entirely within the
AWS cloud, consisting of the Amazon MSK, Amazon ECS, Amazon SageMaker
and Elastic Cloud services, capable of obtaining predictions both for bot detection and
for the detection of XSS attack attempts. With the resulting architecture, the average
latency for bot detection is 40.56 seconds, obtained by analyzing about 13,000
sessions, and the average latency for XSS attack attempt detection is 11.23
seconds, for about 1-2 suspicious requests.