
Academic Editor: George Drosatos
Received: 28 August 2025
Revised: 27 September 2025
Accepted: 7 October 2025
Published: 9 October 2025
Citation: Ertam, F. Near Real-Time Ethereum Fraud Detection Using Explainable AI in Blockchain Networks. Appl. Sci. 2025, 15, 10841. https://doi.org/10.3390/app151910841
Copyright: © 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Near Real-Time Ethereum Fraud Detection Using Explainable AI
in Blockchain Networks
Fatih Ertam
Department of Digital Forensics Engineering, Technology Faculty, Firat University, 23200 Elazığ, Türkiye;
fatih.ertam@firat.edu.tr
Abstract
Blockchain technologies have profoundly transformed information systems by providing decentralized infrastructures that enhance transparency, security, and traceability. Ethereum, in particular, supports smart contracts and facilitates the development of decentralized finance (DeFi), non-fungible tokens (NFTs), and Web3 applications. However, its openness also enables illicit activities, including fraud and money laundering, through anonymous wallets. Identifying wallets involved in large transfers or abnormal transactional patterns is therefore critical to ecosystem security. This study proposes an AI-based framework employing XGBoost, LightGBM, and CatBoost to detect suspicious Ethereum wallets, achieving test accuracies between 95.83% and 96.46%. The system provides near real-time predictions for individual or recent wallet addresses using a pre-trained XGBoost model. To improve interpretability, SHAP (SHapley Additive exPlanations) visualizations are integrated, highlighting the contribution of each feature. The results demonstrate the effectiveness of AI-driven methods in monitoring and securing Ethereum transactions against fraudulent activities.
Keywords: blockchain; cryptocurrency forensics; Ethereum; explainable AI; fraud detection
1. Introduction
The advent of blockchain technology has precipitated a paradigm shift within numerous industries, including finance, supply chain management, healthcare, and digital identity systems [1,2]. This technology introduces a decentralised, transparent, and tamper-resistant framework for the recording and verification of transactions [3]. In contrast to conventional centralized systems, in which a sole authority is responsible for the maintenance and validation of records, blockchain relies on a distributed ledger that is collectively maintained by a network of nodes. This decentralised consensus mechanism mitigates the risk of single points of failure while enhancing data integrity and auditability across untrusted environments [4]. Among the diverse blockchain platforms that have been developed, Ethereum distinguishes itself as a second-generation blockchain that extends beyond simple value transfer by offering a Turing-complete programming environment for deploying smart contracts, which are self-executing agreements encoded directly onto the blockchain [5]. Smart contracts facilitate the creation of decentralised applications (dApps), which function without intermediaries and execute complex logic in a deterministic and trustless manner. Consequently, Ethereum has emerged as the foundational infrastructure for a diverse range of decentralised finance (DeFi) protocols, non-fungible token (NFT) ecosystems, and autonomous governance models. This positions it as a pivotal element in shaping the future of internet-based services [6,7].
By introducing the concept of a programmable blockchain, Ethereum substantially expanded the functional scope of distributed ledger technology, enabling developers to construct dApps that exceed the limitations of basic peer-to-peer financial transactions [3]. The integration of a Turing-complete virtual machine, designated as the Ethereum Virtual Machine (EVM), enables the deployment of sophisticated smart contracts capable of executing conditional logic, managing digital assets, and automating multi-step workflows in a trustless and transparent environment [8]. This paradigm shift has laid the foundation for novel application domains such as decentralised finance (DeFi), tokenised assets, supply chain automation, decentralised autonomous organisations (DAOs), and identity management systems [9]. These domains leverage Ethereum’s programmable infrastructure to eliminate intermediaries, reduce operational costs, and enhance system resilience [10].
The accelerated growth in the adoption
and market capitalisation of Ethereum has not only attracted legitimate innovation but also drawn the attention of malicious actors seeking to exploit the platform for financial gain. As Ethereum continues to serve as the foundational infrastructure for a vast array of decentralised applications, it has become increasingly susceptible to diverse forms of fraud and abuse. Fraudulent activities within the Ethereum ecosystem include phishing attacks, Ponzi schemes, transaction manipulation, and the deployment of counterfeit decentralised applications (dApps). Across the blockchain ecosystem, phishing has been identified as the most prevalent and damaging form of attack, accounting for approximately 50% of all malicious incidents [11]. These schemes typically employ deceptive tactics, such as fake websites, emails, or messaging platforms, to trick users into disclosing private credentials, including seed phrases or private keys. This ultimately results in unauthorised access to their digital wallets and the theft of crypto-assets.
Ethereum, a prominent cryptocurrency, was originally developed by Vitalik Buterin. It functions as a decentralised digital asset transfer system, allowing individuals to send cryptocurrency to others for a minimal transaction fee. Irrespective of geographical location or background, Ethereum ensures secure, consistent, and cost-effective participation in digital transactions on a global scale. Ethereum’s decentralised architecture and the anonymity it affords have led to its emergence as a highly effective medium for cryptocurrency transactions. This has, in turn, rendered it an attractive tool for criminal networks seeking to conduct money laundering and other illicit financial activities [12,13].
The Ethereum ecosystem has recently experienced a notable surge in fraudulent activities, driven by the increasing sophistication of cybercriminal tactics and the integration of advanced technologies. According to the 2025 Crypto Crime Report by Chainalysis, the estimated value of illicit cryptocurrency transactions in 2024 was USD 40.9 billion. This figure is predicted to exceed USD 51 billion as more illicit addresses are identified [14]. Furthermore, the Ethereum network has been subject to advanced phishing techniques, including payload-based transaction phishing (PTXPhish), which manipulates smart contract interactions through malicious payloads in order to deceive users. A thorough investigation has revealed more than 130,000 PTXPhish transactions on the Ethereum blockchain, resulting in financial losses in excess of USD 341.9 million [15]. These developments highlight the pressing need for effective detection and mitigation strategies within the Ethereum ecosystem. Advanced security measures, user education, and continuous monitoring are critical to protecting users and maintaining trust in decentralised platforms.
The primary contributions of this study are outlined as follows:
• An artificial intelligence model was developed using labeled data from publicly available blockchain datasets. This model extracts behavioral and transactional features of individual wallet addresses in near real time and subsequently classifies them as either suspicious or benign based on learned patterns of fraudulent activity.
• A near real-time monitoring framework was implemented for the identification and analysis of recently active wallet addresses. The system is designed to ingest on-chain transaction data continuously and to detect newly active wallets. Using the trained model, it then evaluates the likelihood that these wallets are involved in illicit activities.
• To enhance the interpretability of the model and thus support trust in automated decision-making processes, explainable artificial intelligence (XAI) techniques were incorporated. These techniques attribute model predictions to specific features or behaviors, thereby providing transparency into the rationale behind the classification of a wallet as suspicious.
The remainder of this paper is organized as follows. Section 2 reviews recent studies on Ethereum-based fraud detection, highlighting methodological advances and existing research gaps. Section 3 describes the materials and methods employed in this study, including dataset construction, feature engineering, and model development procedures. Section 4 presents and discusses the experimental results, emphasizing model performance, feature relevance, and comparative analyses. Section 5 outlines the main limitations of the current study, providing context for result interpretation. Section 6 discusses potential directions for extending this research, such as deployment across multiple blockchain networks and integration with blockchain monitoring systems. Finally, Section 7 concludes the paper by summarizing the key findings and their implications for future blockchain security research.
2. Related Works
Numerous studies have been conducted to detect fraudulent activities within the
Ethereum network, employing a variety of machine learning algorithms and classification
methods. A selection of significant contributions is outlined below.
2.1. Classical ML Approaches
Aziz et al. [16] investigated Ethereum fraud detection using various machine learning techniques, including random forest (RF), multilayer perceptron (MLP), and ensemble methods, on a dataset with limited attributes. LightGBM (LGBM) outperformed the other models, achieving 98.60% accuracy, which improved to 99.03% after hyperparameter tuning. Results were also compared with other boosting algorithms such as XGBoost.
Steven et al. [17] focused on identifying malicious accounts involved in Ethereum transactions. They utilized the XGBoost algorithm and evaluated its performance using tenfold cross-validation. The model achieved a classification accuracy of 96.3%, and the study highlighted the three most influential features contributing to the model’s decision-making process.
Ravindranath et al. [18] evaluated ensemble learning models for detecting fraud in the Ethereum network. CatBoost and LightGBM showed strong performance, achieving 97–98.42% accuracy with oversampling. High F1 and AUC scores indicated reliable detection without overfitting. Among the tested methods, K-Means SMOTE yielded the best results, with 98.42% accuracy and a 99.82% AUC. These findings highlight the effectiveness of ensemble models and advanced resampling in crypto fraud detection.
Dahiya et al. [19] proposed a neural network-based model for the detection of fraudulent transactions on the Ethereum blockchain. The performance of the model was benchmarked against several traditional machine learning classifiers, including Logistic Regression, Support Vector Machine (SVM), Gaussian Naive Bayes, and K-Nearest Neighbours. Among all models evaluated, the neural network achieved the highest accuracy, 97.09%, which indicates a superior capacity to capture and learn complex data patterns. The findings emphasise the efficacy of neural networks in differentiating between authentic and fraudulent Ethereum transactions.
2.2. Self-Supervised and Deep Learning Methods
Teng et al. [20] proposed a novel method for identifying anomalous smart contracts on the Ethereum platform. Their approach involves extracting transaction patterns through a data slicing technique, followed by training a detection model using LSTM networks. The results demonstrated high precision in distinguishing anomalous contracts from legitimate ones.
Ehsan et al. [21] aimed to identify malicious actors and categorize attacks based on behavior. They built a dataset from illicit Ethereum activities and applied feature selection methods such as PCA, Information Gain, and Ridge Regression. Classification using LightGBM, XGBoost, and others showed that models with Information Gain and LGBM/XGBoost reached 98% accuracy. XGBoost also completed analysis in 13.72 s. Additionally, the study improved blockchain security by categorizing fraud types, enhancing network reliability.
Liu et al. [22] introduced S_HGTNs, a framework for detecting anomalies in Ethereum smart contracts, focusing on financial fraud. It builds a Heterogeneous Information Network (HIN) from contract features, learns a relational matrix via a transformer, and classifies using node embeddings. Experiments show that the model outperforms traditional methods with higher accuracy and low variance, confirming its robustness and effectiveness.
2.3. Graph-Based Techniques
Tan et al. [23] proposed a fraud detection method on Ethereum by analyzing transaction records and using web crawlers to obtain labelled fraudulent addresses. These were used to reconstruct a transaction network, from which features were extracted via an amount-based network embedding. A Graph Convolutional Network (GCN) then classified addresses as legitimate or fraudulent. The system achieved 95% accuracy, demonstrating strong performance in identifying fraud.
Jin et al. [24] introduced Meta-IFD (Meta-Interaction-based Fraud Detection), an Ethereum fraud detection framework based on meta-interaction concepts. It combines generative and contrastive self-supervision to refine behavioral features and distinguish activity types. Using multi-view feature learning, Meta-IFD captures rich behavioral representations to detect fraud such as Ponzi schemes and phishing. Evaluations on real Ethereum data show its robustness and high accuracy, with the generative module addressing class imbalance and the contrastive module improving profile discrimination.
Tan et al. [25] proposed a framework for detecting fraudulent Ethereum transactions through analysis of transaction records. Labelled addresses were collected using web crawlers and used to build a transaction network from the public ledger. A network embedding method was employed to extract node features, which were then classified by a Graph Convolutional Network (GCN). The system achieved 96% accuracy, demonstrating its effectiveness in fraud detection on the Ethereum blockchain.
Given the rapid growth of blockchain technology and cryptocurrencies, phishing scams have emerged as a significant threat to transaction security. Existing detection methods frequently fail to capture critical neighbor information and its impact on fraudulent behaviors. To address these limitations, a phishing detection framework based on FAAN-GBM (Feature and Attention Augmented Network with Gradient Boosting Machine) has been proposed. This framework integrates basic, transaction, and interaction features of nodes while leveraging attention mechanisms and autoencoders to enhance feature representation. An experimental evaluation on real Ethereum datasets showed that the FAAN-GBM model outperforms existing approaches, significantly improving the accuracy of phishing fraud node detection [26].
The proliferation of smart contracts within the blockchain ecosystem has increased the need for effective phishing detection mechanisms. Existing methods frequently fail to capture both global structural patterns in transaction networks and local semantic relationships in transaction data, which restricts their capacity to detect complex phishing behaviors. To address these challenges, a dynamic feature fusion model has been proposed, combining graph-based representation learning with semantic feature extraction. The model constructs global graph representations of account relationships and extracts local contextual features from transactions. These features are then integrated via a dynamic multimodal fusion mechanism. An experimental evaluation on large-scale real-world blockchain datasets showed that this approach outperforms existing benchmarks in terms of accuracy, F1 score, and recall. This finding underscores the importance of jointly modeling structural and semantic information for effective phishing detection [27].
LMAE4Eth is a multi-view learning framework designed to improve Ethereum fraud account detection by integrating transaction semantics, masked graph embeddings, and expert knowledge. It utilises a transaction token comparative language model (TxCLM) to convert numerical transactions into semantically meaningful representations and a masked account graph autoencoder (MAGAE) focused on reconstructing account node features for advanced node-level detection. Scalability is achieved through layer-wise sampling, and features designed by experts are incorporated to improve model performance. Experimental results demonstrate that LMAE4Eth outperforms 15 baseline methods, achieving over 10% improvement in F1 score across two datasets and proving its effectiveness in detecting fraudulent accounts [28]. However, these approaches require extensive sequence pre-processing and lack the real-time deployment capabilities demonstrated in our work.
2.4. Hybrid Systems
Li et al. [29] addressed phishing detection on Ethereum as a graph classification task and proposed PDGNN (Phishing Detection Graph Neural Network), an end-to-end framework. It constructs a lightweight transaction network and extracts subgraphs linked to known phishing accounts. Using a Chebyshev-GCN, the model classifies accounts as phishing or legitimate. Experiments on five datasets show that PDGNN outperforms traditional methods and scales well to large networks.
Pahuja et al. [30] proposed a fraud detection approach based on the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework for Ethereum transactions. Their method tackled data imbalance using resampling, applied correlation-based feature selection, and used ensemble learning to enhance accuracy. A comparison of ten classifiers showed ensemble models outperformed single ones, with LightGBM achieving the highest accuracy at 99.2%, surpassing other approaches on the same dataset.
2.5. Background on Ethereum and Fraud Typologies
Ethereum is a decentralised blockchain system that facilitates programmable transac-
tions through smart contracts. In the context of the Ethereum blockchain, wallet addresses
can be classified as either externally owned or contract-based. Common fraudulent be-
haviors include phishing, contract abuse, and address laundering, often observable via
abnormal transaction frequency, unusually high or low gas usage, or multiple interactions
with known blacklisted addresses. The comprehension of these behaviors was instrumental
in the subsequent feature engineering process, which is elaborated in the following section.
3. Materials and Method
The graphical representation of the proposed method of the study is presented in
Figure 1.
[Figure 1: flowchart of the proposed five-phase pipeline. Phase 1, Data Collection & API Integration: Ethereum Network (mainnet via Infura/MetaMask API), Block Range Selection, Transaction Extraction. Phase 2, Feature Engineering & Processing: Raw Transaction Data, 17 Behavioral Features, Data Preprocessing. Phase 3, Model Training & Optimization: Training Dataset, Gradient Boosting Models, Hyperparameter Tuning, Best Model Selection. Phase 4, Model Explainability (XAI): SHAP Integration, Local Explanations, Global Explanations. Phase 5, Near Real-Time Deployment: Input Address, Feature Extraction, Model Prediction, Final Output.]
Figure 1. Proposed method.
The pipeline commences with Phase 1, in which blockchain data is retrieved from the Ethereum mainnet via Web3 APIs; transactions are extracted from the preceding 1000 blocks or for the 10 most recently active addresses. Phase 2 performs feature engineering, transforming raw transaction data into a set of 17 behavioral features that cover transaction counts, value statistics, and temporal patterns; the features are then preprocessed using Min–Max normalization. Phase 3 trains three gradient boosting algorithms (XGBoost, LightGBM, CatBoost), with hyperparameters tuned by grid search under 5-fold cross-validation. Phase 4 integrates SHAP for model interpretability, providing both local explanations for individual predictions and global feature importance analysis. Phase 5 demonstrates the deployment pipeline, which achieves sub-50 millisecond feature extraction and sub-10 millisecond inference time and outputs classification results with confidence scores and SHAP-based explanations.
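The Min–Max normalization step in Phase 2 can be illustrated with a minimal sketch. The function name and the sample values below are hypothetical and are not taken from the study's code; the sketch only shows the standard rescaling of one feature column to the interval [0, 1].

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # Avoid division by zero when every value is identical.
        return [0.0 for _ in column]
    return [(v - lo) / span for v in column]


# Hypothetical Sent_tnx counts for four wallets (illustrative values only).
sent_tnx = [2, 10, 6, 18]
scaled = min_max_scale(sent_tnx)
print(scaled)
```

In a deployed setting, the minimum and maximum would presumably be taken from the training data and reused at prediction time, so that a single incoming wallet is scaled consistently with the model's training inputs.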
3.1. Feature Selection and Reference Dataset
In this study, the MetaMask API was employed to establish a secure connection to
the Ethereum network. The most recent 1000 blocks were analysed programmatically
through the API, enabling the extraction of 17 distinct features associated with a given
wallet address. These features capture various behavioural and transactional characteristics
of the wallet, including but not limited to transaction frequency, token interaction patterns,
and gas usage metrics. Table 1 presents the extracted features and their definitions.
Table 1. Ethereum wallet transaction features.
| Feature Name | Description |
| --- | --- |
| Address | Ethereum wallet address. |
| Sent_tnx | Total number of standard (non-contract) transactions sent from the address. |
| Received_tnx | Total number of standard (non-contract) transactions received by the address. |
| NumberofCreated_Contracts | Number of smart contract creation transactions initiated by the account. |
| UniqueReceivedFrom_Addresses | Count of distinct sender addresses that sent Ether to this account. |
| UniqueSentTo_Addresses | Count of distinct recipient addresses this account has sent Ether to. |
| MinValueReceived | The smallest single Ether amount received in a transaction. |
| MaxValueReceived | The largest single Ether amount received in a transaction. |
| AvgValueReceived | Average Ether value received across all incoming transactions. |
| MinValSent | The smallest single Ether amount sent in a transaction. |
| MaxValSent | The largest single Ether amount sent in a transaction. |
| AvgValSent | Average Ether value sent across all outgoing transactions. |
| TotalEtherSent | Cumulative Ether sent from this address across all transactions. |
| TotalEtherReceived | Cumulative Ether received by this address across all transactions. |
| TotalEtherBalance | Net Ether balance after all incoming and outgoing transactions. |
| TotalTransactions | Total count of transactions including normal and contract creation ones. |
| TimeDiffBetweenFirstandLast | Time duration in minutes between the first and the most recent transaction. |
| AvgMinBetweenSentTnx | Average time in minutes between two consecutive sent transactions. |
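To make the definitions in Table 1 concrete, the following sketch derives a subset of the features from a list of transactions. It is an illustration under stated assumptions, not the study's extraction code: the record layout (keys `from`, `to`, `value` in Ether, `timestamp` in Unix seconds, with `to` set to `None` for contract creation) and the function name `wallet_features` are hypothetical.

```python
from statistics import mean


def wallet_features(address, txs):
    """Compute a subset of the Table 1 features for one address.

    `txs` is a list of dicts with hypothetical keys 'from', 'to',
    'value' (Ether) and 'timestamp' (Unix seconds); 'to' is None
    for contract-creation transactions.
    """
    addr = address.lower()
    outgoing = [t for t in txs if t["from"].lower() == addr]
    created = [t for t in outgoing if t["to"] is None]
    sent = [t for t in outgoing if t["to"] is not None]
    received = [t for t in txs
                if t["to"] is not None and t["to"].lower() == addr]

    sent_values = [t["value"] for t in sent]
    recv_values = [t["value"] for t in received]
    all_times = sorted(t["timestamp"] for t in outgoing + received)
    sent_times = sorted(t["timestamp"] for t in sent)
    # Gaps between consecutive sent transactions, converted to minutes.
    gaps_min = [(b - a) / 60 for a, b in zip(sent_times, sent_times[1:])]

    return {
        "Sent_tnx": len(sent),
        "Received_tnx": len(received),
        "NumberofCreated_Contracts": len(created),
        "UniqueSentTo_Addresses": len({t["to"] for t in sent}),
        "UniqueReceivedFrom_Addresses": len({t["from"] for t in received}),
        "MaxValSent": max(sent_values, default=0.0),
        "AvgValSent": mean(sent_values) if sent_values else 0.0,
        "TotalEtherSent": sum(sent_values),
        "TotalEtherReceived": sum(recv_values),
        "TotalEtherBalance": sum(recv_values) - sum(sent_values),
        "TotalTransactions": len(outgoing) + len(received),
        "TimeDiffBetweenFirstandLast":
            (all_times[-1] - all_times[0]) / 60 if all_times else 0.0,
        "AvgMinBetweenSentTnx": mean(gaps_min) if gaps_min else 0.0,
    }


# Illustrative transactions with made-up addresses and values.
txs = [
    {"from": "0xA", "to": "0xB", "value": 1.0, "timestamp": 0},
    {"from": "0xA", "to": "0xC", "value": 3.0, "timestamp": 600},
    {"from": "0xB", "to": "0xA", "value": 0.5, "timestamp": 1200},
    {"from": "0xA", "to": None, "value": 0.0, "timestamp": 1800},
]
features = wallet_features("0xA", txs)
```

Because every feature here is a count, sum, extreme, or mean over a bounded window of transactions, a single pass over the window is enough, which is consistent with the requirement that the features be extractable in near real time.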
A key challenge in this research is the limited availability of publicly accessible labelled datasets that classify Ethereum addresses as either suspicious or normal. To address this issue, the dataset employed in the study by Aziz et al. [16] served as the primary dataset. It contains 9841 entries corresponding to transactions on the Ethereum network. Each entry is labelled to indicate whether the behavior is normal (label 0) or suspicious (label 1). Specifically, 7662 records are marked as normal, while the remaining 2179 are identified as suspicious. The original dataset encompasses 49 extracted features pertaining to transactional behavior, account activity, and smart contract interactions. For the purposes of this research, a subset of 17 features was selected from the original 49. These features were determined to be both relevant and technically extractable in real time. This refined feature set was then used to construct a new dataset tailored to the requirements of the proposed detection system. The finalized dataset for this study has been made publicly available via a GitHub repository [31].
3.2. Performance Metrics
To evaluate the classification performance of the models trained on the dataset constructed for this study, several widely accepted performance metrics were employed, including Accuracy, Precision, Recall, and the F1-Score [32]. Together, these metrics provide a comprehensive understanding of the model’s effectiveness in correctly identifying both suspicious and benign wallet behaviors. The mathematical formulations corresponding to each metric are presented in Equations (1)–(4), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. Accuracy is the ratio of correctly classified instances to the total number of instances. Precision is defined as the proportion of correctly predicted suspicious wallets among all wallets predicted as suspicious, thereby reflecting the model’s ability to avoid false positives. Recall, also known as sensitivity, is defined as the proportion of actual suspicious wallets that were correctly identified by the model. This metric highlights the model’s capacity to minimize false negatives. The F1-Score is the harmonic mean of precision and recall, offering a balanced metric that is particularly useful when dealing with imbalanced datasets. The collective use of these metrics ensures a robust evaluation of the model’s classification capabilities.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
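Equations (1)–(4) translate directly into code. The sketch below computes the four metrics from confusion-matrix counts; the function name and the counts in the example are hypothetical and do not correspond to any result reported in this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts: 40 suspicious wallets caught, 10 false alarms, none missed.
m = classification_metrics(tp=40, tn=50, fp=10, fn=0)
```

With these illustrative counts, recall is perfect while precision is lower, and the F1-Score falls between the two, which shows why it is the more informative single number when the classes are imbalanced.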
3.3. Classification
In this study, several ensemble-based boosting algorithms were employed for the purpose of classification, including LightGBM (Light Gradient Boosting Machine), XGBoost (Extreme Gradient Boosting), and CatBoost [33]. The selection of these gradient boosting frameworks was made on the basis of their proven efficiency, scalability, and high predictive performance, particularly in the context of structured tabular data. Each of these algorithms employs decision tree ensembles with optimized boosting strategies, thereby enabling the model to capture complex patterns within the feature space and effectively distinguish between suspicious and normal wallet behaviors.
XGBoost is an advanced implementation of gradient boosting machines that incorporates system optimization and algorithmic enhancements to improve efficiency, scalability, and model performance [34].
To optimize the objective function, XGBoost applies a second-order Taylor expansion to approximate the loss at iteration $t$. The approximated loss is given in Equation (5).
$$L^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \tag{5}$$
where $f_t(x_i)$ is the prediction from the newly added function (typically a regression tree) at iteration $t$, and $\Omega(f_t)$ denotes the regularization term given in Equation (6) that controls the complexity of the model.
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \tag{6}$$
Here, $T$ is the number of leaves in the tree, $w_j$ is the weight of leaf $j$, and $\gamma$ and $\lambda$ are regularization coefficients. The terms $g_i$ and $h_i$ represent the first- and second-order derivatives of the loss function with respect to the prediction from the previous iteration, $\hat{y}_i^{(t-1)}$, and are defined as follows:
$$g_i = \frac{\partial\, l(y_i, \hat{y}_i^{(t-1)})}{\partial \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(t-1)})}{\partial \big(\hat{y}_i^{(t-1)}\big)^2} \tag{7}$$
Here, $l(y_i, \hat{y}_i^{(t-1)})$ denotes the loss function comparing the true label $y_i$ and the predicted value $\hat{y}_i^{(t-1)}$. The gradient $g_i$ gives the slope of the loss, whose negative is the direction of steepest descent, while the Hessian $h_i$ provides curvature information, allowing the algorithm to perform more accurate and stable updates during optimization.
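As a worked illustration of Equations (5)–(7), the sketch below evaluates $g_i$ and $h_i$ for the binary logistic loss and then computes the leaf weight that minimizes the approximated objective. The choice of logistic loss is an assumption made for this example, since the text does not state the loss here, and the function names are hypothetical. For a single leaf containing a set of samples, minimizing Equation (5) with the penalty of Equation (6) gives the closed form $w^* = -\sum g_i / (\sum h_i + \lambda)$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logloss_grad_hess(y_true, raw_score):
    """g_i and h_i of the logistic loss w.r.t. the raw (pre-sigmoid) score."""
    p = sigmoid(raw_score)
    return p - y_true, p * (1.0 - p)


def optimal_leaf_weight(grads, hessians, reg_lambda):
    """Closed-form minimizer of the second-order objective for one leaf."""
    return -sum(grads) / (sum(hessians) + reg_lambda)


# Three hypothetical samples falling in the same leaf, all with raw score 0.
labels = [1, 1, 0]
raw_scores = [0.0, 0.0, 0.0]
pairs = [logloss_grad_hess(y, s) for y, s in zip(labels, raw_scores)]
grads = [g for g, _ in pairs]      # [-0.5, -0.5, 0.5]
hessians = [h for _, h in pairs]   # [0.25, 0.25, 0.25]

# reg_lambda = 1.5 mirrors the value listed for XGBoost in Table 2.
w = optimal_leaf_weight(grads, hessians, reg_lambda=1.5)
```

The leaf weight is positive because two of the three samples are labelled 1, and the $\lambda$ term in the denominator shrinks it toward zero, which is how the regularization in Equation (6) limits the size of each update.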
LightGBM is a gradient boosting framework based on decision tree algorithms, designed to be distributed and efficient. It uses histogram-based algorithms, grows trees leaf-wise, and introduces techniques such as Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) to reduce computation and memory usage, making it suitable for large-scale and high-dimensional data [35]. The loss function is given in Equation (8).
$$L^{(t)} = \sum_{i \in A} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] \tag{8}$$
where $A \subseteq \{1, \dots, n\}$ is the subset of samples selected using GOSS (Gradient-based One-Side Sampling).
The regularized objective can be defined as given in Equation (9).
$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \lambda \lVert f \rVert^2 \tag{9}$$
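The selection of the subset $A$ in Equation (8) can be sketched as follows. This is a simplified illustration of GOSS rather than LightGBM's internal implementation: it keeps the samples with the largest absolute gradients, randomly samples a fraction of the rest, and up-weights the sampled ones by $(1 - a)/b$ so that the gradient sum stays approximately unbiased. The function name, rates, seed, and gradient values are assumptions chosen for the example.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {sample index: weight} for the subset A chosen by GOSS."""
    n = len(gradients)
    # Rank samples by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top, rest = order[:n_top], order[n_top:]

    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))

    # Small-gradient samples are amplified to compensate for the sampling.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in sampled})
    return weights


# Ten hypothetical per-sample gradients.
grads = [0.9, -0.1, 0.05, -0.8, 0.2, 0.01, -0.3, 0.02, 0.15, -0.04]
A = goss_sample(grads, top_rate=0.2, other_rate=0.1)
```

Here the two samples with the largest absolute gradients (indices 0 and 3) are always kept with weight 1, and one of the remaining eight is drawn at random and given a weight of about 8, so only three of the ten samples enter the sum in Equation (8).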
CatBoost is a gradient boosting algorithm specifically designed to handle categorical
features efficiently. It employs techniques such as ordered boosting and target statistics to
reduce overfitting and eliminate prediction shift, which commonly arise in the processing
of categorical variables [36].
At boosting iteration $t$, the prediction is updated as given in Equation (10):
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta f_t(x_i) \tag{10}$$
where $\eta$ denotes the learning rate, and $f_t$ is the decision function (typically a decision tree) added at iteration $t$.
To prevent target leakage and ensure unbiased gradient estimation, CatBoost introduces the ordered gradient, defined in Equation (11).
$$g_i^{(t)} = \left. \frac{\partial \ell(y_i, \hat{y}_i^{(t-1)})}{\partial \hat{y}_i^{(t-1)}} \right|_{\text{without } (x_i, y_i)} \tag{11}$$
where the gradient for sample $i$ is calculated excluding the sample itself from the statistics, thus avoiding prediction shift.
The loss function minimized during training is expressed as shown in Equation (12).
$$L^{(t)} = \sum_{i=1}^{n} \ell(y_i, \hat{y}_i^{(t)}) \tag{12}$$
where $\ell$ is the chosen loss function (e.g., log loss or squared error).
For the transformation of categorical features, CatBoost computes a smoothed target statistic as presented in Equation (13).
$$TS_j = \frac{\sum_{i \in B(x_{ij})} y_i + a \cdot p}{|B(x_{ij})| + a} \tag{13}$$
where:
• $B(x_{ij})$ is the set of prior samples with the same categorical value as $x_{ij}$,
• $p$ is the prior mean of the target,
• $a$ is a regularization (smoothing) parameter.
This approach enables CatBoost to achieve state-of-the-art performance, particularly
on datasets with high-cardinality categorical variables.
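As an illustration of Equation (13), the following sketch computes an ordered target statistic in a single pass, so that each sample is encoded only from samples that precede it. The category labels, targets, and the choice to take the prior $p$ as the overall target mean are assumptions made for the example; the dataset used in this study consists of numeric features.

```python
from collections import defaultdict

def ordered_target_statistics(categories, targets, prior, a=1.0):
    """Encode each sample from the prior samples sharing its category."""
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running count of samples per category
    encoded = []
    for cat, y in zip(categories, targets):
        # Equation (13): only samples seen so far contribute.
        encoded.append((sums[cat] + a * prior) / (counts[cat] + a))
        sums[cat] += y
        counts[cat] += 1
    return encoded

# Hypothetical categorical feature and binary target.
cats = ["exchange", "exchange", "mixer", "exchange", "mixer"]
ys = [0, 1, 1, 0, 1]
prior = sum(ys) / len(ys)
ts = ordered_target_statistics(cats, ys, prior, a=1.0)
```

The first occurrence of each category receives the prior itself, because no earlier samples with that value exist.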
For the purposes of this study, the dataset was divided into a training set and a testing
set using an 80/20 split ratio, where 80% of the data was used for training the model and the
remaining 20% was reserved for performance evaluation. The classification results obtained
from the employed boosting algorithms were compared based on the metrics defined earlier.
Table 2 presents a summary of the performance comparison across different classifiers.
Table 2 shows that the boosting algorithms generate comparable outcomes: mean cross-validation
accuracy ranges from 95.83% to 96.46%, and test accuracy ranges from 95.84% to 96.34%. The
XGBoost-based model was selected for use in this study, and all code implementations were
written in Python 3.13.
Table 2. Performance comparison of different classifiers.
| Metric | XGBoost | LightGBM | CatBoost |
|---|---|---|---|
| Best Hyperparameters | colsample_bytree: 1.0; learning_rate: 0.2; max_depth: 5; n_estimators: 300; reg_alpha: 0; reg_lambda: 1.5; subsample: 1.0 | bagging_fraction: 0.8; feature_fraction: 0.9; learning_rate: 0.2; max_depth: 5; n_estimators: 200; num_leaves: 31 | depth: 7; iterations: 300; l2_leaf_reg: 3; learning_rate: 0.2 |
| CV Accuracy Mean | 0.9588 | 0.9646 | 0.9583 |
| CV Accuracy Std | 0.0090 | 0.0071 | 0.0066 |
| CV F1 Mean | 0.9585 | 0.9643 | 0.9580 |
| CV F1 Std | 0.0090 | 0.0071 | 0.0066 |
| Test Accuracy | 0.9589 | 0.9634 | 0.9584 |
| Test Precision | 0.9586 | 0.9633 | 0.9582 |
| Test Recall | 0.9589 | 0.9634 | 0.9584 |
| Test F1 | 0.9581 | 0.9628 | 0.9575 |
| Test ROC AUC | 0.9882 | 0.9898 | 0.9880 |
| Training Time (s) | 833.96 | 471.86 | 220.65 |
| Model Size (KB) | 548.79 | 516.15 | 639.19 |
| Latency (ms) | 0.72 | 0.72 | 0.72 |
Table 3 presents the near-real-time performance metrics and requirements.
The performance metrics in the table show that the model comfortably meets the near-real-time
requirements. The mean processing time for a single instance was 0.72 ms, well below the
specified requirement of 100 ms. The 95th and 99th percentile response times were 0.76 ms and
1.03 ms, respectively, so latency remains low even in the slowest cases. Batch throughput was
12,021 samples per second, far above the requirement of 50 samples per second, and the
per-sample time in batch mode was 0.08 ms, which supports the system’s suitability for
real-time applications.
An evaluation of the resource usage revealed a memory consumption of 410 MB and a model size
of 548.8 KB, both well below the specified limits of 1000 MB and 1000 KB. The model is
therefore lightweight and portable. Taken together, the metrics show that every requirement is
met by a wide margin, indicating that the model delivers both high processing speed and low
resource utilization.
Table 3. Near-real-time performance metrics and requirements.
Metric Value Requirement
Average Processing Time 0.72 ms <100 ms
P95 Response Time 0.76 ms <200 ms
P99 Response Time 1.03 ms <500 ms
Throughput (10 samples) 12,021.0 samples/s >50 samples/s
Time per Sample (batch) 0.08 ms <20 ms
Memory Usage 410.0 MB <1000 MB
Model Size 548.8 KB <1000 KB
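Latency figures of the kind reported in Table 3 can be collected with a small timing harness. The sketch below is an assumption about how such a measurement might be set up, not the benchmark used in this study: `predict_one` stands in for a call to the trained model, the percentile uses the nearest-rank definition, and throughput is derived from single-sample calls rather than true batch inference.

```python
import math
import time

def percentile(values, q):
    """Nearest-rank percentile of a non-empty list (q in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def benchmark(predict_one, samples, repeats=200):
    """Time repeated single-sample predictions and summarize them."""
    latencies_ms = []
    for _ in range(repeats):
        for sample in samples:
            start = time.perf_counter()
            predict_one(sample)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    total_s = sum(latencies_ms) / 1000.0
    return {
        "mean_ms": sum(latencies_ms) / len(latencies_ms),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "throughput_per_s": len(latencies_ms) / total_s if total_s > 0 else float("inf"),
    }

# Hypothetical stand-in for model inference on one feature vector.
stats = benchmark(lambda x: sum(x) > 1.0, samples=[[0.1, 0.2, 0.9]] * 10)
```

Each measured value can then be compared against the thresholds in the Requirement column.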
3.4. Hyperparameter Optimization
A comprehensive grid search was performed for each of the three gradient boosting algorithms.
For XGBoost, the search used 5-fold stratified cross-validation over 2187 candidate
configurations (10,935 fits in total) and selected the following optimal configuration:
n_estimators: 300
max_depth: 5
learning_rate: 0.2
subsample: 1.0
colsample_bytree: 1.0
This systematic approach supports reproducibility and helps mitigate potential
overfitting concerns.
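The candidate and fit counts above are consistent with a grid of seven hyperparameters with three values each, since $3^7 = 2187$ and $2187 \times 5 = 10{,}935$. The sketch below enumerates one such grid. The seven parameter names are the XGBoost entries in Table 2, but the candidate values are hypothetical, chosen only so that the reported optimum is included; the actual grid is not stated in the text.

```python
from itertools import product

# Hypothetical search space: 7 hyperparameters x 3 values each.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1, 0.2],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
    "reg_alpha": [0, 0.1, 1.0],
    "reg_lambda": [1.0, 1.5, 2.0],
}

# Every combination of values is one candidate configuration.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

n_folds = 5
total_fits = len(candidates) * n_folds

# Optimum reported for XGBoost in Table 2.
reported_best = {
    "n_estimators": 300, "max_depth": 5, "learning_rate": 0.2,
    "subsample": 1.0, "colsample_bytree": 1.0,
    "reg_alpha": 0, "reg_lambda": 1.5,
}
```

A dictionary of this shape is what grid-search utilities typically accept as the search space, with each candidate fitted once per fold.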
3.5. Ablation Study
In this study, ablation results were obtained by removing each feature separately and
measuring the resulting accuracy and F1 score. The results for all 17 features are given in
Table 4.
The findings suggest that the most critical features affecting the model’s prediction
performance are ‘Time Diff between first and last (Mins)’ and ‘TotalTransactions’. Removing
either feature on its own lowers accuracy from 0.9589 to 0.9487, a relative reduction of
approximately 1.06%, which is considerably larger than for any other feature. Removing
MinValueReceived, TotalEtherReceived, or MaxValueReceived produces relative accuracy losses
of only 0.26%, 0.26%, and 0.21%, respectively. These results indicate that
behavioral/temporal characteristics, including transaction timing and frequency, are more
informative discriminators for the model than financial value-based features.
Table 4. Ablation study.
| Feature | Baseline Accuracy | Without Feature Accuracy | Accuracy Drop | Relative Accuracy Impact (%) | Baseline F1 | Without Feature F1 | F1 Drop | Relative F1 Impact (%) |
|---|---|---|---|---|---|---|---|---|
| Time Diff between first and last (Mins) | 0.9589 | 0.9487 | 0.0102 | 1.0593 | 0.9581 | 0.9474 | 0.0107 | 1.1209 |
| TotalTransactions | 0.9589 | 0.9487 | 0.0102 | 1.0593 | 0.9581 | 0.9477 | 0.0104 | 1.0895 |
| MinValueReceived | 0.9589 | 0.9563 | 0.0025 | 0.2648 | 0.9581 | 0.9556 | 0.0026 | 0.2677 |
| TotalEtherReceived | 0.9589 | 0.9563 | 0.0025 | 0.2648 | 0.9581 | 0.9554 | 0.0027 | 0.2851 |
| MaxValueReceived | 0.9589 | 0.9568 | 0.0020 | 0.2119 | 0.9581 | 0.9558 | 0.0023 | 0.2375 |
| Sent_tnx | 0.9589 | 0.9573 | 0.0015 | 0.1589 | 0.9581 | 0.9565 | 0.0017 | 0.1725 |
| TotalEtherSent | 0.9589 | 0.9573 | 0.0015 | 0.1589 | 0.9581 | 0.9565 | 0.0016 | 0.1682 |
| UniqueReceivedFrom_Addresses | 0.9589 | 0.9573 | 0.0015 | 0.1589 | 0.9581 | 0.9566 | 0.0015 | 0.1598 |
| AvgValueReceived | 0.9589 | 0.9573 | 0.0015 | 0.1589 | 0.9581 | 0.9564 | 0.0017 | 0.1768 |
| TotalEtherBalance | 0.9589 | 0.9573 | 0.0015 | 0.1589 | 0.9581 | 0.9565 | 0.0016 | 0.1682 |
| Avg min between sent tnx | 0.9589 | 0.9578 | 0.0010 | 0.1059 | 0.9581 | 0.9569 | 0.0012 | 0.1291 |
| MaxValSent | 0.9589 | 0.9578 | 0.0010 | 0.1059 | 0.9581 | 0.9571 | 0.0010 | 0.1079 |
| UniqueSentTo_Addresses | 0.9589 | 0.9578 | 0.0010 | 0.1059 | 0.9581 | 0.9570 | 0.0011 | 0.1163 |
| Received_tnx | 0.9589 | 0.9584 | 0.0005 | 0.0530 | 0.9581 | 0.9575 | 0.0006 | 0.0602 |
| MinValSent | 0.9589 | 0.9584 | 0.0005 | 0.0530 | 0.9581 | 0.9577 | 0.0005 | 0.0478 |
| NumberofCreated_Contracts | 0.9589 | 0.9589 | 0.0000 | 0.0000 | 0.9581 | 0.9580 | 0.0001 | 0.0082 |
| AvgValSent | 0.9589 | 0.9599 | −0.0010 | −0.1059 | 0.9581 | 0.9592 | −0.0011 | −0.1119 |
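The leave-one-feature-out procedure behind Table 4 can be sketched as follows. The `evaluate` callable is a placeholder for whatever trains and scores the model on a given feature subset; here it simply replays three accuracy values from Table 4, so the example is an illustration under that assumption rather than the study's implementation. Relative impacts recomputed from the rounded table entries can differ slightly from the reported ones, which were presumably derived from unrounded scores.

```python
def ablation_table(features, evaluate):
    """Drop each feature in turn and report the change in the score."""
    baseline = evaluate(list(features))
    rows = []
    for name in features:
        score = evaluate([f for f in features if f != name])
        drop = baseline - score
        rows.append({
            "feature": name,
            "without": score,
            "drop": drop,
            "relative_pct": 100.0 * drop / baseline,
        })
    # Largest drop first, i.e. most important feature first.
    return baseline, sorted(rows, key=lambda r: r["drop"], reverse=True)

# Accuracy without each feature, copied from three rows of Table 4.
REPORTED = {
    "Time Diff between first and last (Mins)": 0.9487,
    "NumberofCreated_Contracts": 0.9589,
    "AvgValSent": 0.9599,
}
BASELINE = 0.9589

def replay_table4(kept_features):
    """Stand-in scorer: returns the reported accuracy for the missing feature."""
    missing = [f for f in REPORTED if f not in kept_features]
    return BASELINE if not missing else REPORTED[missing[0]]

baseline, rows = ablation_table(list(REPORTED), replay_table4)
```

A negative drop, as for AvgValSent, means that the score improved slightly when the feature was removed.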
4. Results and Discussion
4.1. SHAP (SHapley Additive exPlanations)
To enhance the interpretability of the XGBoost model selected for this study, the SHAP
(SHapley Additive exPlanations) algorithm was employed. In the field of explainable
artificial intelligence (XAI), SHAP has emerged as one of the most theoretically grounded and
model-agnostic approaches for interpreting machine learning models. It is based on
cooperative game theory, particularly the concept of Shapley values, which aim to fairly
distribute the “payout” (in this case, the model output) among the input features based on
their marginal contributions. SHAP assigns each feature a value that quantifies its individual
contribution to a particular prediction. These contributions are calculated by considering all
possible subsets of the remaining features and computing the weighted average marginal
effect of adding the feature to each subset. The result is a set of additive feature
attributions that, together with the expected model output, sum to the model’s output for that
instance. This makes SHAP both local (interpreting individual predictions) and global
(aggregating attributions across many predictions) in scope [37].
In SHAP, the contribution of each feature $i$ to the model’s prediction is calculated
using the following Shapley value formula:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left[ f(S \cup \{i\}) - f(S) \right] \tag{14}$$
where $N$ is the set of all input features, $S \subseteq N \setminus \{i\}$ is a subset of features
excluding feature $i$, $f(S)$ is the model prediction using only the features in subset $S$,
$f(S \cup \{i\})$ is the prediction after adding feature $i$, and $\phi_i$ is the SHAP value
representing the contribution of feature $i$ to the model output.
This formulation guarantees several desirable properties: local accuracy (the SHAP values,
added to the expected model output, sum to the model prediction), missingness (features not in
the model get zero contribution), and consistency (if a model changes to increase the
contribution of a feature, its SHAP value will not decrease). As such, SHAP provides a
principled and intuitive way to interpret complex machine learning models. Algorithm 1
illustrates the procedure for generating the SHAP plot. Figure 2 depicts the contribution
values of each feature to the classification outcome.
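The Shapley value formula in Equation (14) can be evaluated exactly by brute force when the number of features is small. The sketch below does this for a toy value function with three hypothetical features; it is meant to make the weighting and the local accuracy property concrete. Practical SHAP implementations for tree ensembles use far more efficient algorithms than this enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset S of N \\ {i}."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|N| - |S| - 1)! / |N|! from Equation (14).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

def toy_model(subset):
    """Hypothetical f(S): two additive effects plus one interaction."""
    timing = "timing" in subset
    volume = "volume" in subset
    return 2.0 * timing + 1.0 * volume + 1.0 * (timing and volume)

names = ["timing", "volume", "contracts"]
phi = shapley_values(names, toy_model)
```

The interaction term is split equally between the two interacting features, the unused feature receives zero, and the values sum to the difference between the full prediction and the empty-set prediction.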
Figure 2. SHAP value.
For instance, in a specific prediction case, a wallet with high TotalEtherSent and
frequent outgoing transactions showed positive SHAP values, indicating strong association
with suspicious behavior. In contrast, wallets with low diversity in interacting addresses
showed negative SHAP values, correlating with benign activity. This insight is useful for
forensic analysts investigating suspicious wallet activity.
Table 5 presents the features ranked by SHAP importance.
The SHAP plot, in isolation, does not explicitly indicate the model’s prediction for a
given instance. Instead, it shows which features the model used to make its prediction and
how strongly each feature influenced the outcome. The vertical axis of the SHAP summary plot
lists the features employed in the model, ordered by their estimated impact on the model’s
output. The horizontal axis represents the SHAP values, which quantify how far a feature’s
value for a particular sample shifts the model’s output away from its expected baseline.
Positive SHAP values indicate that the feature pushes the prediction upward (e.g., increasing
the probability of being considered suspicious). Conversely, negative SHAP values indicate
that the feature lowers the output. The colour of each point corresponds to the actual value
of the feature, with red representing high values and blue indicating low values.
Table 5. Features and their mean absolute SHAP values.
Feature Mean Absolute SHAP Value
Time Diff between first and last (Mins) 1.6933
UniqueReceivedFrom_Addresses 1.4526
AvgValueReceived 1.1515
TotalTransactions 1.0833
Received_tnx 0.9854
Sent_tnx 0.9066
TotalEtherReceived 0.8005
TotalEtherSent 0.6614
MaxValueReceived 0.6227
MinValueReceived 0.6016
Avg min between sent tnx 0.5552
MinValSent 0.5397
AvgValSent 0.3124
UniqueSentTo_Addresses 0.2595
MaxValSent 0.2215
TotalEtherBalance 0.1875
NumberofCreated_Contracts 0.1364
SHAP values are used to quantitatively ascertain the extent to which features con-
tribute to model predictions. A thorough examination of the table reveals that features
with the highest average absolute SHAP values play a pivotal role in determining the
model’s output. Specifically, features such as “Time Diff between first and last (Mins)” and
“UniqueReceivedFrom_Addresses,” with values of 1.693 and 1.453, respectively, exert the
most significant influence on the model’s decision-making processes. Features such as
“AvgValueReceived”, “TotalTransactions”, and “Received_tnx”, which follow, also have
significant effects. Conversely, certain features, including “NumberofCreated_Contracts,”
“TotalEtherBalance,” and “MaxValSent,” exhibited lower SHAP values, suggesting that
their influence on model predictions is comparatively constrained relative to other features.
In summary, SHAP analysis offers a reliable indicator for explaining the model’s
decision-making mechanism and identifying which features are significant. This enhances
the model’s transparency and interpretability.
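The global ranking in Table 5 is obtained by averaging the absolute SHAP values of each feature over all explained samples. A minimal sketch of that aggregation follows; the three feature names are taken from Table 5, while the per-sample SHAP values are invented for the example.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across samples."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

feature_names = ["TotalTransactions", "Received_tnx", "MaxValSent"]
# Hypothetical SHAP values: one row per sample, one column per feature.
shap_rows = [
    [1.5, -0.2, 0.1],
    [-2.0, 0.4, 0.0],
    [1.0, -0.6, -0.2],
]
ranking = mean_abs_shap(shap_rows, feature_names)
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks highly.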
The integration of SHAP with XGBoost enhances the interpretability of tree ensemble
models by providing a decomposition of the model’s predictions into individual feature
contributions. For each prediction, expressed as $f(x) = \sum_{t=1}^{T} f_t(x)$, SHAP computes
the contribution of each feature, denoted as $\phi_i$, in a manner that satisfies the
efficiency property, $\sum_i \phi_i = f(x) - E[f(X)]$. This property ensures that the sum of
the feature contributions precisely accounts for the difference between the model’s
prediction and its expected value, thereby enabling a rigorous and quantitative assessment of
how each feature influences the model’s output.
Consequently, the SHAP framework provides a transparent and interpretable mechanism
for understanding the decision-making process of complex ensemble models such as XGBoost.
Algorithm 1 Fraud Detection in Blockchain Transactions Using XGBoost and SHAP
Input: Dataset file path
Output: Trained XGBoost model, normalization scaler, accuracy score, SHAP summary plot
1: Load dataset from CSV file
2: Separate input features X and target variable y:
3: Remove Address column from X
4: Extract class column as target y
5: Normalize features X using Min-Max scaling to obtain X_scaled
6: Split data into training and test sets:
7: (X_train, X_test, y_train, y_test) ← train_test_split(X_scaled, y, test_size = 0.2, random_state = 42)
8: Initialize XGBoost classifier with parameters:
9: use_label_encoder=False, eval_metric="logloss", tree_method="hist"
10: Train model M on training data (X_train, y_train)
11: Save trained model and scaler to disk
12: Predict target values y_pred for X_test using model M
13: Calculate accuracy score between y_test and y_pred
14: Initialize SHAP explainer with model M
15: Compute SHAP values for test data X_test
16: Generate and save SHAP summary plot
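The steps of Algorithm 1 map closely onto common Python libraries. The following is a hedged sketch of one possible realization, assuming pandas, scikit-learn, XGBoost, SHAP, joblib, and matplotlib are installed. The column names `Address` and `class` and the two `.pkl` file names come from Algorithms 1 and 5; the plot file name is an assumption. The `use_label_encoder` argument is omitted because it is deprecated in recent XGBoost releases.

```python
MODEL_PATH = "xgboost_fraud_detection_model.pkl"
SCALER_PATH = "scaler.pkl"
PLOT_PATH = "shap_summary.png"  # assumed file name

def train_and_explain(csv_path, address_col="Address", target_col="class"):
    # Third-party imports are local so the module loads without them.
    import joblib
    import matplotlib
    matplotlib.use("Agg")  # write plots to file without a display
    import matplotlib.pyplot as plt
    import pandas as pd
    import shap
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler
    from xgboost import XGBClassifier

    # Steps 1-4: load data, drop the address, separate the target.
    df = pd.read_csv(csv_path)
    X = df.drop(columns=[address_col, target_col])
    y = df[target_col]

    # Step 5: Min-Max scaling on the full feature matrix, as in Algorithm 1.
    scaler = MinMaxScaler()
    X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

    # Steps 6-7: 80/20 split with a fixed seed.
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=0.2, random_state=42
    )

    # Steps 8-11: train the classifier and persist model and scaler.
    model = XGBClassifier(eval_metric="logloss", tree_method="hist")
    model.fit(X_train, y_train)
    joblib.dump(model, MODEL_PATH)
    joblib.dump(scaler, SCALER_PATH)

    # Steps 12-13: evaluate on the held-out set.
    accuracy = accuracy_score(y_test, model.predict(X_test))

    # Steps 14-16: SHAP values and summary plot for the test data.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test, show=False)
    plt.savefig(PLOT_PATH, bbox_inches="tight")
    plt.close()
    return model, scaler, accuracy
```

The sketch follows the algorithm as written, including scaling before the split; fitting the scaler on the training portion only is a common alternative.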
4.2. Feature Extraction for Given Wallet Address
One of the modules developed for the present study extracts the relevant features for any
given Ethereum wallet address and then predicts whether the wallet is suspicious based on
these features. Algorithm 2 illustrates the step-by-step procedure used for feature
extraction from the wallet.
4.3. Detection of the Last 10 Active Wallet Addresses and Extraction of the Properties of These
Wallets and Model Estimation
The algorithm developed to identify the last 10 active wallet addresses is presented
in Algorithm 3. This module identifies the last 10 active Ethereum wallet addresses and
extracts their corresponding features. Following feature extraction, the model is used to
predict whether each of these wallets exhibits suspicious behavior. The script connects to
Ethereum through the Infura API and iteratively checks each block from the latest one down to
the range limit. For every block, it collects unique “from” and “to” addresses from all
transactions. Once it has identified 10 unique active addresses, it stops and writes them
to a CSV file.
After identifying the last 10 active addresses, the characteristics of each of these
addresses were extracted as illustrated in Algorithm 4, and the resulting data were saved
to a CSV file.
The pre-trained XGBoost model was then applied to the extracted features of the last 10
addresses to predict whether each wallet address is normal or suspicious. The algorithm
developed for this module is presented in Algorithm 5.
Algorithm 2 Ethereum Account Feature Extraction
Input: Ethereum account address, start block, end block
Output: Transaction features for the account in a CSV file
1: Connect to Ethereum via Web3 provider
2: Initialize empty list transactions
3: Loop over blocks from start_block to end_block:
4: for each block number in the range do
5: Retrieve block with full transaction data
6: for each transaction in block do
7: if transaction is sent from or received by the given address then
8: Extract transaction data: block number, hash, sender, recipient, value in ETH, gas, timestamp
9: Append data to transactions
10: end if
11: end for
12: end for
13: Extract timestamps and compute:
14: Time difference between first and last transaction
15: Average time between sent transactions
16: Separate sent and received transactions
17: Compute the number of unique senders and recipients
18: Count the number of contract creations
19: Compute statistical values for sent and received ETH:
20: Min, Max, and Average values
21: Total ETH sent and received
22: Balance = Received - Sent
23: Build a feature dictionary with all computed metrics
24: Save the feature dictionary as a row in a CSV file
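Steps 13 to 23 of Algorithm 2 can be sketched as a pure function over transactions that have already been fetched. The input format (dictionaries with `from`, `to`, `value_eth`, and `timestamp` in Unix seconds) is an assumption for this example, as are the conventions that a sent transaction with no recipient counts as a contract creation and that TotalTransactions is the sum of sent and received counts. The output keys are the 17 feature names listed in Tables 4 and 5.

```python
from statistics import mean

def extract_features(address, transactions):
    """Compute wallet-level features from already-fetched transactions."""
    addr = address.lower()
    sent = sorted((t for t in transactions if t["from"].lower() == addr),
                  key=lambda t: t["timestamp"])
    received = [t for t in transactions if t["to"] and t["to"].lower() == addr]
    times = sorted(t["timestamp"] for t in sent + received)
    # Gaps between consecutive sent transactions, in minutes.
    sent_gaps = [(b["timestamp"] - a["timestamp"]) / 60 for a, b in zip(sent, sent[1:])]
    sent_vals = [t["value_eth"] for t in sent]
    recv_vals = [t["value_eth"] for t in received]

    def stats(values):
        return (min(values), max(values), mean(values)) if values else (0.0, 0.0, 0.0)

    min_s, max_s, avg_s = stats(sent_vals)
    min_r, max_r, avg_r = stats(recv_vals)
    return {
        "Avg min between sent tnx": mean(sent_gaps) if sent_gaps else 0.0,
        "Time Diff between first and last (Mins)": (times[-1] - times[0]) / 60 if times else 0.0,
        "Sent_tnx": len(sent),
        "Received_tnx": len(received),
        # Assumption: a transaction without a recipient creates a contract.
        "NumberofCreated_Contracts": sum(1 for t in sent if t["to"] is None),
        "UniqueReceivedFrom_Addresses": len({t["from"].lower() for t in received}),
        "UniqueSentTo_Addresses": len({t["to"].lower() for t in sent if t["to"]}),
        "MinValueReceived": min_r, "MaxValueReceived": max_r, "AvgValueReceived": avg_r,
        "MinValSent": min_s, "MaxValSent": max_s, "AvgValSent": avg_s,
        "TotalTransactions": len(sent) + len(received),
        "TotalEtherSent": sum(sent_vals),
        "TotalEtherReceived": sum(recv_vals),
        "TotalEtherBalance": sum(recv_vals) - sum(sent_vals),
    }

# Hypothetical transaction history for one wallet.
txs = [
    {"from": "0xabc", "to": "0x1", "value_eth": 1.0, "timestamp": 0},
    {"from": "0x2", "to": "0xabc", "value_eth": 3.0, "timestamp": 600},
    {"from": "0xabc", "to": None, "value_eth": 0.0, "timestamp": 1200},
    {"from": "0xabc", "to": "0x1", "value_eth": 2.0, "timestamp": 1800},
]
features = extract_features("0xABC", txs)
```

Keeping the feature computation separate from the block scan makes it possible to test the arithmetic without a node connection.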
Algorithm 3 Extracting Latest Active Ethereum Addresses
Input: Ethereum node access, block range N, number of addresses k
Output: A CSV file with the k most recent active Ethereum addresses
1: Connect to Ethereum using Web3
2: Get the latest block number
3: Set the scan range as the last N blocks
4: Initialize an empty set active_addresses
5: for block number from latest to latest − N (in reverse) do
6: Fetch the block with full transaction data
7: for each transaction in the block do
8: Add from and to addresses to active_addresses, if present
9: if size of active_addresses ≥ k then
10: Stop scanning and go to step 14
11: end if
12: end for
13: end for
14: Save the collected addresses into a CSV file
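The address collection loop of Algorithm 3 can be sketched independently of the node connection. In the example below, blocks are plain lists of transaction dictionaries supplied newest first, which is an assumed input format; with web3.py, each block would typically be fetched with `w3.eth.get_block(number, full_transactions=True)`. The sketch checks the limit after every address, so it returns exactly k addresses when enough are available.

```python
import csv
import io

def latest_active_addresses(blocks_newest_first, k=10):
    """Collect the first k unique addresses seen, scanning newest blocks first."""
    seen = set()
    ordered = []
    for block in blocks_newest_first:
        for tx in block:
            # 'to' is absent or None for contract-creation transactions.
            for addr in (tx.get("from"), tx.get("to")):
                if addr and addr not in seen:
                    seen.add(addr)
                    ordered.append(addr)
                    if len(ordered) >= k:
                        return ordered
    return ordered

def write_addresses(addresses, handle):
    """Write one address per row under an 'Address' header."""
    writer = csv.writer(handle)
    writer.writerow(["Address"])
    for addr in addresses:
        writer.writerow([addr])

# Hypothetical blocks, newest first.
blocks = [
    [{"from": "0xa", "to": "0xb"}, {"from": "0xa", "to": None}],
    [{"from": "0xc", "to": "0xd"}],
]
found = latest_active_addresses(blocks, k=3)
buffer = io.StringIO()
write_addresses(found, buffer)
```

Passing an open file instead of the in-memory buffer would produce the CSV file described in step 14.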
A comparison of our model with existing approaches, such as those proposed by
Aziz et al. [16] and Ehsan et al. [21], indicates that our model not only achieves similar
or better accuracy, but also emphasizes real-time applicability and modular design. Most
existing research concentrates exclusively on offline datasets, whereas our pipeline
operates directly on live-chain data using the Web3 API. This architecture makes the system
suitable for deployment in security monitoring platforms. Furthermore, the
explainability provided by SHAP not only fosters model transparency but also supports
compliance with regulatory requirements in the context of blockchain forensics.
Algorithm 4 Extract Ethereum Account Features
Input: Ethereum address list from CSV, start and end block numbers
Output: Extracted features for each address saved in output CSV
1: Connect to Ethereum mainnet via Web3
2: Load Ethereum addresses from latest_active_addresses.csv
3: Define output CSV eth_account_features.csv
4: Get current block as latest_block, set start_block = latest_block - 500
5: for each address in input CSV do
6: if address is valid then
7: Initialize empty transaction list
8: for each block from start_block to latest_block do
9: Get block with full transactions
10: for each transaction in block do
11: if transaction involves address then
12: Collect transaction info (value, gas, timestamp, etc.)
13: end if
14: end for
15: Wait 0.5 s to avoid rate limits
16: end for
17: Compute:
Number of sent/received transactions
Number of contracts created
Unique sent-to and received-from addresses
Min, max, avg sent/received values
Total Ether sent/received, balance
Time statistics
18: Write all features to output CSV
19: end if
20: end for
Algorithm 5 Ethereum Account Fraud Classification Pipeline
Input: CSV file path
Output: Prediction (Normal or Suspicion)
1: Load saved scaler from scaler.pkl using joblib
2: Load trained XGBoost model from xgboost_fraud_detection_model.pkl using joblib
3: Load dataset as dataframe df from CSV
4: Store the Address column separately in addresses
5: Remove the Address column from df
6: for all columns in df do
7: Convert values to numeric (coerce invalid values as NaN)
8: end for
9: Fill missing values in df with column medians
10: Apply scaler transformation to df to obtain X_new_scaled
11: Predict labels for X_new_scaled using the loaded model
12: Create a new dataframe output_df with:
13: Address from original data
14: Class as "NORMAL" if prediction is 0, else "SUSPICION"
15: Save output_df to eth_account_predictions.csv
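The data-cleaning and labelling steps of Algorithm 5 can be illustrated without any third-party dependency. The sketch below coerces values to numbers, fills gaps with column medians, and maps predictions to the two output labels. It works on plain lists rather than the dataframe used in the algorithm, and the fallback of 0.0 for a column with no valid values is an assumption.

```python
import math
from statistics import median

def to_number(value):
    """Convert a raw cell to float, returning NaN when it is not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan

def fill_with_column_medians(rows):
    """Coerce all cells to numbers and replace NaN with the column median."""
    cleaned = [[to_number(v) for v in row] for row in rows]
    for j in range(len(cleaned[0])):
        observed = [row[j] for row in cleaned if not math.isnan(row[j])]
        fill = median(observed) if observed else 0.0  # assumed fallback
        for row in cleaned:
            if math.isnan(row[j]):
                row[j] = fill
    return cleaned

def label(prediction):
    """Map the model output to the class names used in Algorithm 5."""
    return "NORMAL" if prediction == 0 else "SUSPICION"

# Hypothetical raw feature rows with invalid and missing entries.
raw = [["1.0", "x"], ["3.0", "4"], [None, "6"]]
clean = fill_with_column_medians(raw)
labels = [label(p) for p in (0, 1, 0)]
```

The cleaned rows would then be passed through the saved scaler and the trained model before labelling.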
5. Limitations
While the proposed system exhibits high performance in the realm of near real-time
Ethereum fraud detection, it is imperative to acknowledge certain limitations to provide a
balanced perspective on the scope and applicability of the work.
• The scarcity of high-quality, publicly labeled datasets for Ethereum fraud detection
poses a significant constraint. The present study is based on a dataset comprising
9,841 transactions, which, while substantial, may not encompass the full spectrum of
fraudulent behavior present in the current ecosystem.
• Cybercriminals continually adapt their behavior to circumvent detection systems, so the
training data may not adequately represent evolving fraud patterns. This temporal bias
has the potential to compromise the model’s efficacy in addressing novel attack vectors.
• The binary classification approach (normal and suspicious) may oversimplify the complex
nature of blockchain activity. Some transactions may fall into a gray area that is
neither clearly fraudulent nor entirely legitimate. The current model does not
differentiate between different fraudulent activities (e.g., phishing, Ponzi schemes,
and money laundering).
• Although our set of 17 features captures fundamental behavioral patterns, these features
may not encompass all relevant fraud indicators. More sophisticated fraud schemes may
employ patterns not yet captured in our current feature set. The current feature
engineering approach is static, which may hinder its ability to adapt to the evolving
nature of fraud.
• At present, the system is designed exclusively for Ethereum, which limits its
generalizability to other blockchain networks with different transaction structures and
consensus mechanisms.
• The system’s reliance on external APIs (Ethereum nodes, Etherscan) introduces potential
points of failure and rate-limiting constraints that could impact real-time performance.
• Although the system is optimized for efficiency, concurrent processing of large address
groups can require substantial memory, which could limit the system’s applicability in
resource-constrained environments.
• In sophisticated attacks, adversaries who understand the model’s decision boundaries may
craft transactions specifically to evade detection. This scenario has not been
comprehensively evaluated within the scope of our current research.
• While SHAP offers interpretability, it also exposes the model’s decision-making process,
which potential attackers could exploit.
While the current work demonstrates significant advances in near real-time Ethereum
fraud detection with high interpretability, the identified limitations provide clear directions
for future research and development. The proposed future work encompasses both techni-
cal enhancements and broader considerations of practical deployment, ethical implications,
and societal impact. Addressing these limitations and pursuing the outlined research
directions will contribute to the development of more robust, scalable, and trustworthy
blockchain security systems. The rapid evolution of blockchain technology and fraud
techniques necessitates continuous research and adaptation. The framework under consid-
eration provides a solid foundation that can be extended and enhanced to meet emerging
challenges in blockchain security and fraud detection.
6. Future Works
In the future, the robustness of models may be enhanced through the incorporation
of token transaction graphs and smart contract call traces. In addition, the validation of
the pipeline is planned to be conducted on real-time unlabeled data, employing human-in-
the-loop verification. The development of a unified framework capable of operating across
multiple blockchain networks (Ethereum, Binance Smart Chain, Polygon, etc.) would
significantly increase the system’s utility and market viability. The extension of the binary
classification to identify specific types of fraud (e.g., phishing, pyramid schemes, mixing
services) will provide researchers with more actionable intelligence. The incorporation of
graph-based features, while preserving the interpretability and efficiency of the prevailing
approach through hybrid architectures, is expected to enhance the system’s effectiveness.
Beyond rudimentary statistical measurements, more sophisticated temporal modeling
techniques can be employed to identify patterns in fraud behavior over time. The develop-
ment of specific models for various types of fraud, and the subsequent integration of these
models to create hybrid systems, is a promising area of research.
7. Conclusions
This study presents a machine learning approach for detecting fraudulent Ethereum
wallet addresses. The system evaluates individual or recently active wallet addresses in
near real-time by leveraging a pre-trained XGBoost model, achieving an accuracy of
approximately 96% in classifying suspicious behavior. Incorporating SHAP values into the model
helps improve its interpretability and transparency, thus providing information on the
contribution of each feature to the final decision. The findings of this study suggest
that explainable artificial intelligence (XAI) techniques have the potential to substantially
improve the trustworthiness and usability of blockchain analytics tools. In the future,
several enhancements are planned. Firstly, the model will be extended to support multi-
chain analysis, incorporating data from other popular blockchains such as Binance Smart
Chain or Polygon. Furthermore, the feature set may be expanded to include more granular
behavioral indicators derived from smart contract interactions and token transfers. Real-
time streaming data integration is also a key development goal, which would allow the
system to function continuously on live blockchain activity. Finally, the deployment of the
solution as a publicly accessible API or dashboard has the potential to facilitate broader use
in security monitoring, compliance, and financial auditing applications.
Funding: This work is supported by Scientific Research Projects Coordination Unit of Firat University,
Türkiye, Project Numbers: TEKF.25.13 and ADEP.25.28.
Data Availability Statement: Data are available from the corresponding author upon reasonable request.
Conflicts of Interest: The author declares no conflicts of interest.
References
1. Xu, M.; Chen, X.; Kou, G. A systematic review of blockchain. Financ. Innov. 2019, 5, 27.
2. Ressi, D.; Romanello, R.; Piazza, C.; Rossi, S. AI-enhanced blockchain technology: A review of advancements and opportunities. J. Netw. Comput. Appl. 2024, 225, 103858.
3. Sun, J.; Jia, Y.; Wang, Y.; Tian, Y.; Zhang, S. Ethereum fraud detection via joint transaction language model and graph representation learning. Inf. Fusion 2025, 120, 103074.
4. Gad, A.G.; Mosa, D.T.; Abualigah, L.; Abohany, A.A. Emerging trends in blockchain technology and applications: A review and outlook. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 6719–6742.
5. Zheng, Z.; Su, J.; Chen, J.; Lo, D.; Zhong, Z.; Ye, M. Dappscan: Building large-scale datasets for smart contract weaknesses in dapp projects. IEEE Trans. Softw. Eng. 2024, 50, 1360–1373.
6. Han, H.; Shiwakoti, R.K.; Jarvis, R.; Mordi, C.; Botchie, D. Accounting and auditing with blockchain technology and artificial Intelligence: A literature review. Int. J. Account. Inf. Syst. 2023, 48, 100598.
7. Tripathi, G.; Ahad, M.A.; Casalino, G. A comprehensive review of blockchain technology: Underlying principles and historical background with future challenges. Decis. Anal. J. 2023, 9, 100344.
8. Ma, F.; Ren, M.; Fu, Y.; Wang, M.; Li, H.; Song, H.; Jiang, Y. Security reinforcement for Ethereum virtual machine. Inf. Process. Manag. 2021, 58, 102565.
9. Wu, S.; Yu, Z.; Wang, D.; Zhou, Y.; Wu, L.; Wang, H.; Yuan, X. Defiranger: Detecting DeFI price manipulation attacks. IEEE Trans. Dependable Secur. Comput. 2023, 21, 4147–4161.
10. Faqir-Rhazoui, Y.; Arroyo, J.; Hassan, S. A comparative analysis of the platforms for decentralized autonomous organizations in the Ethereum blockchain. J. Internet Serv. Appl. 2021, 12, 9.
11.
Li, S.; Gou, G.; Liu, C.; Xiong, G.; Li, Z.; Xiao, J.; Xing, X. TGC: Transaction Graph Contrast Network for Ethereum Phishing Scam
Detection. In Proceedings of the 39th Annual Computer Security Applications Conference, Austin, TX, USA, 4–8 December 2023;
pp. 352–365.
12.
Wu, J.; Lin, D.; Fu, Q.; Yang, S.; Chen, T.; Zheng, Z.; Song, B. Toward understanding asset flows in crypto money laundering
through the lenses of Ethereum heists. IEEE Trans. Inf. Forensics Secur. 2023,19, 1994–2009. [CrossRef]
13.
Wronka, C. Money laundering through cryptocurrencies-analysis of the phenomenon and appropriate prevention measures.
J. Money Laund. Control 2022,25, 79–94. [CrossRef]
14.
Chainalysis, T. The Chainalysis 2025 Crypto Crime Report. 2025. Available online: https://go.chainalysis.com/2025-Crypto-
Crime-Report.html (accessed on 19 May 2025).
15.
Chen, Z.; Hu, Y.; He, B.; Luo, D.; Wu, L.; Zhou, Y. Dissecting payload-based transaction phishing on Ethereum. arXiv 2024,
arXiv:2409.02386. [CrossRef]
16.
Aziz, R.M.; Baluch, M.F.; Patel, S.; Ganie, A.H. LGBM: A machine learning approach for Ethereum fraud detection. Int. J. Inf.
Technol. 2022,14, 3321–3331. [CrossRef]
17.
Farrugia, S.; Ellul, J.; Azzopardi, G. Detection of illicit accounts over the Ethereum blockchain. Expert Syst. Appl. 2020,150, 113318.
[CrossRef]
18.
Ravindranath, V.; Nallakaruppan, M.; Shri, M.L.; Balusamy, B.; Bhattacharyya, S. Evaluation of performance enhancement in
Ethereum fraud detection using oversampling techniques. Appl. Soft Comput. 2024,161, 111698. [CrossRef]
19.
Dahiya, M.; Mishra, N.; Singh, R. Neural network based approach for Ethereum fraud detection. In Proceedings of the 2023 4th
International Conference on Intelligent Engineering and Management (ICIEM), London, UK, 9–11 May 2023; 2023; pp. 1–4.
20.
Hu, T.; Liu, X.; Chen, T.; Zhang, X.; Huang, X.; Niu, W.; Lu, J.; Zhou, K.; Liu, Y. Transaction-based classification and detection
approach for Ethereum smart contract. Inf. Process. Manag. 2021,58, 102462. [CrossRef]
21.
Ehsan, A.; Iqbal, Z.; Abuowaida, S.; Aljaidi, M.; Zia, H.U.; Alshdaifat, N.; Alshammry, N.K. Enhanced Anomaly Detection in
Ethereum: Unveiling and Classifying Threats with Machine Learning. IEEE Access 2024,12, 176440–176456. [CrossRef]
22.
Liu, L.; Tsai, W.T.; Bhuiyan, M.Z.A.; Peng, H.; Liu, M. Blockchain-enabled fraud discovery through abnormal smart contract
detection on Ethereum. Future Gener. Comput. Syst. 2022,128, 158–166. [CrossRef]
23.
Tan, R.; Tan, Q.; Zhang, P.; Li, Z. Graph neural network for ethereum fraud detection. In Proceedings of the 2021 IEEE
international conference on big knowledge (ICBK), Auckland, New Zealand, 7–8 December 2021; pp. 78–85.
24.
Jin, C.; Zhou, J.; Xie, C.; Yu, S.; Xuan, Q.; Yang, X. Enhancing Ethereum Fraud Detection via Generative and Contrastive
Self-supervision. IEEE Trans. Inf. Forensics Secur. 2024,20, 839–853. [CrossRef]
25.
Tan, R.; Tan, Q.; Zhang, Q.; Zhang, P.; Xie, Y.; Li, Z. Ethereum fraud behavior detection based on graph neural networks.
Computing 2023,105, 2143–2170. [CrossRef]
26.
Liu, S.Z.; Yu, X.Y.; Li, Y.T.; Zhang, H.; Guo, X.P.; Ma, C.H.; Long, H.X. Detection of Ethereum Phishing Fraud Nodes Based on
Feature Enhancement Strategy and GBM. Electronics 2024,13, 5060. [CrossRef]
27. Sheng, Z.; Song, L.; Wang, Y. Dynamic Feature Fusion: Combining Global Graph Structures and Local Semantics for Blockchain
Phishing Detection. IEEE Trans. Netw. Serv. Manag. 2025, 22, 4706–4718. [CrossRef]
28. Jia, Y.; Wang, Y.; Sun, J.; Tian, Y.; Qian, P. LMAE4Eth: Generalizable and Robust Ethereum Fraud Detection by Exploring
Transaction Semantics and Masked Graph Embedding. IEEE Trans. Inf. Forensics Secur. 2025, 20, 10260–10274. [CrossRef]
29. Li, P.; Xie, Y.; Xu, X.; Zhou, J.; Xuan, Q. Phishing fraud detection on ethereum using graph neural network. In Proceedings of the
International Conference on Blockchain and Trustworthy Systems, Chengdu, China, 4–5 August 2022; Springer: Singapore, 2022;
pp. 362–375.
30. Pahuja, L.; Kamal, A. EnLEFD-DM: Ensemble Learning Based Ethereum Fraud Detection Using CRISP-DM Framework. Expert
Syst. 2023, 40, e13379. [CrossRef]
31. GitHub. GitHub Repository Dataset. 2025. Available online: https://github.com/fatihertam/ethereumfrauddetection (accessed
on 19 May 2025).
32. Kilincer, I.F. Explainable AI supported hybrid deep learnig method for layer 2 intrusion detection. Egypt. Inform. J. 2025, 30, 100669. [CrossRef]
33. Ahn, J.M.; Kim, J.; Kim, K. Ensemble machine learning of gradient boosting (XGBoost, LightGBM, CatBoost) and attention-based
CNN-LSTM for harmful algal blooms forecasting. Toxins 2023, 15, 608. [CrossRef]
34. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
35. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision
tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157.
36. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv.
Neural Inf. Process. Syst. 2018, 31, 6639–6649.
37. Li, Z. Extracting spatial effects from machine learning model using local interpretation method: An example of SHAP and
XGBoost. Comput. Environ. Urban Syst. 2022, 96, 101845. [CrossRef]
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.