This paper is included in the Proceedings of the
34th USENIX Security Symposium.
August 13–15, 2025 • Seattle, WA, USA
978-1-939133-52-6
Open access to the Proceedings of the
34th USENIX Security Symposium is sponsored by USENIX.
Ghost Clusters: Evaluating Attribution of
Illicit Services through Cryptocurrency Tracing
Kelvin Lubbertsen, Michel van Eeten, and
Rolf van Wegberg, Delft University of Technology
https://www.usenix.org/conference/usenixsecurity25/presentation/lubbertsen
Abstract
One of the principles in cryptocurrency tracing is putting a
name to an address, a process called attribution. Attribution
is key for both law enforcement and compliance professionals.
Blockchain intelligence companies sell attribution as a service
by leveraging pseudonymous blockchains, clustering heuristics,
and labeling of addresses. In this paper, we perform a
case study on Chainalysis, the market leader, and evaluate its
attribution by comparing it against ground-truth data on three
seized illicit services: BestMixer, Hansa Market, and Wall
Street Market. To design the evaluation, we interview front-line
law enforcement professionals and learn how they trace
cryptocurrencies using blockchain intelligence providers. We
identify three evaluation techniques (i.e., address overlap,
money flows, and address roles) that realistically measure
attribution in line with law enforcement use cases. Using these
techniques, we show that for our three illicit services,
Chainalysis provides a reliable lower bound (24.54 to 94.85
percent accurate), and produces very few false positives (less
than 0.5 percent). Also, we find that coverage changes over time.
We reason about factors that influence attribution and
demonstrate the importance of attributing certain key addresses to
achieve high coverage. Finally, by including a second blockchain
intelligence provider, we show the difficulties in
generalizing results.
1 Introduction
Follow-the-money has long been the cornerstone of many
law enforcement investigations. With the adoption of
cryptocurrencies in cybercrimes, from ransomware payments
to buying drugs on so-called ‘darknet markets’ and paying
for bulletproof hosting, efforts to police these crimes have
become reliant on the ability to trace cryptocurrency
transactions. Although most cryptocurrencies are pseudonymous,
which means transactions and identifiers are transparently
stored in blockchains, it remains challenging to link
real-world identities to wallets, addresses, and services. This
process of linking blockchain activities to real-world identities
is called ‘attribution’ [26]. Attribution is a critical step in
tracing illicit money flows, as done by law enforcement in-
vestigators, private investigators, and compliance teams at
regulated crypto-asset service providers.
Almost a decade after the first attempts to deanonymize
cryptocurrency money flows [39], an industry of commercial
data providers has emerged specializing in the attribution of
services, wallets, and addresses on blockchains. A few big
commercial players in this field are Chainalysis, Elliptic, and
TRM Labs, with Chainalysis being the market leader. All these
companies provide data and tools that allow law enforce-
ment, private investigators, and compliance teams to trace
illicit money flows. The dependence on such ‘attribution-as-
a-service’ providers means the service must be reliable [33].
Law enforcement conducts criminal investigations based on
the attribution of addresses. Major cases, such as the takedown
of the then largest darknet market AlphaBay [34], depended
heavily on using these methods. Also, in the private sector,
virtual asset services such as cryptocurrency exchanges use
tools that rely heavily on attribution. Regulation to prevent
money laundering and terrorist financing requires some form
of customer screening, including the source of funds. On-
chain transaction monitoring is a vital component, and the
same kind of attribution data is used for it. Exchanges use this
data to file suspicious activity reports, and failing to
have such monitoring in place can have serious consequences, as
can be seen with, for instance, Binance, which was fined by the US
authorities in 2023 [35].
However, illicit services are likely to resist attribution by
their nature: they, by definition, do not cooperate with
regulators and sometimes even advertise their evasion of AML
(anti-money laundering) policies enforced by tools such as
Chainalysis, as BestMixer did. This makes validation
difficult if not impossible. Commercial intelligence providers
cannot access large-scale ground-truth data of illicit services,
wallets, and addresses to validate their methods. Hence, they
often rely on direct interactions with the service (e.g.,
initiating a transaction with a Bitcoin mixer) to generate a sample
of illicit service addresses. These are then expanded upon via
algorithms that make use of state-of-the-art heuristics, like
co-spending [26]. Still, precision is critical when the stakes
are as high as they are in, for instance, law enforcement. Inac-
curate labeling of a service as illicit could derail a prosecution
or, in the worst case, even lead to a wrongful conviction.
Obtaining ground-truth data from illicit services would
help in assessing attribution accuracy. Yet, acquiring such data
is hardly ever possible. Only when law enforcement seizes
an illicit service and the wallet data on the seized servers re-
mains intact, does this enable large-scale evaluation. Through
a unique collaboration with Dutch law enforcement, we man-
aged to find three of those rare cases where wallet data was
largely intact and we were granted permission to use this data
for research purposes.
To evaluate attribution, we interview front-line profession-
als in law enforcement about how they use ‘attribution-as-a-
service’ and what value they attach to blockchain intelligence
in investigations. This, when mapped against the design of
cryptocurrencies such as Bitcoin [29] and the internal wal-
let architecture as described by [40], yielded three methods
for evaluating attribution: address overlap, money flows, and
address roles. The first is a generic evaluation of attribution
and provides insights into coverage of an entire illicit service
wallet (e.g., a darknet market or a mixing service wallet). The
latter two correspond with the ways we found that professionals
trace cryptocurrencies in investigations. For the evaluation
itself, we leverage seized data from recent (2017 to 2019) law
enforcement take-downs of three known illicit services:
BestMixer, Hansa Market, and Wall Street Market. This allows
us to investigate the attribution provided by the market leader in
blockchain intelligence, Chainalysis, and unravel factors
influencing attribution, which are relevant beyond just these
three cases. We make the following contributions:
• We evaluate attribution by commercial blockchain
  intelligence market leader Chainalysis on three illicit services
  and find their attribution to be a reliable lower bound
  of up to 95% of addresses in seized wallets, with false
  positives being rare (<0.5 percent).
• We reflect on this finding by engaging with law
  enforcement professionals and learn that they anticipate
  conservative attribution, whilst discovering they employ
  two distinct tracing strategies (transaction-based and
  relationship-based tracing, respectively) that we can use for
  measuring attribution accuracy.
• We investigate address and cluster-based attribution
  using their role within a service wallet (like the
  relationship-based tracing done by law enforcement), demonstrate how
  attribution changes based on blockchain intelligence
  tactics, and find that wallets of services for a large part
  depend on one type of behavior or role.
• We use the directed acyclic graph design of UTXO-based
  blockchains to evaluate attribution using a technique we
  call money flows, allowing us to evaluate transaction-based
  tracing, and find that attribution accuracy is
  dependent on time: the earlier the address lived in the illicit
  service lifespan, the more likely it was not attributed.
• We contrast our findings against a second, public
  blockchain intelligence provider, Arkham Intelligence,
  and show that attribution hinges on clustering highly
  service-specific components such as the escrow service
  of a darknet market.
The remainder of this paper is structured as follows. First,
we describe the nexus of cryptocurrencies and crime in Sec-
tion 2. Then, we describe our methodology and provide data
descriptions in Section 3. We interviewed law enforcement
officers to learn how they trace cryptocurrencies and how they
perceive attribution; we report on this in Section 4. Leveraging
the results from these interviews, we build three evaluation
techniques throughout the core of this paper. We discuss the
first evaluation, which we call ‘address overlap’, in Section 5.
We then use well-known design features of cryptocurrencies to
evaluate the two tracing strategies used by law enforcement.
We refer to these evaluations as ‘money flows’ and ‘address
roles’, for which we discuss the results in Sections 6 and 7,
respectively. Then, we attempt to generalize these results
across data providers in Section 8 and map those results to the
practice of law enforcement tracing, as that goes further than
the three datasets this study evaluates using ground-truth data.
We contrast with related work (Section 9) and discuss
limitations and future work (Section 10). Section 11 concludes.
Finally, we discuss ethical considerations at the end of the paper.
2 Crypto & Crime
Bitcoin [29] was the first blockchain-based cryptocurrency.
Its inception in 2009 instigated a new economy of services to
buy, sell, spend, and trade cryptocurrencies. Cryptocurrencies
have also generated an entire ecosystem of businesses, such
as payment processors and cryptocurrency exchanges, which
often function as the off- and on-ramps to the traditional fi-
nancial system [4]. Blockchain technology allows people to
remain relatively anonymous, in this case, pseudonymous.
This presumed pseudonymity has sparked the use of
cryptocurrencies for various criminal activities, such as transacting
on underground markets and ransomware payments. On the other
hand, law enforcement and private investigators use the same
pseudonymous nature to trace cryptocurrencies. For them, two
concepts are vital: attribution (i.e., what name is linked to this
pseudonymous cryptocurrency address) and clustering (i.e.,
which addresses belong to the same actor). Cryptocurrency
tracing is focused on these aspects. This field takes advantage
of the design principles of these currencies to combine and
attribute groups of addresses. For this, it is essential to note
that bitcoin and others are cryptocurrencies in which wallets
have one or more pairs of keys where money is deposited on a
(derivative of a) public key (i.e., the cryptocurrency address).
One spends money by signing a transaction with a private
key [3,29]. Every transfer is denoted in a transaction, and
they are considered correct if they have valid signatures and
are included in a block. Transactions consist of one or more
inputs and define one or more outputs. Coins that use such a
system of multiple inputs that refer back to other outputs are
sometimes called UTXO-based cryptocurrencies (‘UTXO’
stands for unspent transaction output) [7]. Every transaction
input links back to another transaction’s output to prevent
double-spending. The UTXO model of cryptocurrencies, such
as Bitcoin, sometimes requires multiple inputs in the trans-
action to meet the amount sent. For such a transaction to be
agreed upon, multiple UTXOs (and most likely also multi-
ple addresses) must sign the transaction, of which one can
assume common ownership. This is called the co-spending
heuristic [26]. One can create so-called ‘clusters’ of addresses
by applying co-spending to addresses.
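The co-spending heuristic described above can be sketched as a union-find over the input addresses of each transaction. This is a minimal illustration, not the implementation of any blockchain intelligence provider; the transaction format (a dict with an "inputs" list of addresses) and all addresses are hypothetical.

```python
from collections import defaultdict


def cluster_by_cospending(transactions):
    """Group addresses that appear together as inputs of a transaction.

    Assumption: each transaction is a dict with an "inputs" list of
    address strings (a hypothetical, simplified format).
    """
    parent = {}

    def find(addr):
        # Register unseen addresses as their own cluster root.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)
        # All inputs of one transaction are assumed to share an owner.
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = defaultdict(set)
    for addr in parent:
        clusters[find(addr)].add(addr)
    return list(clusters.values())


example_txs = [
    {"inputs": ["A", "B"]},  # A and B are co-spent
    {"inputs": ["B", "C"]},  # B and C are co-spent, so A, B, C merge
    {"inputs": ["D"]},       # D is never co-spent with anything
]
clusters = cluster_by_cospending(example_txs)
```

Note that clusters grow transitively: A and C never appear in the same transaction, yet they end up in one cluster through B.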
The other primary type of clustering is what is often
referred to as the change address heuristic. The outputs often
include a change address because one can only spend a full
UTXO, which rarely matches the amount one wants to pay.
The remaining amount is returned to the sender’s wallet in
that output. Various heuristics exist, all applying to the scenario
where there could be change (i.e., a transaction consisting of
exactly two outputs: one payment and one change [15]).
Commonly, an output is considered change if it
is the first time this address is seen on the blockchain, or if it is
a self-change (i.e., the output address is also an input) [26].
Variations also exist that assume that no unnecessary inputs are
added to the transaction [31], or that use the exact address
types and features [24,28].
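The two common variants of the change address heuristic (self-change and first-seen address) can be sketched as follows. This is a simplified illustration under stated assumptions: the transaction format and the `seen_addresses` set (all addresses observed on the blockchain before this transaction) are hypothetical, and the stricter variations from the literature are not modeled.

```python
def guess_change_output(tx, seen_addresses):
    """Return the output address most likely to be change, or None.

    Assumptions: `tx` is a dict with "inputs" and "outputs" lists of
    address strings; `seen_addresses` holds every address that appeared
    on the blockchain before this transaction.
    """
    inputs = set(tx["inputs"])
    outputs = tx["outputs"]

    # The heuristics only apply to one payment plus one change output.
    if len(outputs) != 2:
        return None

    # Self-change: the output address is also an input address.
    self_change = [addr for addr in outputs if addr in inputs]
    if len(self_change) == 1:
        return self_change[0]

    # First-seen: exactly one output address is new to the blockchain.
    fresh = [
        addr for addr in outputs
        if addr not in seen_addresses and addr not in inputs
    ]
    if len(fresh) == 1:
        return fresh[0]

    # Ambiguous cases are left unlabeled rather than guessed.
    return None


self_change_tx = {"inputs": ["A"], "outputs": ["B", "A"]}
fresh_tx = {"inputs": ["A"], "outputs": ["B", "C"]}
```

Returning None for ambiguous transactions mirrors the conservative stance that matters for attribution: a wrong merge pollutes an entire cluster, while a missed merge only lowers coverage.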
Two prominent types of platforms where cryptocurren-
cies, crime, and attribution attempts converge are darknet
markets and bitcoin mixers. Mixing services operate in a
legal gray area: in some jurisdictions, such as the US, cen-
tralized mixing services are under pressure from regulators
and law enforcement. For example, the operator of Helix
Mixer pleaded guilty to money laundering [14] and was fined
by FinCEN [19], whereas Chipmixer.io has been seized by
law enforcement [18,20]. Lately, mixing services have been
sanctioned, such as Blender.io and Tornado Cash [21,36,37].
Darknet markets are platforms where illicit goods and services
are transacted in cryptocurrencies [11]. Some of the most
prevalent of these marketplaces have been Silk Road,
AlphaBay, Hansa Market, and Hydra Market [12], all used to
trade illegal goods and services [6,42,46]. Given that their
use case has primarily been facilitating crime since the
inception of the first darknet market (Silk Road), law enforcement
has pursued these markets [47]. Seizing such services
provides law enforcement with a data gold mine, based on which
it can pursue users [16,17]. But this data also
gives unique insights into the inner workings of darknet mar-
kets [13,45]. Here, we leverage two seized darknet markets
(i.e., Hansa Market and Wall Street Market) and one bitcoin
mixer (i.e., BestMixer) to investigate how they are attributed
in commercial blockchain intelligence.
3 Methodology
Our study consists of two phases. First, we begin with
conducting interviews to understand how professionals use
and rely on ‘attribution-as-a-service’ in their investigations.
Second, we conduct a case study wherein we evaluate the
attribution of seized data from three known illicit services
and blockchain intelligence on these services provided by
Chainalysis. We will describe these data sources in more
detail in this section, validate their completeness, and provide
data descriptives. Last, we discuss the ethics of using seized
data in our work.
Interviews. We engaged with cryptocurrency tracing spe-
cialists in law enforcement to learn how they use and what
value they attach to blockchain intelligence in investigations.
Here, we opt for semi-structured interviews.
Our goal for these interviews is two-fold. First, we want
to learn how people use tracing tools and the attribution they
provide to align our measurement methodology with actual
use. Second, we used this opportunity to learn about the trust
participants place in the attribution provided via these tools.
Through our law enforcement network, we learned that
within Europe, there are between 25 and 50 leading experts
involved in mostly high-profile cases and, therefore, aware of
the latest developments. We believe that these specialists are
of most use to this study as they are more likely to not only
see examples from their own cases but could also be asked
to reflect on the investigative process as a whole. We arrive at
this population estimate of 25 to 50 experts through regular
participation in the leading European law enforcement conference
on cryptocurrency tracing, one of the two leading
cryptocurrency conferences for law enforcement worldwide. Not all law
enforcement agencies allow participation in scientific studies,
even though we tried contacting specialists at these agencies.
That led to a pool of about 12 specialists whom we could contact
and who were allowed to participate. Six of those were able and
willing to be interviewed (see Table 1). All of them worked in
different units and teams at various law enforcement agencies,
although they might know each other and have shared best practices.
While this may seem a small sample, these respondents make
up a significant proportion of the total population of 25 to 50
specialists. All of them were categorized as detectives or
analysts, where the role of analysts differs from that of
detectives in that analysts are not responsible for the entire
case. R-3 started as an analyst and later became a detective.

#   Role                Relevant years in service
1   Detective           4
2   Analyst             3
3   Analyst/Detective   3
4   Analyst             3
5   Detective           4
6   Detective           7
Table 1: Interviewees and their background

All specialists (n = 6) were asked how they trace
cryptocurrency money flows to help contextualize our results. We
opted to interview law enforcement officers rather than, for
instance, industry analysts because of the ability of law
enforcement agencies (LEAs) to subpoena regulated services. This is
relevant here, as it provides a feedback loop that might be
lacking for analysts who do not have that ability, and it allows
officers to discover errors in attribution more readily.
Logically, this would imply that law enforcement professionals
receive feedback on attribution more often. Therefore, they
represent a more relevant population to interview.
The interviews were conducted using a predefined protocol
(see Appendix A). The interviews consisted of two phases: a
first phase in which general tracing methods were discussed,
and a second phase in which we asked our participants to
reflect on the preliminary findings of the case study on the
three data sets. All interviews were recorded (with consent
from the interviewees) and transcribed by the lead researcher.
Coding was done using qualitative data analysis software
(Atlas.ti). First, themes were inductively selected, leveraging
the interview protocol to identify initial themes. Second, de-
ductive coding identified other themes that aligned with the
study and provided more detailed insights into respondents’
perspectives. All themes were reviewed by checking coher-
ence within coded data extracts and then evaluated against the
entire dataset. During this phase, some themes were merged,
split, or discarded. Cohen’s Kappa was not applied in this
thematic analysis because the coding process was primar-
ily interpretive and iterative, rather than a rigid application
of a predefined codebook suitable for statistical agreement
measures.
Emerging themes, aligned with the structure of our
interview protocol, were coded inductively, and we reached
thematic saturation after the six interviews were completed
(see Appendix B for the codebook). Saturation occurred early
in the process (see Appendix C for the saturation plot); after
four interviews, almost no new codes were discovered.
Data sources. To evaluate attribution, we use cryptocurrency
addresses as a basis. This means we build a data
model with addresses as a common identifier between
datasets. From Chainalysis, to which we have access through
our collaboration with law enforcement, we collect
the intelligence on all addresses of the three included
illicit services: BestMixer, Hansa Market, and Wall Street
Market. Chainalysis provides tools to trace cryptocurrencies
and compliance solutions for financial institutions and law
enforcement. Chainalysis describes its labeling as relying on
“manual and automated techniques” [9]. We collected the data
by downloading all cryptocurrency addresses
of these three services from the tool using the built-in
download button. Additionally, for generalization purposes,
we retrieved data from a second blockchain intelligence
provider: Arkham Intelligence. Here, too, we collected all
addresses present in their tool. We will use their data to
reason about the generalization of the results. To obtain the
corresponding transactions, we used a full Bitcoin node,
imported all addresses into the wallet, and rescanned the
blockchain. We then downloaded all the
transactions that belong to these addresses.
For BestMixer, we were given access to the master key, which
we used to derive all addresses until there was a significant
gap of unused addresses. We validated this with the internal
wallet database, to which we also had access. For the seized
darknet markets, capturing cryptocurrency addresses was more
complex given the format in which the data was
provided to us. We were given access to relevant back-end
database tables and, within those tables, only to the relevant
fields containing cryptocurrency transactions. We used the data
in those tables to identify the addresses on the blockchain
only if a transaction and/or output index were provided. We
thus used a limited dataset, meaning just the addresses or other
information, such as transaction hashes and output indices, to
deduce the full wallet. When things were unclear, we could
confer with the investigators involved.
Data validation. For BestMixer, we had access to the master
key and, for verification purposes, the wallet database. With
the master key of BestMixer and the fact that this was a hi-
erarchical deterministic wallet, we could safely state that we
had 100 percent coverage. We reverse-engineered the method
to derive addresses from the master key, which used an in-
dex starting at 0 and incremented for each newly generated
address. Using this method, we generated addresses starting
from index 0 until we reached 2,500 consecutive addresses
that did not exist on the blockchain. To check that the arbitrarily
chosen number of 2,500 addresses was not too small, we
calculated the maximum gap between two existing addresses
before that point. This gap was 217, significantly more than the
20 specified by [38], but also far smaller than the
arbitrarily chosen number of 2,500. The only address we could not
generate from the master key was a vanity address used for
donation purposes and publicly advertised. This address
appears to have been imported, since it was most likely produced
by a vanity address generator, and we added it manually to our
dataset.
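The derivation-until-gap procedure and the maximum-gap check can be sketched as below. This is an illustration under stated assumptions, not the authors' code: `derive_address` and `is_used` are hypothetical callables standing in for the reverse-engineered key derivation and a blockchain lookup, and the gap is counted here as the number of unused indices between two used ones, which may differ from the paper's exact definition.

```python
def scan_wallet(derive_address, is_used, gap_limit=2500):
    """Derive addresses from index 0 until `gap_limit` consecutive
    addresses are unused; return the used (index, address) pairs.

    `derive_address(index)` and `is_used(address)` are hypothetical
    stand-ins for key derivation and a blockchain lookup.
    """
    used = []
    index = 0
    unused_run = 0
    while unused_run < gap_limit:
        address = derive_address(index)
        if is_used(address):
            used.append((index, address))
            unused_run = 0  # a used address resets the gap counter
        else:
            unused_run += 1
        index += 1
    return used


def max_gap(used_indices):
    """Largest number of unused indices between two used indices."""
    ordered = sorted(used_indices)
    return max((b - a - 1 for a, b in zip(ordered, ordered[1:])), default=0)


# Toy wallet: only indices 0, 1, 5, and 9 were ever used on-chain.
toy_used = {"addr0", "addr1", "addr5", "addr9"}
found = scan_wallet(lambda i: f"addr{i}", lambda a: a in toy_used, gap_limit=5)
largest_gap = max_gap(i for i, _ in found)
```

If the largest observed gap stays far below the chosen gap limit, as with 217 versus 2,500 in the text, it is unlikely that used addresses exist beyond the point where the scan stopped.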
For both Wall Street Market and Hansa Market, we did not
have access to the master key, but instead used the back-end
databases. Here, the relevant tables used auto-increment in
the identifier, which we used to measure the completeness.
Hansa Market had four relevant tables for us, ranging in
completeness between 97.00 and 99.58 percent. For Wall
Street Market, we also used four tables. Completeness ranged
from 99.19 to 100.00 percent.
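One plausible way to measure completeness from auto-increment identifiers is to compare the number of rows present with the number of identifiers the sequence should contain. The sketch below assumes that reading; the exact formula used for the percentages above is not given in the text, and the identifier values are invented for illustration.

```python
def completeness_pct(row_ids):
    """Percentage of expected rows present in an auto-increment table.

    Assumption: every identifier between the lowest and highest observed
    value was once assigned, so missing values indicate missing rows.
    """
    ids = set(row_ids)
    if not ids:
        return 0.0
    expected = max(ids) - min(ids) + 1
    return 100.0 * len(ids) / expected


# Toy table: identifier 4 is missing from the range 1..5.
toy_ids = [1, 2, 3, 5]
toy_completeness = completeness_pct(toy_ids)
```

This measure cannot detect rows missing after the highest observed identifier, so it is best read as an upper bound on completeness.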
Data descriptives. Our data model of seized illicit service
addresses consists of 182,612 BestMixer addresses, 1,157,519
Hansa Market addresses, and 553,777 Wall Street Market
addresses that have been used on the blockchain. These form the
basis of our data. The seized services also include addresses
that were generated but to which the user, for instance, never
sent funds. We exclude those because they could only have been
known to the service itself. The number of counted addresses
thus refers to the number of addresses used in the ground truth.
Blockchain intelligence by Chainalysis resulted in 44,840
addresses for BestMixer and 914,866 and 525,847 for Hansa
Market and Wall Street Market, respectively. In Section 5, we
assess attribution by first analyzing address overlap.
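With addresses as the common identifier, address overlap reduces to set operations between the ground-truth addresses and the addresses a provider attributes to the service. The sketch below uses invented toy addresses; the metric names and the choice of denominators are assumptions for illustration and need not match the definitions used in Section 5.

```python
def address_overlap(ground_truth, attributed):
    """Compare a provider's attributed addresses with the ground truth.

    Assumptions: coverage is measured against the ground truth, and the
    false positive share against the provider's attributed set.
    """
    ground_truth, attributed = set(ground_truth), set(attributed)
    true_positives = ground_truth & attributed
    false_positives = attributed - ground_truth
    false_negatives = ground_truth - attributed
    return {
        "coverage_pct": (
            100.0 * len(true_positives) / len(ground_truth)
            if ground_truth else 0.0
        ),
        "false_positive_pct": (
            100.0 * len(false_positives) / len(attributed)
            if attributed else 0.0
        ),
        "missed_addresses": len(false_negatives),
    }


# Toy data: the provider finds three of four service addresses and
# wrongly attributes one unrelated address ("x").
toy_result = address_overlap({"a", "b", "c", "d"}, {"a", "b", "c", "x"})
```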
Approach. First, we conducted interviews using the protocol
described earlier. Looking forward slightly to the results,
it follows from these interviews that apart from the
evaluation metric ‘address overlap’, we can compose two more
metrics (‘money flows’ and ‘address roles’), which represent
transaction-based tracing and relationship-based tracing,
respectively. We use these methods to reason about attribution
in a way that is more representative of the real world.
Various factors might influence attribution in these use cases,
and they have not been empirically evaluated yet. Evaluating
them will be the first step, after which we can look at the
real-world impact.
4 Law Enforcement Perspectives
Before we evaluate the attribution by Chainalysis, it is
important to understand how law enforcement professionals
expect such blockchain intelligence providers to perform and
what use cases they have for them. We employ semi-structured
interviews to engage with law enforcement professionals
to understand how they use blockchain intelligence and
investigate how they perceive its attribution. That way,
our evaluation fits with real-world use and expectations.
Through our law enforcement network, we contacted
respondents (n = 6) who have been part of various high-impact
multi-national cases over the past years and work for
European law enforcement agencies (see Section 3). All have
worked in law enforcement for several years, specializing in
cryptocurrency tracing, and have applied those skills in many
significant cases. The full interview protocol is attached in
Appendix A. We use R-1 to R-6 to refer to specific interviewees.
Tracing methods. After learning about the background of
the interviewees, we first asked them how they used the tools
and how they perceived the quality of attribution in the tools
they had available to them. We learn that all participants
have proactively started specializing in cryptocurrency trac-
ing. Apart from one, they all received some form of formal
training, including training from the blockchain intelligence
provider itself and a dedicated training institute. About this
training, R-2 said: “I did it so that I could use it for refer-
ence [in the reports I write] as some form of certification
[in court]”. This points out why all organizations organize
some form of training. There is a need to train law enforce-
ment officers to handle these complex tools. Both R-2 and
R-5 pointed out the importance of certification of this type of
work in a law enforcement environment. In addition to train-
ing, three respondents mentioned using informal networks
of law enforcement officers for knowledge gathering as an
alternative to training courses. And two respondents men-
tioned following cryptocurrency-related publications to stay
up-to-date with the latest developments.
All used at least one commercial tracing tool, half of them
using two. All used at least Chainalysis software, highlighting
its position as market leader in the field. Yet, all expressed
the need for at least two tools, indicating an apparent demand for
a second opinion, with one participant saying: “Trust is good,
checking is better” (R-2). Or, as R-1 mentioned about a
criminal trial during which he temporarily had two tools at his
disposal: “You can sometimes see clear differences [when comparing
information from your case]”. R-1 did not specify this further,
leaving it open whether the visualization or the underlying
data was the reason for the difference. The interviewees
who had access to only one blockchain intelligence provider
mentioned the high cost as the cause. One notable mention is a
respondent (R-1) who used only one tool, but compensated
for that by using internal intelligence sources such as seized
datasets. But even though respondents had access to
commercial tracing tools, they also regularly used public block
explorers because, as R-4 said, “you want to know very
specifically where it is sent to”, referring to the analysis of smart
contract calls, which he performs on a block explorer.
The starting point from which an investigator starts tracing
differs from case to case. However, the tracing itself
can be categorized in two ways. One way is relationship-based
(i.e., to look for relationships between wallets). Relationships
are based on (grouped) sets of transactions. The other way,
transaction-based tracing, looks at transfers chronologically,
transaction-by-transaction, like following, for instance, a
ransomware payment. We found no relation between the
respondent’s role and the choice of tracing method.
R-6 said that he traces transaction by transaction, employing the
change-address heuristic, a key indicator of following the
transaction flow; we therefore call this transaction-based tracing.
R-1, on the other hand, describes tracing as looking for related
wallets; hence we call this relationship-based tracing.
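The difference between the two tracing styles can be illustrated on a toy list of transfers. This is a deliberately simplified sketch, not how any tracing tool works: transfers are hypothetical (time, source wallet, destination wallet, amount) tuples, and the transaction-based view simply follows the earliest later outgoing transfer instead of applying a real change-address analysis.

```python
from collections import defaultdict


def relationship_view(transfers):
    """Relationship-based: aggregate all transfers per wallet pair."""
    totals = defaultdict(float)
    for _, source, destination, amount in transfers:
        totals[(source, destination)] += amount
    return dict(totals)


def follow_payment(transfers, start_wallet, max_hops=10):
    """Transaction-based: follow transfers hop by hop in time order.

    Simplification: at each wallet, take the earliest outgoing transfer
    that happens after the funds arrived.
    """
    ordered = sorted(transfers)
    path = [start_wallet]
    current, current_time = start_wallet, float("-inf")
    for _ in range(max_hops):
        next_hop = next(
            (t for t in ordered if t[1] == current and t[0] > current_time),
            None,
        )
        if next_hop is None:
            break
        current_time, current = next_hop[0], next_hop[2]
        path.append(current)
    return path


toy_transfers = [
    (1, "victim", "w1", 2.0),
    (2, "w1", "w2", 1.5),
    (3, "w1", "w2", 0.5),
    (4, "w2", "exchange", 2.0),
]
relations = relationship_view(toy_transfers)
payment_path = follow_payment(toy_transfers, "victim")
```

The relationship view collapses the two transfers from w1 to w2 into one edge with a total amount, while the transaction view preserves the order in which funds moved.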
The context of the case and the personal preference of
the investigator appear to be related to the tracing method used.
Or as R-1 said: “I look at my job in two ways, once [I] do
blockchain analysis, I feel more like an analyst. I go one step
further than hard evidence and then go back [...]”. Later on,
the respondent calls this “switching hats” between analyst
and investigator, therefore changing the way of tracing.
Transaction-based tracing is commonly used by
investigators who appear to focus their roles more on the finances
of a subject, such as turnover analysis (i.e., estimating
how much a suspect has earned illicitly). On the other
hand, relationship-based tracing appears to be better suited
for identification of the wallet’s owner. We must note,
though, that the user interface of the commercial tool most
commonly used by the respondents (Chainalysis Reactor)
is geared toward relationship-based tracing. When R-3 was
asked whether he uses a feature to inspect individual
transactions, in other words, transaction monitoring, he answered:
“No, never. [...] That may be because this is most important in
KYC [Know Your Customer] and that sort of business”. Two
respondents explicitly mention a hybrid form of tracing. R-5
described how he applies mostly relationship-based heuristics.
Still, if the tools then give an indirect path between two wallets,
he employs transaction-based tracing to manually strengthen
the analysis of the relationship.
Perceptions on attribution accuracy. We asked the
interviewees about their perceptions of the tools with regard to
attribution accuracy (i.e., did they encounter cases in which
labels were missing and/or wrong labels were present). We
asked whether they ever revisited or cross-checked their
analyses based on the blockchain intelligence provider, for
example, after they had seized criminal assets and therefore had
ground truth to check the service’s attribution accuracy, after
they received information from third parties they had subpoenaed,
or by verifying attribution accuracy in the tools through
self-initiated transactions with an illicit service.
Two respondents acknowledged that they do not revisit their
analyses in all cases after a seizure, although they did see the
need for it. All others revisit their initial analysis. We assume
that investigators who do not revisit their analysis skip this
step because they are not actively involved in the remainder
of the case. We deduce this from what these two respondents had
in common: they devoted more time to helping
colleagues, as they were the only cryptocurrency specialists
in their offices. R-3 describes revisiting the analysis: “Then
enter the suspect’s wallet [...] to see where it comes back
in my analysis. [...] That way, I can attribute the wallet to
a specific wallet [in the analysis]”. R-3’s response shows
investigators adding their ground-truth attribution to their
analysis later on, and all respondents appear to do this most of
the time. In short, investigators return to their analysis unless
they are temporarily assisting colleagues with their cases. In
those assisting cases, the workflow potentially harms the quality
of the analysis, as a feedback loop is missing.
By knowing this workflow, we can ask them about their knowledge of conflicting attribution (i.e., cases in which they have information conflicting with a blockchain intelligence provider’s claims). Five out of six respondents perceive an underestimation, in other words, false negatives. But when asked about two different sources conflicting, most conflicts turn out to be caused by what are sometimes called ‘nested services’: exchanges that ‘host’ little exchanges that have most or all of their liquidity stored on their bigger ‘parents’ [8]. There, the parent label is commonly displayed in the tools, but after a subpoena on the host exchange, it can be derived from subscriber information that it is a nested service. In these cases, the conflicting attributions are superficial and do not undermine trust in the attribution. When asked how they think blockchain intelligence providers work, five respondents mentioned at least one way in which those tools attribute addresses. The sixth respondent called it a “black box”, not wanting to speculate about how they work. All sources (OSINT, interactions by blockchain intelligence providers, and customer data) were mentioned equally as sources from which these companies generate attribution. Overall, respondents expect false negatives, although they have encountered false positives, too. They find it difficult to estimate the size of the effect in a large-scale evaluation like the one in this paper, which is probably because they work primarily on a case-by-case basis.
Interviewee perceptions of ground truth data accuracy. In addition to understanding tracing, we wanted to understand the implications of using these tools in day-to-day operations. Overall, we observed that respondents did expect an underestimation in attribution, yet had difficulty stating by how much before we showed the results to them. Furthermore, before we showed the results, they also reported encountering false positives. When asked, these could be attributed to nested exchanges (i.e., exchanges that store their liquidity on another, larger exchange and are therefore difficult to attribute themselves) of which only the ‘parent’ exchange was labeled.
We shared preliminary results from the overlap analysis in Section 5 with the respondents to learn whether, and if so how, these results would influence their workflow. When confronted with the results in Figure 1, the respondents said they did not expect these results but were unsure what to expect to begin with. Subsequently, they had many questions about what this meant for their daily work and suggested workflow adjustments, with using multiple tools to cross-check attribution as the most important change. One respondent suggested regular auditing of the accuracy of attribution. That illustrates the relevance of our work, but more importantly, it means that we will try to provide some starting points that, together with the insights of this study, support a better workflow, as will be discussed later in the generalization and discussion sections of this paper.
Although not explicitly covered as a separate section in the interview protocol, as the interviews functioned to gather input on what the remainder of the study should look like, we can still distill valuable insights into how the interviewees can benefit from the evaluations discussed in this paper. We see that when the interviewees discuss attribution accuracy, they talk about it in terms of the number of addresses. This is logical, as this is a number displayed in the tracing tools they see daily. However, when talking about impact, they mainly refer to missing attribution and, therefore, missing information. This is implied by the fact that most interviewees expect underestimation and that, for instance, R-1 does extra cross-checks with internal data sources. Further insights might benefit this practice, as they explain why and when such cross-checks are relevant.

| Service | Ground Truth | True Positives (Chainalysis) | False Positives (Chainalysis) | False Negatives (Chainalysis) |
| BestMixer | 182,612 (100.00 %) | 44,821 (24.54 %) | 19 (0.01 %) | 137,791 (75.46 %) |
| Hansa Market | 1,157,519 (100.00 %) | 913,171 (78.89 %) | 1,695 (0.15 %) | 244,348 (21.11 %) |
| Wall Street Market | 553,777 (100.00 %) | 525,236 (94.85 %) | 611 (0.11 %) | 28,541 (5.15 %) |
Figure 1: Overlap in addresses (green) between ground-truth (blue) and Chainalysis (yellow)
5 Address Overlap
The first and most straightforward way to measure attribution
accuracy is to count addresses and their overlap
between our ground-truth data and blockchain intelligence
from Chainalysis. This gives an initial view of attribution and
what influences it.
From Chainalysis, we collected all addresses they attribute to these services. Then, we calculated the overlap between the ground truth data versus the labeled addresses from Chainalysis, as plotted in Figure 1. It can be observed that for BestMixer, the number of true positives by Chainalysis is 44,821 (24.54%), false positives is 19 (0.01%), and false negatives 137,791 (75.46%) out of 182,612 addresses. For Hansa Market, the true positives are at 913,171 (78.89%), the false positives at 1,695 (0.15%), and the false negatives at 244,348 (21.11%) out of a total of 1,157,519 addresses. Lastly, for Wall Street Market, the true positives are at 525,236 (94.85%), the false positives at 611 (0.11%), and the false negatives at 28,541 (5.15%) out of 553,777 addresses.
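The overlap computation described above can be sketched with plain set operations. This is a minimal illustration, not the authors' code: the function name `address_overlap` and the toy addresses are hypothetical. The percentages are taken relative to the size of the ground truth, an assumption that is consistent with the reported figures (for example, 44,821 / 182,612 is roughly 24.54 %).

```python
def address_overlap(ground_truth, provider):
    """Count overlap between ground-truth addresses and provider-labelled addresses."""
    gt, labelled = set(ground_truth), set(provider)
    if not gt:
        raise ValueError("ground truth must contain at least one address")
    counts = {
        "true_positives": len(gt & labelled),    # labelled and in the ground truth
        "false_positives": len(labelled - gt),   # labelled but not in the ground truth
        "false_negatives": len(gt - labelled),   # in the ground truth but not labelled
    }
    # Assumption: every percentage is expressed relative to the ground-truth size.
    shares = {name: round(100 * n / len(gt), 2) for name, n in counts.items()}
    return counts, shares


# Toy example with made-up address strings.
counts, shares = address_overlap({"a1", "a2", "a3", "a4"}, {"a1", "x9"})
print(counts)
print(shares)
```

With real data, the two arguments would be the seized address list of a service and the addresses a provider attributes to that service.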
From this figure, we can infer significant differences be-
tween the different types of services. Higher accuracy is
achieved for darknet markets compared to mixers. However,
we must note that we cannot generalize without consider-
ing what causes (in)accurate attribution. Because of the very
high accuracy of Hansa Market and Wall Street Market, we
manually inspected only BestMixer’s false positives.
Apart from some edge cases, which we could not explain, we found two main reasons why attribution is propagated incorrectly. The main reason is a directly spent change output. A payment in Bitcoin and, in fact, in all similar (UTXO-based) blockchains often generates a change output because one has to spend an entire UTXO. There are heuristics in place that attempt to identify this change address in a transaction. The change-address heuristic, first defined by [26], states that a change address should be a freshly created address, meaning it has not been seen on the blockchain before. We identify two cases in which implementing this incorrectly could cause problems.
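To make the freshness condition concrete, the following is a simplified sketch of a change-address check. It is not the definition from [26] nor any provider's implementation: the function name `fresh_change_output` is hypothetical, and the requirement that exactly one output is fresh is an added assumption used here to keep the result unambiguous.

```python
def fresh_change_output(output_addresses, seen_before):
    """Return the output address flagged as change, or None if the check is inconclusive.

    output_addresses: addresses receiving funds in the transaction.
    seen_before: addresses that already appeared on the blockchain earlier.
    """
    if len(output_addresses) < 2:
        return None  # a single output leaves nothing to distinguish
    fresh = [a for a in output_addresses if a not in seen_before]
    # Assumption: only flag change when exactly one output address is fresh.
    if len(fresh) != 1:
        return None
    return fresh[0]


# Toy example: "known" was seen earlier, "new" was not.
print(fresh_change_output(["known", "new"], {"known", "dep"}))  # -> new
```

The sketch also shows why the state of `seen_before` matters: if it does not reflect what was actually on the blockchain when the transaction was made, the wrong output can be flagged.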
There is a mechanism called Child Pays for Parent (CPFP), which became default in 2016 [44]. CPFP allows a transaction to prioritize the incoming (unconfirmed) transactions it wants to spend by setting a higher fee. This causes miners to prioritize the ancestor transactions because they want to optimize mining rewards. With CPFP, a fresh change address may already be spent, so the change-address heuristic triggers incorrectly. It could therefore result in the actual deposit being labeled as the change. This is then propagated, meaning that all input addresses of the wallet that deposits bitcoins into the service are considered part of the service. Next to directly spent change (3 direct cases, and 5 implied, indirect cases from those 3), we identify two other causes of false positives. In one case, the wrong change address was identified because the address had a different label attached to it, meaning the deposit address was marked as change, which in turn propagated such that the input address was also considered to belong to BestMixer. We also noticed that BestMixer was using a relatively high network fee: in some cases (3 cases), the deposit was similar in size to the deposit transaction’s fee, so the address was incorrectly assumed to also belong to BestMixer. In 3 cases, the reason was unknown, and those 3 cases (indirectly) caused incorrect attribution of 4 more addresses.
6 Money Flows
A way to measure attribution that reflects real-world transaction-based tracing is to leverage the UTXO model of blockchains such as Bitcoin.
We apply this model to design an evaluation technique that resembles how our interviewees trace illicit money in a transaction-based manner, and we call this technique money flows. We envision a service’s wallet as a directed acyclic graph in which the nodes are the UTXOs and the edges are the transactions. A path of UTXOs from the deposit into a service until it leaves the service wallet again is a money flow. We use the addresses corresponding to the UTXOs to map each flow against the ground-truth data and the data from Chainalysis. This graph is acyclic by the definition of the UTXO model in Bitcoin, as we use UTXOs rather than addresses. The other way around, using addresses rather than UTXOs, is infeasible, as we then cannot guarantee the graph is acyclic, and we might be unable to create a finite set of flows. For each service, we collected all transactions on all addresses of that service in the ground-truth. If a transaction consists of more than one input and more than one output, inputs are linked to the corresponding outputs as in a queue, as proposed by [2]. This is in line with how money laundering contamination (i.e., taint) is calculated in case law. Additionally, the number of possible paths grows exponentially within large service wallets, where transactions have many inputs and outputs, especially in darknet markets such as Hansa Market and Wall Street Market, due to their escrow service system. This makes it impossible to calculate all possible paths using this method in a reasonable time frame, even when run on high-performance computing systems. Every path starts at a deposit, which we identify as a UTXO that has an incoming UTXO that is not part of the ground-truth and a destination UTXO that is part of the ground-truth. From every deposit, we generate all paths until a path comes across an edge where the destination UTXO is not part of the ground-truth.
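One reading of linking inputs to outputs "as in a queue" is first-in-first-out matching by value. The sketch below illustrates that reading only; it is not taken from [2] or from the authors' code, and the function name `fifo_link` and the integer (satoshi-like) values are hypothetical.

```python
def fifo_link(input_values, output_values):
    """Link inputs to outputs first-in-first-out.

    Returns (input_index, output_index, amount) tuples. Value left over after
    all outputs are filled is treated as the transaction fee and is not linked.
    """
    links = []
    i = 0  # index of the input currently being consumed
    left = input_values[0] if input_values else 0
    for o, need in enumerate(output_values):
        while need > 0:
            if left == 0:
                # Current input is exhausted: move on to the next one in the queue.
                i += 1
                if i >= len(input_values):
                    return links  # outputs exceed inputs; nothing left to link
                left = input_values[i]
                continue
            take = min(need, left)
            links.append((i, o, take))
            need -= take
            left -= take
    return links


# Two inputs (5 and 3) pay two outputs (4 and 3); the remaining 1 is the fee.
print(fifo_link([5, 3], [4, 3]))  # -> [(0, 0, 4), (0, 1, 1), (1, 1, 2)]
```

Under this linking, each input connects only to the outputs it funds in order, rather than to every output of the transaction.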
We calculated all flows of the three services, BestMixer, Hansa Market, and Wall Street Market, to better understand how attribution performs in terms of money flows. We define three categories for a flow: missed, partial, and full. If none of the addresses of the UTXOs of the flow are in the data of the commercial data provider, the flow is missed. If all addresses are present, we categorize it as full, meaning the commercial data provider fully covers it. In all other cases, a flow is classified as partial, meaning the commercial data provider partially covers it. We find that money flow coverage is lowest for BestMixer (see Figure 2). By plotting the flows over time, we see clear differences between time frames. At the start, most money flows are missed. This makes sense, as during its initial phase, almost nobody knew about these services as they were still being developed, not publicly advertised, and only tested by a limited number of individuals. BestMixer started public operations in May 2018, and we see that Chainalysis can quickly attribute (part of) the service thereafter. This points to an important factor in attribution: interacting with a service is key to getting attribution if there is no way to broaden attribution by some unique fingerprint that works across blockchain(s).
Figure 2: Money flows over time for BestMixer (proportion of flows covered, 0.0 to 1.0, from 2018-03 to 2019-05; categories: missed, partial, full)
Figure 3: Money flows over time for Hansa Market (proportion of flows covered, 0.0 to 1.0, from 2015-07 to 2017-07; categories: missed, partial, full)
Figure 4: Money flows over time for Wall Street Market (proportion of flows covered, 0.0 to 1.0, from 2016-01 to 2019-07; categories: missed, partial, full)
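The three flow categories defined in this section can be sketched as follows. This is a minimal illustration under stated assumptions: a flow is represented simply as the list of addresses of its UTXOs, and the names `classify_flow` and `coverage_shares` are hypothetical.

```python
from collections import Counter


def classify_flow(flow_addresses, provider_addresses):
    """Classify one money flow as 'missed', 'partial', or 'full'."""
    if not flow_addresses:
        raise ValueError("a flow needs at least one address")
    hits = sum(1 for a in flow_addresses if a in provider_addresses)
    if hits == 0:
        return "missed"   # no address of the flow is labelled by the provider
    if hits == len(flow_addresses):
        return "full"     # every address of the flow is labelled
    return "partial"      # some, but not all, addresses are labelled


def coverage_shares(flows, provider_addresses):
    """Percentage of flows per category."""
    if not flows:
        raise ValueError("need at least one flow")
    tally = Counter(classify_flow(f, provider_addresses) for f in flows)
    return {c: 100 * tally[c] / len(flows) for c in ("missed", "partial", "full")}


# Toy example with made-up addresses.
flows = [["a", "b"], ["a", "x"], ["y", "z"], ["b"]]
print(coverage_shares(flows, {"a", "b"}))
```

Grouping the flows by the month of their deposit before calling `coverage_shares` would give a time series of the kind plotted in the figures.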
On the notion of unique fingerprints: there appears to be a limiting unique fingerprint present that prevents forward-tracing of change within the service’s wallet. We found a shift in the wallet fingerprint used starting mid-February, which caused an immense drop in money flow coverage. The only flows Chainalysis can cover are those involving the donation address of BestMixer, which was publicly advertised on the service’s website. We noticed that the wallet structure of the service changed, and precisely at that moment, the money flow coverage declined too. This change involved, among other things, disabling segregated witness (SegWit) and changing the address type.
With Hansa Market and Wall Street Market (see Figures 3 and 4), we again see low money flow coverage initially, supporting our theory that interaction is important but difficult in the first stage because a service is still relatively unknown or even still in development. Another factor to consider is open-source intelligence (OSINT). OSINT could replace the need to interact with a service and provide ‘historical interactions’, allowing one to build a profile later. By extensively searching online for posts, news articles, photos, and videos, and on (underground) forums, we learned that there is not much OSINT available that is useful for attribution, that is, publicly posted cryptocurrency addresses or tutorials in which people show how to use these services and thereby leak addresses. We only found a few addresses online, which is negligible compared to the total wallet size, and manually applying heuristics such as co-spending to them hardly gives any attribution above an insignificant percentage.

| Service Name | Missed | Partial | Full |
| BestMixer | 71.18 % | 11.74 % | 17.08 % |
| Hansa Market | 9.34 % | 14.33 % | 76.34 % |
| Wall Street Market | 6.13 % | 4.10 % | 89.77 % |
Table 3: Money flow completeness percentages
Also, with Wall Street Market, we noticed a gap in attribution at the beginning of 2018. Here, we also believe a change in the wallet system occurred, though the indicators are less self-evident than with BestMixer. The format of a transaction in the transaction table of the marketplace changes at roughly that moment in time. More importantly, we saw a significant spike in the number of UTXOs in the wallet that moved to other addresses. This considerable movement could also explain why the money flow coverage went down. Money flow coverage increases significantly at
the end of the lifetime of Wall Street Market. We believe this
is due to the exit scam attempted by the marketplace [1]. This
led to a large amount of money being moved to only a few
addresses and then withdrawn from the service, increasing
money flow coverage. Furthermore, some exit scam addresses
were publicly advertised online on Reddit. Having established
that the primary form of illicit services attribution is through
direct interaction, we see that getting historical attribution is
limited when addresses are not constantly reused. The lower
the balance of the service’s wallet and the higher the turnover,
the worse the historical attribution if only using current inter-
action with the service.
We showed that money flow coverage changes over time. Chainalysis is more likely to cover only parts of the flow, meaning it is more likely to cover only a central wallet. When tracing UTXO-based, missed flows are potential cases in which one can trace through a service without encountering any attribution at any step in the process. The percentages of missed, partially covered, and fully covered flows per service are shown in Table 3. The results show that attribution varies significantly over time. Generally, for BestMixer, Chainalysis covers a flow (partially or completely) in 28.82 percent of cases. For darknet markets, money flow coverage is much higher. For Hansa Market, this is 90.67 percent, and for Wall Street Market, this is 93.87 percent (4.10 percent partial plus 89.77 percent full in Table 3).
7 Address Roles
Apart from UTXO-based tracing, we identify a second way to trace illicit money flows on UTXO-based blockchains: relationship-based. Here, we state that the role an address has in the internal workings of the wallet of the service is essential. We employ this reasoning for a third way of measuring attribution: through the role of the address. We define three types of transactions: a deposit transaction, an internal transaction, and a withdrawal transaction. A transaction is a deposit transaction if none of the input addresses belong to the ground-truth of the service and one or more output addresses do belong to the service. A transaction is an internal transaction if all the input and output addresses are in the ground-truth. A transaction is a withdrawal transaction if one or more of the input addresses belong to the ground-truth and at least one of the output addresses does not, leaving room for one or more output addresses to be change addresses that belong to the ground-truth. Based on the types we give transactions, we define the addresses’ roles. Each address has a combination of roles based on its incoming transactions and its outgoing transactions. An address, for instance, receives money (i.e., all incoming transactions are of the type ‘deposit’) and then sends it to a central wallet (i.e., all outgoing transactions are of the type ‘internal’). Then, we apply the co-spending heuristic to the addresses to build clusters of these addresses. We do so because tracing tools also use co-spending. With these datasets, where the private keys were not exposed and no CoinJoins could appear with other addresses, we can safely state that this heuristic holds. We calculate the address role coverage of the tracing tools as the percentage of clusters of a specific role they have labeled.

| Incoming Transactions | Outgoing Transactions | Addresses | Clusters | Cluster Coverage Chainalysis |
| deposit | withdrawal | 62,606 | 56,110 | 11.22 % |
| withdrawal | withdrawal | 57,462 | 45,146 | 24.10 % |
| internal | withdrawal | 3,465 | 2,658 | 14.94 % |
| deposit | internal | 11,608 | 636 | 2.36 % |
| deposit, withdrawal | withdrawal | 3 | 439 | 92.03 % |
| withdrawal | internal | 46,300 | 276 | 6.88 % |
| Other | Other | 1,168 | 172 | 1.74 % |
| Total | | | | 17.55 % |
Table 2: Address roles of BestMixer
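The transaction types and address roles defined in this section can be sketched as below. This is an illustrative reading of the definitions, not the authors' implementation: transactions are assumed to be given as (input addresses, output addresses) pairs, the names `transaction_type` and `address_roles` are hypothetical, and the fallback label "other" for transactions that match none of the three definitions is an assumption.

```python
from collections import defaultdict


def transaction_type(inputs, outputs, ground_truth):
    """Label a transaction as 'deposit', 'internal', 'withdrawal', or 'other'."""
    ins = [a in ground_truth for a in inputs]
    outs = [a in ground_truth for a in outputs]
    if not any(ins) and any(outs):
        return "deposit"      # no service input, at least one service output
    if ins and outs and all(ins) and all(outs):
        return "internal"     # every input and output belongs to the service
    if any(ins) and not all(outs):
        return "withdrawal"   # service input, at least one non-service output
    return "other"            # assumption: anything not matching the definitions


def address_roles(transactions, ground_truth):
    """Map each service address to (incoming types, outgoing types)."""
    incoming, outgoing = defaultdict(set), defaultdict(set)
    for inputs, outputs in transactions:
        kind = transaction_type(inputs, outputs, ground_truth)
        for a in outputs:
            if a in ground_truth:
                incoming[a].add(kind)   # the address receives funds in this transaction
        for a in inputs:
            if a in ground_truth:
                outgoing[a].add(kind)   # the address spends funds in this transaction
    return {a: (frozenset(incoming[a]), frozenset(outgoing[a])) for a in ground_truth}


# Toy wallet: deposit address d1 forwards to central address c1, which pays out.
gt = {"d1", "c1"}
txs = [(["u1"], ["d1"]), (["d1"], ["c1"]), (["c1"], ["u2", "c1"])]
for address, role in sorted(address_roles(txs, gt).items()):
    print(address, sorted(role[0]), sorted(role[1]))
```

In the toy example, `c1` ends up with an incoming type ‘withdrawal’ because it receives the change of its own payout, which mirrors the change effect visible in the role tables.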
As might be expected, address role coverage is not random. In the case of BestMixer, we notice that Chainalysis is better at covering deposits immediately withdrawn again (see Table 2). We see that an incoming transaction can be of the type ‘withdrawal’, which seems counterintuitive, but it is, in fact, change from an earlier withdrawal of another user. When we compare the results of BestMixer with those of darknet markets, we see a difference in how they work and, therefore, how they handle funds. For darknet markets, one might expect a more precise differentiation between user funds, and that is also what we see, for instance, in the address roles of Wall Street Market (see Table 4). The majority of the addresses are deposits and withdrawals. These correspond to the order structure of the marketplace, where users generate a payment address and the vendor withdraws the profit. This withdrawal is batched and grouped with other transactions to save the marketplace money on the blockchain.
Hansa Market employs a different structure than Wall Street
Market, producing many more internal transactions. This
is because users at Hansa had a wallet at the marketplace,
whereas for Wall Street Market, they paid for an order at an
order-specific address. This greatly affects the wallet’s inter-
nal workings. For Hansa, many more internal transactions are
necessary to move funds to the right (escrow) wallets. One
might expect, also based on the overlap analysis, that these
internal transfers make it much easier to label Hansa. This is
indeed the case regarding address roles covered, but looking
back at the address overlap in Figure 1, we see the contrary.
This means that more internal (and maybe also relevant) ad-
dresses are labeled in Hansa than in Wall Street Market.
The difference in coverage between address roles gives clear indicators of BestMixer’s decentralized nature. On a higher level, BestMixer consists of large chains of deposits and withdrawals, like a peel chain (a term also used by [25]). This can be deduced from the many roles ‘deposit > withdrawal’ and ‘withdrawal > withdrawal’. Although Wall Street Market also shows that pattern, what probably makes up for that is that address reuse is more prevalent there, linking more addresses. Hansa Market and Wall Street Market work differently compared to BestMixer. This makes for different on-chain behavior. Most importantly, they facilitate transactions
between buyers and sellers on the market. These transactions
occur on-chain to secure them in an escrow system using
multi-signature wallets. Hansa Market worked slightly dif-
ferently than Wall Street Market, as Hansa had a wallet per
account, whereas Wall Street Market generated an address for
a buyer to pay specifically for an order.
We can see the results of the difference in on-chain behavior for the most frequently occurring roles in Table 5. Hansa Market has relatively many internal-related roles. The account-based wallet management likely causes that. First, one has to deposit money into the market, adding to an account balance. That balance can then be used to order something on the market. When an order occurs, an escrow wallet must be created between the buyer, seller, and market, which means another transaction on the blockchain. This whole behavior causes many internal transactions compared to Wall Street Market. The different roles in a wallet clearly show what to focus on to get good address role coverage. One needs a ‘vantage point’ from which to start expanding attribution. For BestMixer, focusing on identifying peel chains would lead to greater address role coverage. Another ‘vantage point’ was the internal liquidity provider system in BestMixer, which they used until February 2019. When that system was discontinued, a strong ‘vantage point’ was lost.

| Incoming Transactions | Outgoing Transactions | Addresses | Clusters | Cluster Coverage Chainalysis |
| deposit | withdrawal | 494,535 | 78,972 | 75.69 % |
| deposit | internal | 52,943 | 4,335 | 24.57 % |
| deposit, withdrawal | withdrawal | 632 | 1,051 | 95.34 % |
| internal | withdrawal | 901 | 677 | 59.53 % |
| withdrawal | withdrawal | 1,273 | 624 | 94.07 % |
| deposit, internal | withdrawal | 61 | 174 | 85.06 % |
| Other | Other | 2,285 | 92 | 6.52 % |
| Total | | | | 73.35 % |
Table 4: Address roles of Wall Street Market

| Incoming Transactions | Outgoing Transactions | Addresses | Clusters | Cluster Coverage Chainalysis |
| internal | internal | 383,353 | 146,210 | 95.51 % |
| deposit | internal | 371,249 | 124,672 | 96.03 % |
| deposit, internal | internal | 32 | 83,663 | 97.34 % |
| internal | withdrawal | 71,249 | 24,459 | 1.77 % |
| deposit | withdrawal | 69,831 | 24,392 | 4.49 % |
| withdrawal | internal | 66,401 | 19,086 | 93.91 % |
| Other | Other | 122,009 | 39,264 | 0.20 % |
| Total | | | | 86.84 % |
Table 5: Address roles of Hansa Market
Regarding Hansa Market and Wall Street Market: focusing on market transactions yields high address role coverage. For Hansa Market, it is also important to focus on identifying the addresses belonging to account wallets. Those are the first to be discovered when doing transactions with Hansa Market. To identify market transactions, one can order something on the market. A common approach is to find a shared fingerprint between transactions. That way, one can apply knowledge of known transactions to find more transactions. This can, for instance, be done based on roles. Identifying how the market fee is calculated in an order allows an analyst to derive more rules that can be applied to already identified wallets to find new addresses.
Overall, we see a difference in address role coverage between BestMixer on the one hand and Hansa Market and Wall Street Market on the other. We suspected that addresses are reused more often in marketplaces. Therefore, we calculated the average number of transactions per address, which turned out to be 2.00 for BestMixer, 2.32 for Hansa Market, and 2.27 for Wall Street Market. The key observation here, however, is the variance, which is 0.42 for BestMixer, 1 for Hansa Market, and 4 for Wall Street Market. Certain addresses occur much more frequently, and from closely inspecting those addresses, we found that they belong to pivotal roles in the darknet markets, such as the addresses that receive the fee for the market.
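The reuse statistic discussed above can be sketched with the standard library. The details are assumptions, since the text does not specify them: an address is counted once per transaction it appears in (as input or output), the population variance is used, and the function name `transactions_per_address` is hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def transactions_per_address(transactions, service_addresses):
    """Mean and population variance of the number of transactions per service address."""
    counts = Counter()
    for inputs, outputs in transactions:
        # Count each address at most once per transaction.
        for a in set(inputs) | set(outputs):
            if a in service_addresses:
                counts[a] += 1
    per_address = [counts[a] for a in service_addresses]
    return mean(per_address), pvariance(per_address)


# Toy example: address "a" is reused more often than "b" and "c".
txs = [(["a"], ["b"]), (["b"], ["a", "c"]), (["a"], ["d"])]
print(transactions_per_address(txs, {"a", "b", "c"}))
```

A similar mean combined with a larger variance, as reported for the darknet markets, indicates that a small number of addresses account for a disproportionate share of transactions.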
Unlike with transaction-based tracing, picking related wallets is more of a gamble for users of these tools. What is relevant depends more on the case context, which means this metric has less control over which of the related entities is picked, compared to transaction-based tracing, where one often has the guarantee that core components such as internal mixing pools or the escrow service are covered as the transaction trace is followed step by step. The real-world impact, therefore, is more diffuse. On the other hand, role-based attribution is more closely related to the internal business operations of a service and, therefore, leads to better insight into the internal workings, which is valuable for those wanting to improve attribution.
8 Generalization
To understand how our evaluation of attribution can be
generalized based on our findings, we take two perspectives:
first, we add another blockchain intelligence provider, and
second, we reason about the generalization of clustering
heuristics and the circumstances under which these can be
generalized.
Blockchain intelligence provider generalization. With the use of a second data provider, we attempt to generalize our results and pinpoint what efforts have to be put into attribution manually or through proprietary techniques and algorithms by blockchain intelligence providers. For this, we use data from Arkham Intelligence. Arkham Intelligence is a blockchain intelligence provider that aims to deanonymize cryptocurrency flows by linking blockchain addresses to real-world entities, just like other providers such as Chainalysis. A distinctive feature of the platform is its Intel-to-Earn program, which incentivizes users to contribute attribution data through a marketplace model. In this study, we use Arkham’s publicly available labels. Arkham has data about only one ground truth illicit service used in this study: Wall Street Market. We will use that to reason about generalization.
In Arkham, Wall Street Market’s wallet consists of only two addresses. Both addresses are present in the ground truth data, giving a false positive rate of exactly 0. However, given there are only two, the false negative rate is significant, as these two represent only 0.0003% of all addresses in the ground truth data. Yet, these addresses are the core addresses of Wall Street Market: they handle the fee the market receives in every order ever placed on the market. These two addresses alone are responsible for a significant part of all transactions during the lifetime of the market (10.44%). This again shows the importance of metrics such as money flows and address roles rather than simply counting addresses. Given the low number of addresses present in Arkham and the nature of Arkham as a crowdsourced and OSINT-based provider, it is to be expected that one or both addresses were submitted to Arkham and that no additional fingerprinting or manual analysis was added by Arkham. Manual inspection shows these addresses were merged based on just the co-spending heuristic. This leads us to exclude the option that additional research to improve attribution, as is known to be done by blockchain intelligence providers, went into these addresses.
When evaluating attribution of Arkham in terms of money flow coverage or address role coverage, we consider these two addresses as starting points for attribution. When applying the knowledge that these are fee addresses, we can expand attribution to a total of 48,368 addresses without any false positives. We do so by adding two domain knowledge rules we can only apply to fee addresses in Wall Street Market: (1) when a transaction deposits money into these addresses, it comes from escrow or the buyer, and (2) the other output is the payout to the vendor. Then, when calculating money flows (see Figure 5), the proportion of Wall Street Market which is (partially) attributed is small (6.44% partially and 0.42% fully covered). This highlights the efforts blockchain intelligence providers take, manually or through in-house algorithms, to improve attribution accuracy, and the fact that this has to be done on a service-specific level shows the difficulty of generalizing. Arkham does not add additional clustering; we performed this clustering ourselves. Looking at the address roles in the enhanced cluster of Wall Street Market (see Table 6), we observe that attribution primarily covers addresses and clusters used for internal transactions from the service. Apparently, the market’s fee addresses together with the clustering are good at attributing the ‘backend’ of the service, but not good at attributing the deposit side of the market.
Figure 5: Money flows of Wall Street Market by Arkham - after clustering (proportion of flows covered, 0.0 to 1.0, from 2016-01 to 2019-07; categories: missed, partial, full)
Heuristics generalization. Apart from generalization across
blockchain intelligence providers, we can also look at gener-
alization of clustering heuristics. Using our empirical analysis
we can clarify under which circumstances these heuristics
can be generalized without the chance of false positives. It
is to be expected that two heuristics play a major role across
all services: co-spending and change-address. Both can be
generalized, assuming certain conditions are met.
| Incoming Transactions | Outgoing Transactions | Addresses | Clusters | Cluster Coverage |
| deposit | withdrawal | 494,535 | 78,972 | 0.01 % |
| deposit | internal | 52,943 | 4,335 | 0.00 % |
| deposit, withdrawal | withdrawal | 632 | 1,051 | 87.06 % |
| internal | withdrawal | 901 | 677 | 81.09 % |
| withdrawal | withdrawal | 1,273 | 624 | 99.36 % |
| deposit, internal | withdrawal | 61 | 174 | 2.87 % |
| Other | Other | 2,285 | 92 | 4.35 % |
| Total | | | | 2.48 % |
Table 6: Address roles of Wall Street Market by Arkham - after clustering
For co-spending, the major threat is that a CoinJoin occurs in which the service participates. When a service does not use this feature and no external parties, such as users of the service, can exploit a feature that allows a CoinJoin to occur, the heuristic holds. For example, MtGox [23,28] allowed a private key to be added to an account. If that private key was associated with a CoinJoin transaction, it might lead to false positives. In all other cases, where the service’s wallet cannot be extended with external private keys, the heuristic does hold, and in those cases, it must be considered conservative, i.e., it might create multiple clusters of the same wallet. Therefore, the heuristic creates a tendency towards false negatives rather than false positives.
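The co-spending heuristic and its conservative nature can be illustrated with a small union-find sketch. This is a generic textbook formulation, not the clustering code of the authors or of any provider; the function name `cospend_clusters` and the toy addresses are hypothetical.

```python
def cospend_clusters(transaction_inputs):
    """Cluster addresses that appear together as inputs of the same transaction."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps lookups short
            a = parent[a]
        return a

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transaction_inputs:
        if not inputs:
            continue
        find(inputs[0])  # register single-input transactions as their own cluster
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = {}
    for a in list(parent):
        clusters.setdefault(find(a), set()).add(a)
    return sorted(sorted(c) for c in clusters.values())


# "a", "b", "c" are linked through shared inputs; "d" is never co-spent.
print(cospend_clusters([["a", "b"], ["b", "c"], ["d"]]))  # -> [['a', 'b', 'c'], ['d']]
```

If `d` belonged to the same wallet but was never spent together with the other addresses, it would remain a separate cluster, which is the false-negative tendency described above.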
For the change-address heuristic, we have also seen signs that
Chainalysis employed it in clustering the ground-truth services,
especially BestMixer, whose wallet contains large peeling chains
that can only be attributed on this scale with some form of
change-address clustering. However, this heuristic is more
complex to evaluate and we have, in fact, seen some signs of
false positives on Chainalysis related to change-address
detection, for instance when CPFP (child-pays-for-parent) was
active. PayJoin is a technique specifically built to break this
heuristic. Again, just like with co-spending, one can assess
whether a service supports PayJoin, for instance by checking
whether a PayJoin endpoint is present for deposits. If so, both
the co-spending and change-address heuristics should be treated
carefully, as it can no longer be guaranteed that they hold;
they might lead to false positives.
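To make the change-address discussion concrete, the following sketch implements one simple and deliberately conservative variant of the heuristic: an output is treated as change only if it is the single output address that has never been seen before, and no output address is reused from the inputs. This variant, the transaction layout, and the function names are assumptions for illustration; we do not know which variant Chainalysis applies, and the sketch does not detect PayJoin or CPFP.

```python
def find_change_output(tx, seen_addresses):
    """Return the likely change address of tx, or None if unclear."""
    outputs = tx["outputs"]
    if len(outputs) < 2:
        return None  # nothing to distinguish
    if any(address in tx["inputs"] for address in outputs):
        return None  # address reuse between inputs and outputs
    fresh = [a for a in outputs if a not in seen_addresses]
    if len(fresh) == 1:
        return fresh[0]
    return None  # zero or several fresh outputs: ambiguous


def label_change_outputs(transactions):
    """Walk transactions in chronological order and label change."""
    seen = set()
    change_by_txid = {}
    for tx in transactions:
        candidate = find_change_output(tx, seen)
        if candidate is not None:
            change_by_txid[tx["txid"]] = candidate
        seen.update(tx["inputs"])
        seen.update(tx["outputs"])
    return change_by_txid


# Hypothetical peeling chain: W0 -> W1 -> W2, paying P1 and P2.
chain = [
    {"txid": "t0", "inputs": ["X"], "outputs": ["P1", "P2"]},
    {"txid": "t1", "inputs": ["W0"], "outputs": ["P1", "W1"]},
    {"txid": "t2", "inputs": ["W1"], "outputs": ["P2", "W2"]},
    {"txid": "t3", "inputs": ["W2"], "outputs": ["N1", "N2"]},
]
change = label_change_outputs(chain)
```

The last step of the toy chain pays two fresh addresses, so the sketch declines to label it. This mirrors the preference for false negatives over false positives.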
Both heuristics can be applied generally to any service after a
thorough assessment of that service: does it interact with a
CoinJoin service, e.g., for internal liquidity management, or
does it allow external wallets to be imported that might have
interacted with a CoinJoin service? And likewise for PayJoin:
does the service support it? If so, then similarly to CoinJoin,
one should treat the heuristics carefully, as they might
generate false positives. In the three ground-truth datasets,
however, none of those conditions were met, meaning the
heuristics were accurate but conservative, favoring false
negatives over false positives by design.
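The assessment questions above can be written down as a small checklist. The sketch below is only an illustration of that reasoning: the `ServiceProfile` fields and the mapping from conditions to affected heuristics are our reading of the preceding paragraphs, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProfile:
    """Answers to the assessment questions for one service."""

    uses_coinjoin: bool      # e.g. for internal liquidity management
    allows_key_import: bool  # external wallets or private keys can be added
    supports_payjoin: bool   # e.g. a PayJoin endpoint exists for deposits


def heuristics_at_risk(profile):
    """Return the heuristics that may yield false positives."""
    at_risk = set()
    if profile.uses_coinjoin or profile.allows_key_import:
        at_risk.add("co-spending")
    if profile.supports_payjoin:
        at_risk.update({"co-spending", "change-address"})
    return at_risk


# A service matching none of the conditions, as in our three cases.
closed_wallet = ServiceProfile(False, False, False)
# A hypothetical service that lets users import private keys.
key_import_service = ServiceProfile(False, True, False)
```

An empty result does not mean that coverage is complete; it only indicates that the heuristics can be expected to err on the side of false negatives.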
Mapping to law enforcement practice. The insights gained
suggest that generalization is highly case-specific. Most inter-
viewees admitted to having a limited understanding of how
blockchain intelligence providers operate. This highlights the
importance of critically examining the source of attribution
by the law enforcement professional to determine if and how
it can be expanded. Additionally, it is essential for blockchain
intelligence providers to offer more transparency about their
methods. In terms of generalization across providers, in the
real world, this generalization question has limited impact,
since LEAs rely almost exclusively on one tool. The in-
terviews confirmed that all law enforcement agencies use
Chainalysis. Only one agency was using a second commercial
provider. In fact, our use of a second, public provider mirrors
what we learned in the interviews: law enforcement using a
public tool for verification purposes.
9 Related Work
Our paper builds on and benefits from recent advancements in
three topics. First, we build and expand on blockchain-based
heuristics like co-spending and change address heuristics.
Second, we add to studies that evaluate measurement metrics.
Third and last, we advance work on leveraging ground-truth
data analysis, in our case to evaluate attribution.
Blockchain-based heuristics. We build upon the work first
introduced by [26], later enhanced by [25] and others. They
define state-of-the-art clustering and attribution practices
that are still used today. Clustering is an important factor
for scaling attribution. As said before, it propagates a label
from one address to many by grouping addresses on UTXO-based
blockchains. Many more studies apply variations of these
heuristics, but the core concepts are always the same and based
on [26]. According to our research, attribution practices
described by [26], such as interacting with services, are still
one of the main ways to identify with a high degree of
certainty that an address belongs to a particular service.
Apart from interacting with services, [43] applied
fingerprint-based heuristics on the entire blockchain to
attribute transactions belonging to specific services (i.e.,
CoinJoin mixers). We cannot definitively establish whether
blockchain-wide fingerprints were used for attribution in our
cases. However, funds from the development phase are rarely
labeled, which suggests that these types of heuristics were not
applied; otherwise, those funds would also have been labeled.
Still, their work shows a different type of attribution effort
that we cannot ignore when testing attribution.
Evaluating measurement metrics. The work from [25] is one of
the few attempts to work with ground-truth data to validate
heuristics broadly used in cryptocurrency tracing. It therefore
shows the pros and cons of using the change-address heuristic.
However, the work has limitations, as it assumes the data from
only one commercial data provider (Chainalysis, which we also
evaluate in this study) to be the ground truth. That data is
limited in size, and this study shows that Chainalysis data is
not perfect compared to ground truth from illicit services.
Next, the works of [22] and [10] use the graph-based nature of
UTXO blockchains for measurement heuristics. They use it for
expanding their search space or candidate set, i.e.,
clustering, and therefore look ‘outwards’. We, on the other
hand, look ‘inwards’ into a wallet and use it for labeling
rather than clustering. In a slightly different field, [5]
compared threat intelligence data from various providers. Even
though they focus solely on commercial intelligence data, the
overlap analysis is similar. They also interviewed participants
on their perception of how accurate the data they use daily is.
The results by [5], especially those on the perception of the
data, differ from ours. Where their participants show a clear
lack of trust in the data, such distrust is less evident among
our interviewees. However, in their case, the field is much
more diverse, with many more parties providing attribution than
in blockchain intelligence.
Ground-truth data analysis. We build upon earlier work that
analyzes ground-truth data seized by law enforcement, such as
[13] and [27]. Both use seized data from law enforcement to
analyze user and market behavior. We, however, use such data
access for evaluating the attribution of external systems used
by such services: blockchains, to be specific. One could argue
that it is a different type of market (e.g., cryptocurrency
attribution data vs. user security practices). Additionally,
[25] uses a relatively small set of cryptocurrency addresses as
ground truth for verification. They also reason about false
positives in the change-address heuristic. Yet, their number of
addresses is smaller, our overall wallet coverage is larger,
and we focus on illicit services.
10 Discussion
In this section, we reflect on the implications of our findings,
discuss limitations, and explore potential future work.
Implications. We found that attribution is performed
conservatively, meaning Chainalysis attributes a reliable
lower bound. This makes perfect sense when you look
at who uses these tools. Many investigators, compliance
departments, and law enforcement agencies rely on the
attribution data provided by commercial data providers in
their day-to-day activities to prevent or detect crime. In turn,
this means they are primarily interested in attributing illicit
services. We empirically evaluate attribution of illicit services
using seized ground-truth data. This is highly relevant, as
in many court cases, evidence is generated by tools these
companies provide. In many cases, the internal process
of how blockchain intelligence is generated, i.e., how
attribution is achieved, remains a ‘black box’ [30,33,41].
Our work provides at least partial insights into the inner
workings of that ‘black box’. It is good to know that these
tools provide estimates for tracing cryptocurrencies. In
most cases, tools are conservative and prefer not to show a
label if uncertainty is a factor. Our study shows that being
aware of this is vital. One of the interviewees stated that,
before being confronted with the preliminary results, he
guessed (or expected) that precision was 100%, which we
interpret as no false positives and no false negatives. Being
aware that attribution might not be perfect might help the
workflow. Visual indicators in tracing tools might also help,
possibly together with newly developed heuristics that
indicate a change of ownership between clusters of addresses
(e.g., because different fingerprints are being used). Here,
attribution focuses on forensic accuracy: stating only what
you know for certain. In the use case of law enforcement, this
is of great importance. However, the case of BestMixer shows
that, at least for transaction-based tracing, a large number
of flows is missing nonetheless. When calculating the taint of
an address indirectly, as in the algorithm developed by [2],
one has to take this into account.
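The difference between "no false positives" and "no false negatives" can be made explicit with a simple address overlap computation. The sketch below is a minimal illustration with made-up addresses; the function name and the returned fields are hypothetical, and it is not the evaluation code used in this study.

```python
def overlap_metrics(attributed, ground_truth):
    """Compare attributed addresses against ground-truth addresses."""
    attributed, ground_truth = set(attributed), set(ground_truth)
    true_positives = attributed & ground_truth
    false_positives = attributed - ground_truth
    false_negatives = ground_truth - attributed
    precision = len(true_positives) / len(attributed) if attributed else None
    recall = len(true_positives) / len(ground_truth) if ground_truth else None
    return {
        "false_positives": len(false_positives),
        "false_negatives": len(false_negatives),
        "precision": precision,
        "recall": recall,
    }


# Toy example: a conservative provider labels 6 of 10 wallet addresses.
wallet = {f"a{i}" for i in range(10)}
labeled = {f"a{i}" for i in range(6)}
metrics = overlap_metrics(labeled, wallet)
```

In this toy example precision is 100% while four addresses remain unlabeled: a reliable lower bound is not the same as complete coverage.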
Limitations. First, we acknowledge that having access to
data from only one blockchain intelligence company is a
limitation of our work. We pursued the inclusion of more
data sources and reached out to several commercial providers.
Through our collaboration agreement with law enforcement,
we had access to another provider. We also analyzed their
attribution. Prior to submission, we performed responsible
disclosure to Chainalysis and to the second firm. Whereas the
former responded very positively, thanking us for the insights,
the second provider immediately responded with legal objections
and threats. In subsequent conversations, we made extensive
efforts to resolve the concerns of the second provider. This
included proposing to include its data anonymously in the
manuscript, but this offer was declined. The provider said that
any disclosure, even anonymous, would be treated as grounds
for legal action. This left us with no option but to include only
Chainalysis and to explore generalization via the comparisons
with the public explorer of Arkham Intelligence.
In this work, we took snapshots of the current attribution by
Chainalysis of the illicit services for which we have ground
truth available. This means we analyzed the current state of
blockchain intelligence. We know this intelligence changes
over time, both in positive and negative ways. It also means
that, by looking at the current state, we do not know when an
address was first labeled and whether there was a delay in
doing so. Such a delay is partially inevitable because
clustering heuristics cannot identify new deposits until they
are spent. However, we do not know the latency between when an
address was first identified through interactions and when it
was ingested into the software, assuming that interactions are
the primary way new addresses are identified. Additionally,
attribution in terms of address roles and money flow coverage
differs significantly between various types of services.
Therefore, we have to mention that these are just three cases,
as this is a case study, and that they cannot be generalized
easily, since attribution relies on many factors, such as how
the developers of such services decide on the internal workings
of their wallets, attribution intensity, and maybe also how
users send money to these services. We show, through our
generalization efforts using Arkham, that service-specific
clustering is the main factor influencing attribution. Next, we
find that methods such as PayJoin and CoinJoin pose no risk of
false positives under the condition that the attributed service
does not use them. If it does, then in theory this might result
in false positives, but we do not have data to support this.
However, especially with address roles, we show that certain
patterns within a wallet represent it better as a whole.
Therefore, if you focus on such a pattern, you also cover most
of the wallet. If you interact with services and look at the
addresses you deposited to, or at where the coins you received
back came from, it is possible to identify key addresses of a
service’s wallet. Labeling those addresses is critical, but we
imagine this is difficult to generalize and automate.
Future work. This research evaluates the current state of the
art in attribution but leaves various questions unanswered that
might be worth exploring. As with most studies on clustering
and labeling, we assume that we know about the existence
of a service. Only then can one interact with it (assuming
interactions are vital to gathering attribution in that case).
Our money flow evaluation technique highlights that it is
difficult to gather historic attribution, meaning that learning
about the existence of a service early is critical.
Additionally, when interaction with services is necessary, we
leave open how this is done most effectively. In other words,
what strategies must one apply to get the best attribution?
These strategies might also depend on characteristics of the
service, for instance address reuse, internal asset management
(e.g., does it send money to some form of hot wallet), and the
daily balance in the wallet of that service. Identifying and
testing good strategies might also be the subject of future
research. Lastly, we only looked at the label put on an
address, not at how this label propagates through clustering
heuristics. This was done to scope this research and focus on
what is most important to those using the data. However, this
leaves open which clustering heuristics work well, which should
be avoided, and how to find new ones. Given what we saw in the
data we analyzed, we believe that there is still a lot of
research to be done on this topic.
11 Conclusion
Leveraging insights from interviewing law enforcement
professionals who trace cryptocurrencies on a daily basis, we
evaluated, in a case study of three viable cases, the
attribution of illicit services by the commercial blockchain
intelligence provider Chainalysis. We measured attribution in
three ways: address overlap, money flows, and address roles.
The latter two evaluations align with the types of tracing that
we identified by interviewing experienced law enforcement
officers. We contrasted ground-truth data against blockchain
intelligence on three seized illicit services: BestMixer, Hansa
Market, and Wall Street Market. Chainalysis underestimates the
total number of addresses of these illicit services, providing
a reliable lower bound with very few false positives. Coverage
depends on the kind of service and its wallet activity. The two
darknet markets receive better address role and money flow
coverage than the mixing service.
Overall, we evaluated the attribution of illicit services by
Chainalysis against ground-truth data and learned about the
factors influencing attribution. We evaluated attribution in
three ways: by counting the number of addresses, measuring
transaction-based money flows, and calculating the accuracy of
the different roles of clusters within a service. We can state
that Chainalysis provides a reliable lower bound of up to 95%,
with false positives being extremely rare. Yet, coverage of
BestMixer is lower than that of Wall Street Market and Hansa
Market. Last, we identified four important factors that
influence attribution: knowledge about a service’s existence
such that one interacts with it, the centrality of the service,
address reuse, and the format of the transactions of the
service. These factors influence how well the heuristics work
and, together with knowledge about the internal workings of a
service, lead to better attribution. We demonstrate that
generalization of attribution findings is possible but highly
context-dependent, requiring an understanding of both data
availability and heuristic limitations. By comparing Arkham
Intelligence to ground-truth data and examining heuristic
behavior, we show that while some attribution can generalize
through domain knowledge and conservative heuristics, such
generalization must be critically assessed case by case.
Ethics considerations
In line with applicable laws and regulations, the relevant au-
thorities have seized the infrastructure of BestMixer, Hansa
Market, and Wall Street Market. Using this legally seized data
for empirical research raises certain ethical questions, which
we discuss below. While the seizures were lawful, one should
not assume that all analyzed transactions concern illegal be-
havior.
Before seized data was made accessible to us for academic
research purposes, public prosecutors weighed, among other
things, the impact of the work on the rights and privacy of
all parties. A Dutch law enforcement privacy officer vetted
our data subset to ensure that it was minimized to only data
vital to our research and contained no personally identifiable
information. We only had access to data we required for our
analyses: addresses, transaction hashes, timestamps, and con-
textual information of a deposit, such as the output index
and amount, to deduce the deposit address from a transaction
hash. Similar to earlier work on seized datasets [13,27,32,45],
all of our analyses were conducted on-site at Dutch law en-
forcement agencies, where the data was stored and protected
under their safety and security guidelines. We conferred with
our IRB beforehand. They viewed this work as outside their
jurisdiction, yet were satisfied with the assessments and pro-
cedures, outlined above, as set by the public prosecutors and
law enforcement privacy officers. Also, after the study was
completed, per our agreement with law enforcement, they
were informed to check for any further ethical considerations
according to predetermined guidelines; no further ethical con-
siderations have been raised since. The data minimization
procedure ensures that no harm was done to individuals in-
cluded in the dataset: no personally identifiable information,
such as usernames, was used in this study. Please note that
bitcoin address and wallet information was already publicly
available, and no attribution data (i.e., bitcoin wallet,
cluster, or address information) of any of the seized services
is published in this paper. Last, as part of our responsible
disclosure, we informed Chainalysis early on about our
findings. That is, we provided them with the addresses that
were labeled as an illicit service but were not part of it
(false positives), and the reason we believe this occurred.
Open science policy
We strongly support the use of open data. Yet, access to the data
used in this paper is restricted because a) commercial data is
subject to licensing, and b) seized data is protected under
criminal law, where only authorized access for designated
researchers exists. Therefore, it is not legally possible to make
the data in this paper public. However, given the public nature
of the blockchain, a wealth of raw transaction data is available
and has fueled academic work into illicit services in the past
years.
References
[1] Yara Abdel Samad. Case study: Dark web markets. Dark Web
    Investigation, pages 237–247, 2021.
[2] Ross Anderson. Making bitcoin legal (transcript of discussion). In
    Security Protocols XXVI: 26th International Workshop, Cambridge,
    UK, March 19–21, 2018, Revised Selected Papers 26, pages 254–265.
    Springer, 2018.
[3] Andreas M Antonopoulos. Mastering Bitcoin: unlocking digital
    cryptocurrencies. O’Reilly Media, Inc., 2014.
[4] Rainer Böhme, Nicolas Christin, Benjamin Edelman, and Tyler Moore.
    Bitcoin: Economics, technology, and governance. Journal of Economic
    Perspectives, 29(2):213–238, 2015.
[5] Xander Bouwman, Harm Griffioen, Jelle Egbers, Christian Doerr, Bram
    Klievink, and Michel Van Eeten. A different cup of TI? the added
    value of commercial threat intelligence. In 29th USENIX Security
    Symposium (USENIX Security 20), pages 433–450, 2020.
[6] Roderic Broadhurst, David Lord, Donald Maxim, Hannah
    Woodford-Smith, Corey Johnston, Ho Woon Chung, Samara Carroll,
    Harshit Trivedi, and Bianca Sabol. Malware trends on ‘darknet’
    crypto-markets: Research review. Available at SSRN 3226758, 2018.
[7] Lars Brünjes and Murdoch J Gabbay. UTXO- vs account-based smart
    contract blockchain programming paradigms. In Leveraging
    Applications of Formal Methods, Verification and Validation:
    Applications: 9th International Symposium on Leveraging
    Applications of Formal Methods, ISoLA 2020, Rhodes, Greece,
    October 20–30, 2020, Proceedings, Part III 9, pages 73–88.
    Springer, 2020.
[8] Chainalysis. 270 service deposit addresses drive 55cryptocurrency.
    https://www.chainalysis.com/blog/cryptocurrency-money-laundering-2021/,
    Feb 2021.
[9] Chainalysis Team. Is bitcoin traceable?
    https://blog.chainalysis.com/reports/is-bitcoin-traceable/, 2022.
[10] Ling Cheng, Feida Zhu, Yong Wang, Ruicheng Liang, and Huiwen Liu.
    Evolve path tracer: Early detection of malicious addresses in
    cryptocurrency. In Proceedings of the 29th ACM SIGKDD Conference
    on Knowledge Discovery and Data Mining, pages 3889–3900, 2023.
[11] Nicolas Christin. Traveling the silk road: A measurement analysis
    of a large anonymous online marketplace. In Proceedings of the
    22nd international conference on World Wide Web, pages 213–224,
    2013.
[12] Nicolas Christin. Measuring and analyzing online anonymous
    (’darknet’) marketplaces. CARNEGIE-MELLON UNIV PITTSBURGH PA,
    2022.
[13] Alejandro Cuevas, Fieke Miedema, Kyle Soska, Nicolas Christin, and
    Rolf van Wegberg. Measurement by proxy: On the accuracy of online
    marketplace measurements. In 31st USENIX Security Symposium
    (USENIX Security 22), pages 2153–2170, 2022.
[14] Department of Justice. Ohio resident pleads guilty to operating
    darknet-based bitcoin mixer that laundered over 300 million.
    https://www.justice.gov/opa/pr/ohio-resident-pleads-guilty-operating-darknet-based-bitcoin-mixer-laundered-over-300-million,
    Aug 2021.
[15] Dmitry Ermilov, Maxim Panov, and Yury Yanovich. Automatic bitcoin
    address clustering. In 2017 16th IEEE International Conference on
    Machine Learning and Applications (ICMLA), pages 461–466. IEEE,
    2017.
[16] Europol. International sting against dark web vendors leads to 179
    arrests.
    https://www.europol.europa.eu/media-press/newsroom/news/international-sting-against-dark-web-vendors-leads-to-179-arrests,
    Sep 2020.
[17] Europol. 150 arrested in dark web drug bust as police seize €26
    million.
    https://www.europol.europa.eu/media-press/newsroom/news/150-arrested-in-dark-web-drug-bust-police-seize-%E2%82%AC26-million,
    Oct 2021.
[18] Europol. One of the darkweb’s largest cryptocurrency laundromats
    washed out.
    https://www.europol.europa.eu/media-press/newsroom/news/one-of-darkwebs-largest-cryptocurrency-laundromats-washed-out,
    Mar 2023.
[19] FinCEN. First bitcoin mixer penalized by fincen for violating
    anti-money laundering laws.
    https://www.fincen.gov/news/news-releases/first-bitcoin-mixer-penalized-fincen-violating-anti-money-laundering-laws,
    Oct 2020.
[20] FIOD. The fiod and the public prosecution service take money
    laundering machine for cryptocurrencies offline.
    https://www.fiod.nl/the-fiod-and-the-public-prosecution-service-take-money-laundering-machine-for-cryptocurrencies-offline/,
    May 2019.
[21] FIOD. Arrest of suspected developer of tornado cash.
    https://www.fiod.nl/arrest-of-suspected-developer-of-tornado-cash/,
    Aug 2022.
[22] Gibran Gomez, Pedro Moreno-Sanchez, and Juan Caballero. Watch your
    back: identifying cybercrime financial relationships in bitcoin
    through back-and-forth exploration. In Proceedings of the 2022 ACM
    SIGSAC conference on computer and communications security, pages
    1291–1305, 2022.
[23] Martin Harrigan and Christoph Fretter. The unreasonable
    effectiveness of address clustering. In 2016 intl ieee conferences
    on ubiquitous intelligence & computing, advanced and trusted
    computing, scalable computing and communications, cloud and big
    data computing, internet of people, and smart world congress
    (uic/atc/scalcom/cbdcom/iop/smartworld), pages 368–373. IEEE, 2016.
[24] Harry Kalodner, Malte Möser, Kevin Lee, Steven Goldfeder, Martin
    Plattner, Alishah Chator, and Arvind Narayanan. BlockSci: Design
    and applications of a blockchain analysis platform. In 29th USENIX
    Security Symposium (USENIX Security 20), pages 2721–2738, 2020.
[25] George Kappos, Haaroon Yousaf, Rainer Stütz, Sofia Rollet,
    Bernhard Haslhofer, and Sarah Meiklejohn. How to peel a million:
    Validating and expanding bitcoin clusters. In 31st USENIX Security
    Symposium (USENIX Security 22), pages 2207–2223, 2022.
[26] Sarah Meiklejohn, Marjori Pomarole, Grant Jordan, Kirill
    Levchenko, Damon McCoy, Geoffrey M Voelker, and Stefan Savage. A
    fistful of bitcoins: characterizing payments among men with no
    names. In Proceedings of the 2013 conference on Internet
    measurement conference, pages 127–140, 2013.
[27] Fieke Miedema, Kelvin Lubbertsen, Verena Schrama, and Rolf van
    Wegberg. Mixed signals: Analyzing Ground-Truth data on the users
    and economics of a bitcoin mixing service. In 32nd USENIX Security
    Symposium (USENIX Security 23), pages 751–768, 2023.
[28] Malte Möser and Arvind Narayanan. Resurrecting address clustering
    in bitcoin. In International Conference on Financial Cryptography
    and Data Security, pages 386–403. Springer, 2022.
[29] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system.
    Decentralized business review, page 21260, 2008.
[30] Lily Hay Newman and Andy Greenberg. Bitcoin fog case could put
    cryptocurrency tracing on trial.
    https://www.wired.com/story/bitcoin-fog-roman-sterlingov-blockchain-analysis/,
    Aug 2022.
[31] Jonas David Nick. Data-driven de-anonymization in bitcoin.
    Master’s thesis, ETH-Zürich, 2015.
[32] A. Noroozian, J. Koenders, E. van Veldhuizen, C.H. Ganan, S.
    Alrwais, D. McCoy, and M. van Eeten. Platforms in everything:
    analyzing ground-truth data on the anatomy and economics of
    bullet-proof hosting. In 28th USENIX Security Symposium (USENIX
    Security 19), pages 1341–1356, 2019.
[33] Jan-Jaap Oerlemans, KMT Helwegen, et al. Annotatie hof den haag 1
    februari 2022, ecli: Nl: Ghdha: 2022: 104. Computerrecht, 2022.
[34] Department of Justice. Alphabay, the largest online “dark market,”
    shut down.
    https://www.justice.gov/opa/pr/alphabay-largest-online-dark-market-shut-down,
    Jul 2017.
[35] Department of Justice. Binance and ceo plead guilty to federal
    charges in $4b resolution. Nov 2023.
[36] Office of Foreign Asset Control. U.s. treasury issues first-ever
    sanctions on a virtual currency mixer, targets dprk cyber threats.
    https://home.treasury.gov/news/press-releases/jy0768, May 2022.
[37] Office of Foreign Asset Control. U.s. treasury sanctions notorious
    virtual currency mixer tornado cash.
    https://home.treasury.gov/news/press-releases/jy0916, Aug 2022.
[38] Marek Palatinus and Pavol Rusnak. Multi-account hierarchy for
    deterministic wallets.
    https://github.com/bitcoin/bips/blob/master/bip-0044.mediawiki,
    Apr 2014.
[39] Fergal Reid and Martin Harrigan. An analysis of anonymity in the
    bitcoin system. Springer, 2013.
[40] Jesus Rodriguez. 10 patterns of centralized crypto exchanges
    explained using machine learning and data visualizations.
    https://medium.com/intotheblock/10-patterns-of-centralized-crypto-exchanges-explained-using-machine-learning-and-data-b386d913832,
    Oct 2019.
[41] Jack Schickler. Tornado cash dev facing dutch charges to question
    chainalysis data alleging criminal links.
    https://www.coindesk.com/policy/2023/05/24/tornado-cash-dev-facing-dutch-charges-to-question-chainalysis-data-alleging-criminal-links/,
    May 2023.
[42] Kyle Soska and Nicolas Christin. Measuring the longitudinal
    evolution of the online anonymous marketplace ecosystem. In 24th
    USENIX Security Symposium (USENIX Security 15), pages 33–48, 2015.
[43] Rainer Stütz, Johann Stockinger, Pedro Moreno-Sanchez, Bernhard
    Haslhofer, and Matteo Maffei. Adoption and actual privacy of
    decentralized coinjoin implementations in bitcoin. In Proceedings
    of the 4th ACM Conference on Advances in Financial Technologies,
    pages 254–267, 2022.
[44] Jacob Swambo, Spencer Hommel, Bob McElrath, and Bryan Bishop.
    Bitcoin covenants: Three ways to control the future. arXiv
    preprint arXiv:2006.16714, 2020.
[45] Jochem van de Laarschot and Rolf van Wegberg. Risky business?
    investigating the security practices of vendors on an online
    anonymous market using ground-truth data. In USENIX Security
    Symposium, pages 4079–4095, 2021.
[46] Rolf van Wegberg, Samaneh Tajalizadehkhoob, Kyle Soska, Ugur
    Akyazi, Carlos Hernandez Ganan, Bram Klievink, Nicolas Christin,
    and Michel Van Eeten. Plug and prey? measuring the commoditization
    of cybercrime via online anonymous markets. In 27th USENIX
    Security Symposium (USENIX Security 18), pages 1009–1026, 2018.
[47] Rolf van Wegberg and Thijmen Verburgh. Lost in the dream?
    measuring the effects of operation bayonet on vendors migrating to
    dream market. In Proceedings of the Evolution of the Darknet
    Workshop, volume 9, 2018.
[47]
Rolf van Wegberg and Thijmen Verburgh. Lost in the dream? measuring
the effects of operation bayonet on vendors migrating to dream market.
In Proceedings of the Evolution of the Darknet Workshop, volume 9,
2018.
A. Interview Protocol
The interviews were conducted using the following interview
protocol. These questions were asked in a semi-structured
setting with the interviewee, so the order in which they were
asked might vary between participants, depending on the di-
rection the conversation went.
• Background interviewee
    - What is your role during an investigation?
    - What crypto-related training did you follow?
    - How do you keep your knowledge on tracing up to date?
• Tracing
    - What tools do you use for tracing (i.e., commercial and
      non-commercial, block explorers, etc.)?
    - What is the starting point based on which you start tracing,
      and what is the goal / final product?
    - Can you describe how you trace money flows?
    - Do you know the source of information of attribution of the
      tools you use?
    - Did you ever look at your own wallets/transactions in the
      tools? And what did you find?
    - Have you ever checked or revisited your analysis after a
      seizure? And what did you find?
    - How confident do you feel about the accuracy of the tools you
      use? / What are your experiences with the accuracy of the
      tools you use? (separated per category of service. Asked
      about mixing services, exchanges, ransomware, and darknet
      market clusters)
• Preliminary results (we showed an early version of Figure 1,
  showing one service at a time, starting with BestMixer)
    - Firstly: what do you expect the results to be?
      (underestimation/overestimation and by how much)
    - Then: showed the results
    - Per seized service: what do you get from these results?
    - How does this influence your day-to-day work when tracing?
B. Interview Codebook
Code Group                        Code

Interviewee Characteristics       Description: This category captures the
                                  interviewee’s role, tracing background, and
                                  approach to staying updated on cryptocurrency
                                  tracing developments. It reflects their level
                                  of experience and proactive engagement with
                                  crypto-related tasks.
  Role                            Detective, Analyst
  Crypto Training                 Self-taught, Vendor-specific training,
                                  External training
  Crypto Updates                  News, Podcasts, Colleagues
  Proactiveness                   Self-motivated, Organization-mandated

Tracing Tools and Techniques      Description: This category includes the types
                                  of tools (both commercial and public) and
                                  techniques used by interviewees in tracing
                                  cryptocurrency transactions. It reflects the
                                  resources available to them and their tracing
                                  methodologies.
  Commercial Tracing Tools        Chainalysis, TRM
  Public Tools                    Block explorers, Breadcrumbs
  Tracing Techniques              Transaction-based analysis,
                                  Relationship-based analysis

Investigation Goals and           Description: This section captures the
Approaches                        objectives and methodologies guiding the
                                  interviewees’ investigations, as well as
                                  their awareness of the data sources that
                                  support attribution in tracing tools.
  Investigation Goals             Identification, Turnover analysis
  Knowledge of Attribution        OSINT, Self-attribution of tools,
  Sources                         Customer data
  Revisiting Analysis             Always, Sometimes, Never

Personal Transactions             Description: This section records whether the
                                  interviewees have checked their own
                                  transactions using tracing tools, and if so,
                                  whether these were commercial or public
                                  transactions.
  Use of Own Transactions         Yes, commercial; Yes, public; No

Implication of Results            Description: This category explores how
                                  tracing results influence the interviewees’
                                  daily work, including changes in workflow,
                                  increased oversight, or reallocation of
                                  resources based on tracing insights.
  Influence on Day-to-Day Work    Workflow adjustment, Increased oversight,
                                  Resource allocation
C. Code Saturation
[Figure: number of new codes (y-axis, 0 to 16) per interview number, in order (x-axis, interviews 1 to 6).]
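A saturation curve of this kind can be derived by counting, for each interview in order, how many codes appear for the first time. The sketch below shows that counting step only; the code sets in the example are invented for illustration and do not reproduce our interview data, and the function name is hypothetical.

```python
def new_codes_per_interview(coded_interviews):
    """Count codes that first appear in each interview, in order."""
    seen = set()
    counts = []
    for codes in coded_interviews:
        fresh = set(codes) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


# Invented example: three interviews and the codes assigned to each.
example = [
    {"Role", "Crypto Training"},
    {"Role", "Public Tools"},
    {"Public Tools"},
]
curve = new_codes_per_interview(example)
```

Saturation is suggested when the counts at the end of the sequence drop to zero or close to it.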