
This paper is included in the Proceedings of the 34th USENIX Security Symposium.

August 13–15, 2025 • Seattle, WA, USA

978-1-939133-52-6

Open access to the Proceedings of the 34th USENIX Security Symposium is sponsored by USENIX.

Ghost Clusters: Evaluating Attribution of Illicit Services through Cryptocurrency Tracing

Kelvin Lubbertsen, Michel van Eeten, and Rolf van Wegberg, Delft University of Technology

https://www.usenix.org/conference/usenixsecurity25/presentation/lubbertsen

Abstract

One of the principles in cryptocurrency tracing is putting a name to an address – a process called attribution. Attribution is key for both law enforcement and compliance professionals. Blockchain intelligence companies sell attribution as a service by leveraging pseudonymous blockchains, clustering heuristics, and labeling of addresses. In this paper, we perform a case study on Chainalysis, the market leader, and evaluate its attribution by comparing it against ground-truth data on three seized illicit services – BestMixer, Hansa Market, and Wall Street Market. To design the evaluation, we interview front-line law enforcement professionals and learn how they trace cryptocurrencies using blockchain intelligence providers. We identify three evaluation techniques – i.e., address overlap, money flows, and address roles – that realistically measure attribution in line with law enforcement use cases. Using these techniques, we show that for our three illicit services, Chainalysis provides a reliable lower bound (24.54 to 94.85 percent accurate), and produces very few false positives (less than 0.5 percent). Also, we find that coverage changes over time. We reason about factors that influence attribution and demonstrate the importance of attributing certain key addresses to achieve high coverage. Finally, by including a second blockchain intelligence provider, we show the difficulties in generalizing results.

1 Introduction

Follow-the-money has long been the cornerstone of many law enforcement investigations. With the adoption of cryptocurrencies in cybercrimes – from ransomware payments to buying drugs on so-called ‘darknet markets’ and paying for bulletproof hosting – efforts to police these crimes have become reliant on the ability to trace cryptocurrency transactions. Although most cryptocurrencies are pseudonymous – which means transactions and identifiers are transparently stored in blockchains – it remains challenging to link real-world identities to wallets, addresses, and services. This process of linking blockchain activities to real-world identities is called ‘attribution’ [26]. Attribution is a critical step in tracing illicit money flows, as done by law enforcement investigators, private investigators, and compliance teams at regulated crypto-asset service providers.

Almost a decade after the first attempts to deanonymize cryptocurrency money flows [39], an industry of commercial data providers has emerged specializing in the attribution of services, wallets, and addresses on blockchains. A few big commercial players in this field are Chainalysis, Elliptic, and TRM Labs – where Chainalysis is the market leader. All these companies provide data and tools that allow law enforcement, private investigators, and compliance teams to trace illicit money flows. The dependence on such ‘attribution-as-a-service’ providers means the service must be reliable [33]. Law enforcement conducts criminal investigations based on the attribution of addresses. Major cases, such as the takedown of the then largest darknet market AlphaBay [34], depended heavily on using these methods. Also, in the private sector, virtual asset services such as cryptocurrency exchanges use tools that rely heavily on attribution. Regulation to prevent money laundering and terrorist financing requires some form of customer screening, including the source of funds. On-chain transaction monitoring is a vital component of this screening, and it relies on the same kind of attribution data. Exchanges use this data to file suspicious activity reports, and failing to have this in place can have serious consequences, as can be seen with, for instance, Binance, which was fined by the US authorities in 2023 [35].

However, illicit services are likely to resist attribution by their nature: they, by definition, do not cooperate with regulators and sometimes even advertise their evasion of AML (anti-money laundering) policies enforced by tools such as Chainalysis – as BestMixer did. This makes validation difficult if not impossible. Commercial intelligence providers cannot access large-scale ground-truth data of illicit services, wallets, and addresses to validate their methods. Hence, they often rely on direct interactions with the service – e.g., initiating a transaction with a Bitcoin mixer – to generate a sample of illicit service addresses. These are then expanded upon via algorithms that make use of state-of-the-art heuristics, like co-spending [26]. Still, precision is critical when the stakes are as high as they are in, for instance, law enforcement. Inaccurate labeling of a service as illicit could derail a prosecution or, in the worst case, even lead to a wrongful conviction.

Obtaining ground truth data from illicit services would help in assessing attribution accuracy. Yet, acquiring such data is hardly ever possible. Large-scale evaluation only becomes possible when law enforcement seizes an illicit service and the wallet data on the seized servers remains intact. Through a unique collaboration with Dutch law enforcement, we managed to find three of those rare cases where wallet data was largely intact, and we were granted permission to use this data for research purposes.

To evaluate attribution, we interview front-line professionals in law enforcement about how they use ‘attribution-as-a-service’ and what value they attach to blockchain intelligence in investigations. This, when mapped against the design of cryptocurrencies such as Bitcoin [29] and the internal wallet architecture as described by [40], yielded three methods for evaluating attribution: address overlap, money flows, and address roles. The first is a generic evaluation of attribution and provides insights into coverage of an entire illicit service wallet – e.g., a darknet market or a mixing service wallet. The latter two correspond with the ways we found that professionals trace cryptocurrencies in investigations. To apply these methods, we leverage seized data from recent (2017–2019) law enforcement takedowns of three known illicit services: BestMixer, Hansa Market, and Wall Street Market. This allows us to investigate the attribution provided by the market leader in blockchain intelligence – Chainalysis – and unravel factors influencing attribution, which are relevant beyond just these three cases. We make the following contributions:

• We evaluate attribution by commercial blockchain intelligence market leader Chainalysis on three illicit services and find their attribution to be a reliable lower bound – of up to 95% of addresses in seized wallets – with false positives being rare (<0.5 percent).

• We reflect on this finding by engaging with law enforcement professionals and learn that they anticipate conservative attribution. We also discover that they employ two distinct tracing strategies – transaction-based and relationship-based tracing – which we can use for measuring attribution accuracy.

• We investigate address and cluster-based attribution using their role within a service wallet – like relationship-based tracing done by law enforcement – and demonstrate how attribution changes based on blockchain intelligence tactics, and find that wallets of services for a large part depend on one type of behavior or role.

• We use the directed acyclic graph design of UTXO-based blockchains to evaluate attribution using a technique we call money flows, allowing us to evaluate transaction-based tracing and find that attribution accuracy is dependent on time: the earlier the address lived in the illicit service lifespan, the more likely it was not attributed.

• We contrast our findings against a second, public blockchain intelligence provider, Arkham Intelligence, and show that attribution hinges on clustering highly service-specific components such as the escrow service of a darknet market.

The remainder of this paper is structured as follows. First, we describe the nexus of cryptocurrencies and crime in Section 2. Then, we describe our methodology and provide data descriptions in Section 3. In Section 4, we report on interviews with law enforcement officers about how they trace cryptocurrencies and how they perceive attribution. Leveraging the results from this, we build three evaluation techniques throughout the core of this paper. We discuss the first evaluation, which we call ‘address overlap’, in Section 5. The interviews surface two tracing strategies, transaction-based and relationship-based tracing, and we use well-known design features of cryptocurrencies to evaluate them. We refer to these evaluations as ‘money flows’ and ‘address roles,’ respectively, and discuss their results in Sections 6 and 7. Then, we attempt to generalize these results across data providers in Section 8 and map those results to the practice of law enforcement tracing, as that goes further than the three datasets this study evaluates using ground truth data. We contrast with related work (Section 9) and discuss limitations and future work (Section 10). Section 11 concludes. Finally, we discuss ethical considerations at the end of the paper.

2 Crypto & Crime

Bitcoin [29] was the first blockchain-based cryptocurrency. Its inception in 2009 instigated a new economy of services to buy, sell, spend, and trade cryptocurrencies. Cryptocurrencies have also generated an entire ecosystem of businesses, such as payment processors and cryptocurrency exchanges, which often function as the off- and on-ramps to the traditional financial system [4]. Blockchain technology allows people to remain relatively anonymous or, more precisely, pseudonymous. This presumed pseudonymity has sparked the use of cryptocurrencies for various criminal activities, such as transacting on underground markets and ransomware payments. On the other hand, law enforcement and private investigators use the same pseudonymous nature to trace cryptocurrencies. For them, two concepts are vital: attribution (i.e., what name is linked to this pseudonymous cryptocurrency address) and clustering (i.e., which addresses belong to the same actor). Cryptocurrency tracing is focused on these aspects. This field takes advantage of the design principles of these currencies to combine and attribute groups of addresses. For this, it is essential to note that Bitcoin and similar cryptocurrencies use wallets that hold one or more pairs of keys, where money is deposited on (a derivative of) a public key (i.e., the cryptocurrency address). One spends money by signing a transaction with a private key [3,29]. Every transfer is recorded in a transaction, and transactions are considered valid if they have valid signatures and are included in a block. Transactions consist of one or more inputs and define one or more outputs. Coins that use such a system of multiple inputs that refer back to other outputs are sometimes called UTXO-based cryptocurrencies (‘UTXO’ stands for unspent transaction output) [7]. Every transaction input links back to another transaction’s output to prevent double-spending. The UTXO model of cryptocurrencies, such as Bitcoin, sometimes requires multiple inputs in the transaction to meet the amount sent. For such a transaction to be valid, the owners of multiple UTXOs (and most likely also multiple addresses) must sign the transaction, so one can assume common ownership of these inputs. This is called the co-spending heuristic [26]. One can create so-called ‘clusters’ of addresses by applying co-spending to addresses.
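To make the co-spending heuristic concrete, the following is a minimal sketch in Python that merges all input addresses of a transaction into one cluster using a union-find structure. The transaction format (dictionaries with `inputs` and `outputs` lists of address strings) and the toy data are assumptions for illustration only; this is not the implementation used by any blockchain intelligence provider.

```python
from collections import defaultdict


def cluster_by_cospending(transactions):
    """Group addresses that appear together as inputs of one transaction."""
    parent = {}

    def find(addr):
        # Register unseen addresses as their own cluster root.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)  # make sure single-input addresses are registered too
        for other in inputs[1:]:
            union(inputs[0], other)  # all inputs share an assumed owner

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy transactions: A and B are co-spent, then B and C.
toy_transactions = [
    {"inputs": ["A", "B"], "outputs": ["X"]},
    {"inputs": ["B", "C"], "outputs": ["Y"]},
    {"inputs": ["D"], "outputs": ["Z"]},
]
clusters = cluster_by_cospending(toy_transactions)
print(clusters)
```

In this toy example, A, B, and C end up in one cluster because B links the first two transactions, while D stays on its own. Output addresses are deliberately ignored, since co-spending only reasons about inputs.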

The other primary type of clustering is what is often referred to as the change address heuristic. The outputs often include a change address because one can only spend a full UTXO, which rarely matches the amount one wants to pay. The remaining amount is returned to the sender’s wallet in that output. Various heuristics exist, all targeting the scenario where there could be change – i.e., a transaction consisting of exactly two outputs: one payment and one change [15]. Commonly, an output is considered change if it is the first time this address is seen on the blockchain, or if it is a self-change – i.e., the output address is also an input [26]. Variations also exist that require that no unnecessary inputs are added to the transaction [31] or that take the exact address types and features into account [24,28].
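The basic variant of the change address heuristic can be sketched as follows. This is a simplified illustration under stated assumptions: transactions are dictionaries with hypothetical `txid`, `inputs`, and `outputs` fields, they are processed in chronological order, and only the two conditions named above (self-change and first appearance on the blockchain) are checked. The stricter variations from the literature are not modeled.

```python
def find_change_output(tx, seen_addresses):
    """Return the likely change address of `tx`, or None when undecided."""
    outputs = tx["outputs"]
    if len(outputs) != 2:
        return None  # the heuristic only covers one payment plus one change

    # Self-change: the output address is also one of the inputs.
    self_change = [addr for addr in outputs if addr in tx["inputs"]]
    if len(self_change) == 1:
        return self_change[0]

    # Fresh address: exactly one output has never been seen before.
    fresh = [addr for addr in outputs if addr not in seen_addresses]
    if len(fresh) == 1:
        return fresh[0]
    return None  # ambiguous, so we refuse to guess


def label_change(transactions, known_addresses=()):
    """Label change outputs for transactions given in chronological order."""
    seen = set(known_addresses)
    labels = {}
    for tx in transactions:
        labels[tx["txid"]] = find_change_output(tx, seen)
        seen.update(tx["inputs"])
        seen.update(tx["outputs"])
    return labels


# Hypothetical toy chain; "shop" is assumed to have appeared on chain before.
toy_chain = [
    {"txid": "t1", "inputs": ["A"], "outputs": ["shop", "A2"]},
    {"txid": "t2", "inputs": ["A2"], "outputs": ["shop", "A2"]},
    {"txid": "t3", "inputs": ["A2"], "outputs": ["N1", "N2"]},
    {"txid": "t4", "inputs": ["A2"], "outputs": ["shop", "N3", "N4"]},
]
change_labels = label_change(toy_chain, known_addresses={"shop"})
print(change_labels)
```

Returning `None` for ambiguous cases, such as two fresh outputs or more than two outputs, keeps the sketch conservative: it prefers a missing label over a wrong one.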

Two prominent types of platforms where cryptocurrencies, crime, and attribution attempts converge are darknet markets and bitcoin mixers. Mixing services operate in a legal gray area: in some jurisdictions, such as the US, centralized mixing services are under pressure from regulators and law enforcement. For example, the operator of Helix Mixer pleaded guilty to money laundering [14] and was fined by FinCEN [19], whereas Chipmixer.io has been seized by law enforcement [18,20]. Lately, mixing services have been sanctioned, such as Blender.io and Tornado Cash [21,36,37]. Darknet markets are platforms where illicit goods and services are transacted in cryptocurrencies [11]. These marketplaces, of which some of the most prevalent have been Silk Road, AlphaBay, Hansa Market, and Hydra Market [12], are used to trade illegal goods and services [6,42,46]. Given that their use case has primarily been facilitating crime since the inception of the first darknet market (Silk Road), law enforcement has pursued these markets [47]. Seizing such services provides law enforcement with a data gold mine, based on which it can pursue users [16,17]. But this data also gives unique insights into the inner workings of darknet markets [13,45]. Here, we leverage two seized darknet markets (i.e., Hansa Market and Wall Street Market) and one bitcoin mixer (i.e., BestMixer) to investigate how they are attributed in commercial blockchain intelligence.

3 Methodology

Our study consists of two phases. First, we conduct interviews to understand how professionals use and rely on ‘attribution-as-a-service’ in their investigations. Second, we conduct a case study in which we evaluate attribution by comparing seized data from three known illicit services against the blockchain intelligence on these services provided by Chainalysis. We will describe these data sources in more detail in this section, validate their completeness, and provide data descriptives. Last, we discuss the ethics of using seized data in our work.

Interviews. We engaged with cryptocurrency tracing specialists in law enforcement to learn how they use and what value they attach to blockchain intelligence in investigations. Here, we opt for semi-structured interviews.

Our goal for these interviews is two-fold. First, we want to learn how people use tracing tools and the attribution they provide to align our measurement methodology with actual use. Second, we use this opportunity to learn about the trust participants place in the attribution provided via these tools.

Through our law enforcement network, we learned that within Europe, there are between 25 and 50 leading experts who are involved in mostly high-profile cases and are, therefore, aware of the latest developments. We believe that these specialists are of most use to this study, as they are more likely to not only draw on examples from their own cases but can also be asked to reflect on the investigative process as a whole. We arrive at this population estimate of 25 to 50 experts through regular participation in the leading European law enforcement conference on cryptocurrency tracing, one of the two leading cryptocurrency conferences for law enforcement worldwide. Not all law enforcement agencies allow participation in scientific studies, even though we tried contacting specialists at these agencies. That led to a pool of about 12 specialists whom we could contact and who were allowed to participate. Six of those were able and willing to be interviewed (see Table 1). All of them worked in different units and teams at various law enforcement agencies, although they might know each other and have shared best practices. While this may seem a small sample, these respondents make up a significant proportion of the total population of 25 to 50 specialists. All of them were categorized as detectives or analysts, where the role of analysts differs from that of detectives in that analysts are not responsible for the entire case. R-3 started as an analyst and later became a detective.

#  Role               Relevant years in service
1  Detective          4
2  Analyst            3
3  Analyst/Detective  3
4  Analyst            3
5  Detective          4
6  Detective          7

Table 1: Interviewees and their background

All specialists (n=6) were asked how they trace cryptocurrency money flows to help contextualize our results. We opted to interview law enforcement officers rather than, for instance, industry analysts because of the ability of law enforcement agencies (LEAs) to subpoena regulated services. This is relevant here, as it provides a feedback loop that might be lacking for analysts who do not have that ability, and allows law enforcement officers to discover errors in attribution more readily. Logically, this would imply that law enforcement professionals receive feedback on attribution more often. Therefore, they represent a more relevant population to interview.

The interviews were conducted using a predefined protocol (see Appendix A). The interviews consisted of two phases: a first phase in which general tracing methods were discussed, and a second phase in which we asked our participants to reflect on the preliminary findings of the case study on the three data sets. All interviews were recorded (with consent from the interviewees) and transcribed by the lead researcher. Coding was done using qualitative data analysis software (Atlas.ti). First, themes were inductively selected, leveraging the interview protocol to identify initial themes. Second, deductive coding identified other themes that aligned with the study and provided more detailed insights into respondents’ perspectives. All themes were reviewed by checking coherence within coded data extracts and then evaluated against the entire dataset. During this phase, some themes were merged, split, or discarded. Cohen’s Kappa was not applied in this thematic analysis because the coding process was primarily interpretive and iterative, rather than a rigid application of a predefined codebook suitable for statistical agreement measures.

Emerging themes, aligned with the structure of our interview protocol, were coded inductively, and we reached thematic saturation after the six interviews were completed (see Appendix B for the codebook). Saturation occurred early in the process (see Appendix C for the saturation plot); after four interviews, almost no new codes were discovered.
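As an illustration of how such a saturation curve can be derived, the sketch below counts how many previously unseen codes each interview contributes. The code labels and their assignment to interviews are entirely hypothetical and do not reflect the actual codebook in Appendix B.

```python
def new_codes_per_interview(coded_interviews):
    """Count, per interview in order, the codes that were not seen before."""
    seen = set()
    new_counts = []
    for codes in coded_interviews:
        fresh = set(codes) - seen
        new_counts.append(len(fresh))
        seen |= fresh
    return new_counts


# Hypothetical codes per interview, in the order the interviews were held.
toy_coding = [
    {"tools", "training"},
    {"tools", "second_opinion", "cost"},
    {"training", "hybrid_tracing"},
    {"cost", "nested_services"},
    {"tools"},
    {"hybrid_tracing"},
]
saturation = new_codes_per_interview(toy_coding)
print(saturation)
```

A sequence that drops to zero, as in this toy data, is what one would read as saturation: later interviews repeat existing codes without adding new ones.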

Data sources. To evaluate attribution, we use cryptocurrency addresses as a basis. This means we build a data model with addresses as a common identifier between datasets. From Chainalysis, to which we have access through our collaboration with law enforcement, we collect their intelligence on all addresses of the three included illicit services: BestMixer, Hansa Market, and Wall Street Market. Chainalysis provides tools to trace cryptocurrencies and compliance solutions for financial institutions and law enforcement. Chainalysis describes its labeling as relying on “manual and automated techniques” [9]. This collection was done by simply downloading all cryptocurrency addresses of these three services from this tool using the built-in download button. Additionally, for generalization purposes, we also retrieved data from a second blockchain intelligence provider: Arkham Intelligence. Here too, we collected all addresses present in their tool. We will use their data to reason about the generalization of the results. We used a full Bitcoin node, imported all addresses into the wallet, and rescanned the blockchain. We then downloaded all the transactions that belong to these addresses. For BestMixer, we were given access to the master key, which we used to derive all addresses until there was a significant gap of unused addresses. We validated this with the internal wallet database, which we had access to. For the seized darknet markets, capturing cryptocurrency addresses was slightly more complex given the format in which the data was provided to us. We were given access to relevant back-end database tables – and of those tables, only relevant fields containing cryptocurrency transactions. We used the data in those tables to identify the addresses on the blockchain only if a transaction and/or output index were provided. We used a limited dataset, meaning just the addresses or other information, such as transaction hashes and output indices, to deduce the full wallet. When things were unclear, we could confer with involved investigators.

Data validation. For BestMixer, we had access to the master key and, for verification purposes, the wallet database. With the master key of BestMixer and the fact that this was a hierarchical deterministic wallet, we could safely state that we had 100 percent coverage. We reverse-engineered the method to derive addresses from the master key, which used an index starting at 0 and incremented for each newly generated address. Using this method, we generated addresses starting from index 0 until the last 2,500 generated addresses did not exist on the blockchain. To check that the arbitrarily chosen threshold of 2,500 addresses was not too small, we calculated the maximum gap between two existing addresses before that point. This gap was 217, significantly more than the 20 specified by [38], but also far smaller than the chosen threshold of 2,500. The only address we could not generate from the master key was a vanity address used for donation purposes and publicly advertised. This address appears to be imported, since a vanity address generator most likely generated it, and we added it manually to our dataset.
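The derivation loop described here can be sketched as follows. The functions `derive_address` and `is_used` are hypothetical placeholders for the reverse-engineered derivation method and a blockchain lookup, neither of which is specified in detail in the text; only the stopping rule (a run of unused addresses) and the maximum-gap check are illustrated.

```python
def scan_wallet(derive_address, is_used, gap_limit=2500):
    """Derive addresses from index 0 until `gap_limit` consecutive ones are unused.

    Returns the used (index, address) pairs and the largest run of unused
    indices observed between two used addresses.
    """
    used = []
    max_gap = 0
    unused_run = 0
    index = 0
    while unused_run < gap_limit:
        address = derive_address(index)
        if is_used(address):
            if used:  # only count gaps that lie between two used addresses
                max_gap = max(max_gap, unused_run)
            used.append((index, address))
            unused_run = 0
        else:
            unused_run += 1
        index += 1
    return used, max_gap


# Hypothetical stand-ins: indices 0, 1, 5, 6 and 30 were used on chain.
USED_INDICES = {0, 1, 5, 6, 30}


def toy_derive(index):
    return f"addr{index}"


def toy_is_used(address):
    return int(address[4:]) in USED_INDICES


found_wide, gap_wide = scan_wallet(toy_derive, toy_is_used, gap_limit=50)
found_narrow, gap_narrow = scan_wallet(toy_derive, toy_is_used, gap_limit=20)
print(len(found_wide), gap_wide, len(found_narrow), gap_narrow)
```

In the toy data, a threshold of 20 stops the scan before index 30 is reached and therefore misses one used address, whereas a threshold of 50 finds all five. This mirrors why the text checks that the observed maximum gap stays well below the chosen threshold.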

For both Wall Street Market and Hansa Market, we did not have access to the master key, but instead used the back-end databases. Here, the relevant tables used auto-increment in the identifier, which we used to measure the completeness. Hansa Market had four relevant tables, ranging in completeness between 97.00 and 99.58 percent. For Wall Street Market, we also used four tables. Completeness ranged from 99.19 to 100.00 percent.
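One plausible way to turn auto-increment identifiers into a completeness figure is sketched below. The exact formula is not given in the text, so this is an assumption: the share of distinct identifiers present out of those expected between a first identifier (assumed to be 1) and the highest identifier observed.

```python
def completeness(row_ids, first_id=1):
    """Percentage of auto-increment IDs present between first_id and the max ID."""
    ids = {row_id for row_id in row_ids if row_id >= first_id}
    if not ids:
        return 0.0
    expected = max(ids) - first_id + 1
    return 100.0 * len(ids) / expected


def missing_ids(row_ids, first_id=1):
    """List the identifiers that are absent from the observed range."""
    ids = set(row_ids)
    if not ids:
        return []
    return [i for i in range(first_id, max(ids) + 1) if i not in ids]


# Hypothetical table export in which the row with ID 4 is missing.
toy_ids = [1, 2, 3, 5, 6, 7, 8, 9, 10]
print(completeness(toy_ids), missing_ids(toy_ids))
```

Note that this measure cannot detect rows missing after the highest observed identifier, so it is an upper bound on completeness under the stated assumption.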

Data descriptives. Our data model of seized illicit service addresses consists of 182,612 BestMixer addresses, 1,157,519 Hansa Market addresses, and 553,777 Wall Street Market addresses that have been used on the blockchain. These form the basis of our data. The seized services also include addresses that were generated but to which, for instance, the user never sent funds. We exclude those because they could have only been known to the service itself. By the number of counted addresses, we thus mean the number of used addresses in the ground truth. Blockchain intelligence by Chainalysis resulted in 44,840 addresses for BestMixer, and 914,866 and 525,847 for Hansa Market and Wall Street Market, respectively. In Section 5, we assess attribution by first analyzing address overlap.
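Because addresses are the common identifier between datasets, address overlap can be thought of as a set comparison. The sketch below is an illustration under assumptions: the metric names and the definition of the false positive share (attributed addresses absent from the ground truth, relative to all attributed addresses) are ours, and the toy addresses are invented.

```python
def address_overlap(ground_truth, attributed):
    """Compare a provider's attributed addresses against seized ground truth."""
    truth, labeled = set(ground_truth), set(attributed)
    overlap = truth & labeled
    return {
        # Share of ground-truth addresses that the provider attributed.
        "coverage_pct": 100.0 * len(overlap) / len(truth) if truth else 0.0,
        # Share of attributed addresses that are not in the ground truth.
        "false_positive_pct": 100.0 * len(labeled - truth) / len(labeled) if labeled else 0.0,
        # Ground-truth addresses that the provider did not attribute.
        "missed": len(truth - labeled),
    }


# Hypothetical toy data: eight seized addresses, eight attributed addresses.
toy_truth = {f"a{i}" for i in range(1, 9)}
toy_attributed = {"a1", "a2", "a3", "a4", "a5", "a6", "x1", "x2"}
result = address_overlap(toy_truth, toy_attributed)
print(result)
```

Here six of eight seized addresses are covered (75 percent), two attributed addresses fall outside the ground truth (25 percent), and two seized addresses are missed.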

Approach. First, we conducted interviews using the protocol described earlier. Looking forward slightly to the results, it follows from these interviews that apart from the evaluation metric ‘address overlap’, we can compose two more metrics (‘money flows’ and ‘address roles’), which represent transaction-based tracing and relationship-based tracing, respectively. We use these methods to reason about attribution in a way that is more representative of the real world. Various factors might influence its use cases, which have not been empirically evaluated yet. This will be the first step, after which we can look at the real-world impact.

4 Law Enforcement Perspectives

Before we evaluate the attribution by Chainalysis, it is important to understand how law enforcement professionals expect such blockchain intelligence providers to perform and what use cases they have for them. We employ semi-structured interviews to engage with law enforcement professionals to understand how they use blockchain intelligence and investigate how they perceive its attribution. That way, our evaluation fits with real-world use and expectations. Through our law enforcement network, we contacted respondents (n=6) who have been part of various high-impact multi-national cases over the past years and work for European law enforcement agencies (see Section 3). All have worked in law enforcement for several years, specializing in cryptocurrency tracing, and have applied those skills in many significant cases. The full interview protocol is attached in Appendix A. We use R-1 to R-6 to refer to specific interviewees.

Tracing methods. After learning about the background of the interviewees, we first asked them how they used the tools and how they perceived the quality of attribution in the tools they had available to them. We learn that all participants have proactively started specializing in cryptocurrency tracing. Apart from one, they all received some form of formal training, including training from the blockchain intelligence provider itself and a dedicated training institute. About this training, R-2 said: “I did it so that I could use it for reference [in the reports I write] as some form of certification [in court]”. This illustrates why all organizations arrange some form of training. There is a need to train law enforcement officers to handle these complex tools. Both R-2 and R-5 pointed out the importance of certification of this type of work in a law enforcement environment. In addition to training, three respondents mentioned using informal networks of law enforcement officers for knowledge gathering as an alternative to training courses. Two respondents mentioned following cryptocurrency-related publications to stay up-to-date with the latest developments.

All used at least one commercial tracing tool, half of them using two. All used at least Chainalysis software, highlighting its position as market leader in the field. Yet, all expressed the need for at least two tools, indicating an apparent demand for a second opinion, with one participant saying: “Trust is good, checking is better” (R-2). Or, as R-1 mentioned about a criminal trial for which he temporarily had two tools at his disposal: “You can sometimes see clear differences [when comparing information from your case].” R-1 did not specify this further, leaving open whether the visualization or the underlying data was the reason for the difference. The interviewees who had access to only one blockchain intelligence provider mentioned the high cost as the cause. One notable mention is a respondent (R-1) who used only one tool, but compensated for that by using internal intelligence sources such as seized datasets. But even though respondents had access to commercial tracing tools, they also regularly used public block explorers because, as R-4 said, “you want to know very specifically where it is sent to,” referring to the analysis of smart contract calls, which he performs on a block explorer.

The starting point from which an investigator starts tracing differs on a case-by-case basis. However, the tracing itself can be categorized in two ways. One way is relationship-based (i.e., to look for relationships between wallets). Relationships are based on (grouped) sets of transactions. The other way, transaction-based tracing, looks at transfers chronologically, transaction-by-transaction, like following, for instance, a ransomware payment. We found no relation between the respondent’s role and the choice of tracing method.

R-6 said that he traces transaction by transaction, employing the change-address heuristic, a key indicator of following the transaction flow; ergo, we call this transaction-based tracing. R-1, on the other hand, describes tracing as looking for related wallets; hence, we call this relationship-based tracing.
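The difference between the two tracing styles can be illustrated on a toy ledger. This is a deliberately simplified sketch: the ledger format, wallet names, and amounts are hypothetical, and real transaction-based tracing follows specific outputs (for example via the change address heuristic) instead of whole wallets.

```python
from collections import defaultdict

# Hypothetical toy ledger: (timestamp, txid, sender wallet, receiver wallet, BTC).
TRANSFERS = [
    (1, "t1", "victim", "w1", 2.0),
    (2, "t2", "w1", "w2", 1.5),
    (3, "t3", "w1", "exchange", 0.5),
    (4, "t4", "w2", "exchange", 1.5),
    (5, "t5", "victim", "w1", 1.0),
]


def relationships(transfers):
    """Relationship-based view: aggregate all transfers per wallet pair."""
    totals = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for _, _, sender, receiver, amount in transfers:
        totals[(sender, receiver)]["count"] += 1
        totals[(sender, receiver)]["amount"] += amount
    return dict(totals)


def follow_payment(transfers, start_txid):
    """Transaction-based view: walk forward in time from one payment."""
    ordered = sorted(transfers)  # chronological, since timestamp comes first
    start = next(t for t in ordered if t[1] == start_txid)
    trail = [start[1]]
    reached = {start[3]}  # wallets the traced funds may have reached
    for timestamp, txid, sender, receiver, _ in ordered:
        if timestamp > start[0] and sender in reached:
            trail.append(txid)
            reached.add(receiver)
    return trail


pair_totals = relationships(TRANSFERS)
trail = follow_payment(TRANSFERS, "t1")
print(pair_totals[("victim", "w1")], trail)
```

The relationship-based view summarizes that `victim` paid `w1` twice, regardless of order, while the transaction-based view follows the first payment step by step and ignores the later, unrelated payment `t5`.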

The context of the case and the personal preference of the investigator both appear to relate to the tracing method used. Or as R-1 said: “I look at my job in two ways, once I do blockchain analysis, I feel more like an analyst. I go one step further than hard evidence and then go back [...]”. Later on, the respondent calls this “switching hats” between analyst and investigator, thereby changing the way of tracing.

Transaction-based tracing is commonly used by investigators whose roles appear to focus more on the finances of a subject, such as turnover analysis (i.e., estimating how much a suspect has earned illicitly). On the other hand, relationship-based tracing appears to be better suited for identification of the wallet’s owner. We must note, though, that the user interface of the commercial tool most commonly used by the respondents (Chainalysis Reactor) is geared toward relationship-based tracing. When R-3 was asked whether he uses a feature to inspect the transaction, in other words, to do transaction monitoring, he answered: “No, never. [...] That may be because this is most important in KYC [Know Your Customer] and that sort of business.” Two respondents explicitly mention a hybrid form of tracing. R-5 described how he applies mostly relationship-based heuristics. Still, if the tools then give an indirect path between two wallets, he employs transaction-based tracing to manually strengthen the analysis of the relationship.

Perceptions on attribution accuracy. We asked the interviewees about their perceptions of the tools with regard to attribution accuracy (i.e., did they encounter cases in which labels were missing and/or wrong labels were present). We checked if they ever revisited or cross-checked their analyses from the blockchain intelligence provider: for example, after they had seized criminal assets and therefore had ground truth to check the service’s attribution accuracy, after they received information from third parties they had subpoenaed, or by verifying attribution accuracy through self-initiated transactions with an illicit service via the tools.

Two respondents acknowledged they do not revisit their

analyses in all cases after a seizure, but they did see the need

for it. All others revisit their initial analysis. We assume that investigators who do not revisit their analysis skip this step because they are not actively involved in the remainder of the case. We deduce this from the respondents who answered this way: they devoted more time to helping colleagues, as they were the only cryptocurrency specialists in their offices. R-3 mentions revisiting the analysis: “Then

enter the suspect’s wallet [...] to see where it comes back

in my analysis. [...] That way, I can attribute the wallet to

a speciﬁc wallet [in the analysis]”. R-3’s response shows

investigators adding their ground-truth attribution to their

analysis later on, and all respondents appear to do this most of

the time. All investigators return to their analysis unless they temporarily assist colleagues with their cases. In the latter situation, the workflow potentially harms the quality of the analysis, as a feedback loop is missing.

By knowing this workﬂow, we can ask them about their

knowledge of conﬂicting attribution (i.e., cases in which they

have information conﬂicting with a blockchain intelligence

provider’s claims). Five out of six respondents perceive an

underestimation – in other words, false negatives. But when asked about two different sources conflicting, most conflicts turn out to be caused by what are sometimes called ‘nested services’ – small exchanges that are ‘hosted’ by a larger exchange and have most or all of their liquidity stored on this bigger ‘parent’ [8]. There, the parent label is

commonly displayed in the tools, but after a subpoena on the

host exchange, from subscriber information, it can be derived

that it is a nested service. In these cases, the conﬂicting

attributions are superﬁcial and do not undermine trust in the

attribution. When asked how they think blockchain intelli-

gence providers work, ﬁve respondents mentioned at least

one way in which those tools attribute addresses. The sixth

respondent called it a “black box”, not wanting to speculate

about how they work. All sources – OSINT, interactions by

blockchain intelligence providers, and customer data – were

mentioned equally as sources from which these companies

generate attribution. Overall, respondents expect false negatives, although they have encountered false positives, too. They find it difficult to estimate the size of these errors at the scale evaluated in this paper, which is probably because they work primarily on a case-by-case basis.

Interviewee perceptions of ground truth data accuracy.

In addition to understanding tracing, we wanted to understand

the implications of using these tools in day-to-day operations.

Overall, we observed that respondents did expect an underes-

timation in attribution, yet had difﬁculty stating by how much

before we showed the results to them. Furthermore, before

showing the results, they also reported encountering false

positives. When asked, they attributed these false positives to nested exchanges (i.e., exchanges that store their liquidity on another, larger exchange and are therefore difficult to attribute themselves) of which only the ‘parent’ exchange was labeled.

We shared preliminary results from the overlap analysis in Section 5 with the respondents, to learn whether these results would affect their workflow and, if so, how. When confronted with the

results in Figure 1, the respondents said they did not expect

these results but were unsure what to expect to begin with.

Subsequently, they had many questions about what this meant

for their daily work and suggested workﬂow adjustments, such

as using multiple tools to cross-check attribution, as the most

important change. One respondent suggested regular auditing

of the accuracy of attribution. That illustrates the relevance of

our work, but more importantly, it means that we will try to

provide some starting points that, together with the insights of

this study, support a better workﬂow as will be discussed later

in the generalization and discussion sections of this paper.

Although not explicitly covered as a separate section in the interview protocol, as the interviews functioned to gather input on what the remainder of the study should look like, we can still distill valuable insights into how the interviewees can benefit from the evaluations discussed in this paper. We see that when it comes to how the interviewees perceive attribution accuracy, they talk about it in terms of the number of addresses. This is logical, as this is a number displayed in the tracing tools they see daily. However, when talking about impact, they mainly refer to missing attribution and, therefore, missing information. This is implied by the fact that most interviewees expect underestimation, and, for instance, R-1 does extra cross-checks with internal data sources. Further insights might benefit this practice, as they explain why and when such cross-checks are relevant.

Service              Ground truth            True positives       False positives    False negatives
                                             (Chainalysis)        (Chainalysis)      (Chainalysis)
BestMixer            182,612 (100.00 %)      44,821 (24.54 %)     19 (0.01 %)        137,791 (75.46 %)
Hansa Market         1,157,519 (100.00 %)    913,171 (78.89 %)    1,695 (0.15 %)     244,348 (21.11 %)
Wall Street Market   553,777 (100.00 %)      525,236 (94.85 %)    611 (0.11 %)       28,541 (5.15 %)

Figure 1: Overlap in addresses (green) between ground-truth (blue) and Chainalysis (yellow)

5 Address Overlap

The ﬁrst, and most straight-forward way to measure attri-

bution accuracy is by counting addresses and their overlap

between our ground-truth data and blockchain intelligence

from Chainalysis. This gives an initial view of attribution and

what inﬂuences this.

From Chainalysis, we collected all addresses they attribute

to these services. Then, we calculated the overlap between

the ground truth data versus the labeled addresses from

Chainalysis, as plotted in Figure 1. It can be observed that for BestMixer, the number of true positives by Chainalysis is 44,821 (24.54%), false positives is 19 (0.01%), and false negatives 137,791 (75.46%) out of 182,612 addresses. For Hansa Market, the true positives are at 913,171 (78.89%), the false positives at 1,695 (0.15%), and the false negatives at 244,348 (21.11%) out of a total of 1,157,519 addresses. Lastly, for Wall Street Market, the true positives are at 525,236 (94.85%), the false positives at 611 (0.11%), and the false negatives at 28,541 (5.15%) out of 553,777 addresses.
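The overlap computation can be illustrated with plain set operations. The following is a minimal sketch, not the tooling used in the study: the function name and the toy addresses are hypothetical, and it assumes that all percentages are expressed relative to the size of the ground truth, which is consistent with the reported figures (e.g., 44,821 / 182,612 ≈ 24.54%).

```python
def address_overlap(ground_truth, provider):
    """Compare a provider's labeled addresses against ground-truth addresses."""
    ground_truth = set(ground_truth)
    provider = set(provider)
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    counts = {
        "true_positives": len(ground_truth & provider),   # in both sets
        "false_positives": len(provider - ground_truth),  # labeled, not in ground truth
        "false_negatives": len(ground_truth - provider),  # in ground truth, not labeled
    }
    total = len(ground_truth)
    # Assumption: every percentage is relative to the ground-truth size.
    shares = {name: round(100 * n / total, 2) for name, n in counts.items()}
    return counts, shares


# Toy data (hypothetical addresses, not taken from the study).
gt = {"addr1", "addr2", "addr3", "addr4"}
labeled = {"addr1", "addr2", "addr9"}
counts, shares = address_overlap(gt, labeled)
print(counts)
print(shares)
```

With this convention, true positives and false negatives always add up to 100% of the ground truth, while false positives are reported on the same scale for comparability.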

From this ﬁgure, we can infer signiﬁcant differences be-

tween the different types of services. Higher accuracy is

achieved for darknet markets compared to mixers. However,

we must note that we cannot generalize without consider-

ing what causes (in)accurate attribution. Because of the very

high accuracy of Hansa Market and Wall Street Market, we

manually inspected only BestMixer’s false positives.

Apart from some edge cases, which we could not explain,

we found two main reasons why attribution is propagated

incorrectly. The main reason is a directly spent change output.

A payment in Bitcoin and, in fact, in all similar (UTXO-based) blockchains often generates a change output because one has to spend an entire UTXO. There are heuristics in place that attempt to identify this change address in a transaction. The change-address heuristic, first defined by [26], states that a change address should be a freshly created address, meaning it has not been seen on the blockchain before. We identify two cases in which implementing this incorrectly could cause problems.

There is a protocol called Child Pays for Parent (CPFP), which became default in 2016 [44]. CPFP allows transactions to prioritize the incoming (unconfirmed) transactions they want to spend by setting a higher fee. This causes miners to prioritize their ancestors’ transactions because they want to optimize mining rewards. CPFP can make it possible that a fresh change address is already spent, so the change address heuristic triggers incorrectly. Therefore, it could result in the actual deposit being labeled as the change. This then is propagated, meaning that all input addresses of the wallet that deposit bitcoins into the service are considered part of the service. Next to directly spent change (direct cases, and implied, indirect cases from those), we identify two other causes of false positives. In one case, the wrong change address was identified because the address had a different label attached to it, meaning the deposit address was specified as change, which in turn propagated through such that the input address was also attributed to BestMixer. We also noticed BestMixer was using a relatively high network fee – in some cases the deposit was similar to the deposit transaction’s fee, leading to the incorrect assumption that this address also belonged to BestMixer. In some cases, the reason was unknown, and those cases (indirectly) influenced incorrect attribution of 4 more addresses.
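To make the CPFP failure mode concrete, the sketch below shows one hypothetical way a ‘fresh address’ check could go wrong. How any provider implements the heuristic is not known; the `Tx` type, the function name, and the idea that the error stems from treating same-block transactions as already seen are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    txid: str
    block: int       # height of the block that confirmed the transaction
    inputs: tuple    # addresses the transaction spends from
    outputs: tuple   # addresses the transaction pays to


def guess_change(tx, history, same_block_counts_as_seen):
    """Return the single 'fresh' output address of tx, or None if ambiguous."""
    seen = set()
    for other in history:
        if other.txid == tx.txid:
            continue
        earlier = other.block < tx.block
        same_block = other.block == tx.block and same_block_counts_as_seen
        if earlier or same_block:
            seen.update(other.inputs)
            seen.update(other.outputs)
    fresh = [addr for addr in tx.outputs if addr not in seen]
    return fresh[0] if len(fresh) == 1 else None


# Hypothetical CPFP package: the child spends the parent's change output,
# and both transactions confirm in the same block.
parent = Tx("parent", 100, ("user_in",), ("service_deposit", "user_change"))
child = Tx("child", 100, ("user_change",), ("user_next",))
history = [parent, child]

careful = guess_change(parent, history, same_block_counts_as_seen=False)
naive = guess_change(parent, history, same_block_counts_as_seen=True)
print(careful, naive)
```

In the careful variant both outputs are fresh, so no change address is chosen. In the naive variant the real change address looks used because of the child transaction, so the deposit address is picked as change and the user’s inputs would be merged into the service’s cluster.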

6 Money Flows

A way to measure attribution that reflects real-world transaction-based tracing is to leverage the UTXO model of blockchains such as Bitcoin.

We apply this model to design an evaluation technique that resembles how our interviewees trace illicit money in a transaction-based manner, and call this technique money flows. We envision a service’s wallet as a directed acyclic graph in which the nodes are the UTXOs, and the edges are the transactions. A path of UTXOs from the deposit into a service until it leaves the service wallet again is a money flow. We use the addresses corresponding to the UTXOs to map it against the ground-truth data and data from Chainalysis. This graph is acyclic by the definition of the UTXO model in Bitcoin, as we use UTXOs rather than addresses. The other way around – using addresses rather than UTXOs – is infeasible, as we then cannot guarantee the graph is acyclic, and we might be unable to create a finite set of flows. For each service, we collected all transactions on all addresses of that service in the ground-truth. If a transaction consists of more than one input and more than one output, inputs are linked to the corresponding outputs as in a queue, as proposed by [2]. This is in line with how, in case law, money laundering contamination (i.e., taint) is calculated. Additionally, without such a linking rule, the number of possible paths grows exponentially within large service wallets, where transactions have many inputs and outputs, especially in darknet markets such as Hansa Market and Wall Street Market, due to their escrow service system. This would make it impossible to calculate all possible paths in a reasonable time frame, even when run on high-performance computing systems. Every path starts at a deposit, which we identify as an edge whose source UTXO is not part of the ground-truth and whose destination UTXO is part of the ground-truth. From each deposit, we generate all paths until a path comes across an edge where the destination UTXO is not part of the ground-truth.
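Linking inputs to outputs ‘as in a queue’ can be read as a first-in, first-out allocation of value. The sketch below follows that reading; it is an illustration under assumptions, not the exact procedure of [2]. In particular, the function name is hypothetical, values are integers (e.g., satoshis), and any input value left over after all outputs are served is treated as the fee.

```python
def fifo_link(input_values, output_values):
    """Link inputs to outputs first-in, first-out.

    Returns a list of (input_index, output_index, amount) tuples.
    """
    links = []
    remaining = list(input_values)  # value still unassigned per input
    i = 0
    for j, need in enumerate(output_values):
        while need > 0 and i < len(remaining):
            if remaining[i] == 0:
                i += 1              # this input is used up, move to the next one
                continue
            moved = min(need, remaining[i])
            links.append((i, j, moved))
            remaining[i] -= moved
            need -= moved
    return links


# Toy transaction: inputs of 5 and 3 units, outputs of 4 and 3 units, fee of 1.
links = fifo_link([5, 3], [4, 3])
print(links)
```

Each input is linked to at most a few consecutive outputs, so the number of edges per transaction grows linearly with the number of inputs and outputs rather than with their product.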

We calculated all flows of the three services, BestMixer, Hansa Market, and Wall Street Market, to better understand how attribution performs in terms of money flows. We define three categories for a flow: missed, partial, and full. If none of the addresses of the UTXOs of the flow are in the data of the commercial data provider, it is missed. If all addresses are present, we categorize it as full, meaning the commercial data provider fully covers it. In all other cases, a flow is classified as partial, meaning the commercial data provider partially covers it. We find that money flow coverage is lowest for BestMixer (see Figure 2). But by plotting the flows over time, we see apparent differences between timeframes. At the start, most money flows are missed. This makes sense, as during its initial phase, almost nobody knew about these services as they were still being developed, not publicly advertised, and only tested by a limited number of individuals. BestMixer started public operations in May 2018, and we see that Chainalysis can quickly attribute (part of) the service thereafter. This points out an important factor in attribution: interacting with a service is key to getting attribution if there is no way to broaden attribution by some unique fingerprint that works across blockchain(s).

Figure 2: Money flows over time for BestMixer (proportion of flows covered, 0.0 to 1.0, per category: missed, partial, full; x-axis ticks from 2018-03 to 2019-05)

Figure 3: Money flows over time for Hansa Market (proportion of flows covered, 0.0 to 1.0, per category: missed, partial, full; x-axis ticks from 2015-07 to 2017-07)

Figure 4: Money flows over time for Wall Street Market (proportion of flows covered, 0.0 to 1.0, per category: missed, partial, full; x-axis ticks from 2016-01 to 2019-07)
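The three flow categories translate directly into a small classification routine. This is a minimal sketch with hypothetical names and toy data; a flow is represented simply as the list of addresses of its UTXOs, and the provider’s data as a set of labeled addresses.

```python
from collections import Counter


def classify_flow(flow_addresses, provider_addresses):
    """Classify one money flow as 'missed', 'partial', or 'full'."""
    covered = [addr in provider_addresses for addr in flow_addresses]
    if not any(covered):
        return "missed"   # no address of the flow is labeled
    if all(covered):
        return "full"     # every address of the flow is labeled
    return "partial"      # some, but not all, addresses are labeled


def coverage_shares(flows, provider_addresses):
    """Proportion of flows per category."""
    counts = Counter(classify_flow(f, provider_addresses) for f in flows)
    total = sum(counts.values())
    return {c: counts[c] / total for c in ("missed", "partial", "full")}


# Toy flows (hypothetical addresses).
flows = [["a", "b"], ["c", "d"], ["a", "c"], ["e"]]
provider = {"a", "b", "e"}
shares = coverage_shares(flows, provider)
print(shares)
```

Grouping the flows by the month of their deposit before calling `coverage_shares` would yield proportions per month, which is one way to obtain a coverage-over-time view.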

On the notion of unique fingerprints: there appears to be a limiting unique fingerprint present that prevents forward-tracing of change within the service’s wallet. We found a shift in the used wallet fingerprint starting mid-February 2019, which caused an immense drop in money flow coverage. The only flows Chainalysis can cover after that shift are those involving the donation address of BestMixer, which was publicly advertised on the service’s website. We noticed that the wallet structure of the service changed, and precisely at that moment, the money flow coverage declined too. This change involved, among other things, disabling Segregated Witness (SegWit) and changing the address type.

With Hansa Market and Wall Street Market (see Figures 3 and 4), we again see low money flow coverage initially, supporting our theory that interaction is important but difficult in the first stage because a service is still relatively unknown or even still in development. Another possible route to attribution is open-source intelligence (OSINT). OSINT could replace the need to interact with a service and provide ‘historical interactions’, allowing one to build a profile later. However, by extensively searching online for posts, news articles, photos, and videos, and on (underground) forums, we learned that there is not much OSINT available that is useful for attribution, such as public cryptocurrency addresses or tutorials of people showing how to use these services that leak addresses. We only found a few addresses online, which is negligible compared to the total wallet size, and manually applying heuristics such as co-spending to them hardly gives any attribution above an insignificant percentage.

Service Name         Missed    Partial   Full
BestMixer            71.18 %   11.74 %   17.08 %
Hansa Market          9.34 %   14.33 %   76.34 %
Wall Street Market    6.13 %    4.10 %   89.77 %

Table 3: Money flow completeness percentages

Also, with Wall Street Market, we noticed a gap in attri-

bution being present at the beginning of 2018. Here, we also

believe a change in the wallet system occurred, though indi-

cators are less self-evident than with BestMixer. The format

of a transaction in the transaction table of the marketplace

changes roughly at that exact moment in time. More importantly, we saw a significant spike in the number of UTXOs in the wallet that moved to other addresses. This considerable

movement could also indicate why the money ﬂow coverage

went down. Money ﬂow coverage increases signiﬁcantly at

the end of the lifetime of Wall Street Market. We believe this

is due to the exit scam attempted by the marketplace [1]. This

led to a large amount of money being moved to only a few

addresses and then withdrawn from the service, increasing

money ﬂow coverage. Furthermore, some exit scam addresses

were publicly advertised online on Reddit. Having established

that the primary form of illicit services attribution is through

direct interaction, we see that getting historical attribution is

limited when addresses are not constantly reused. The lower

the balance of the service’s wallet and the higher the turnover,

the worse the historical attribution if only using current inter-

action with the service.

We showed that money flow coverage changes over time. Chainalysis is more likely to cover only parts of the flow, meaning it is more likely to cover only a central wallet. When tracing in a UTXO-based manner, missed flows are potential cases in which one can trace through a service without encountering an attribution at any step in the process. The percentages of flows covered per service are shown in Table 3. The results show that attribution varies significantly over time.

Generally, we state that for BestMixer, Chainalysis covers (partially or completely) a flow in 28.82 percent of cases. For darknet markets, money flow coverage is much higher. For Hansa Market, this is 90.67 percent, and for Wall Street Market, this is 93.87 percent (the partial and full shares in Table 3 combined).

7 Address Roles

Apart from UTXO-based tracing, we identify a second way to trace illicit money flows on UTXO-based blockchains: relationship-based. Here, we state that the role an address has in the internal workings of the wallet of the service is essential. We employ this reasoning for a third way of measuring attribution: through the role of the address. We define three types of transactions: a deposit transaction, an internal transaction, and a withdrawal transaction. A transaction is a deposit transaction if none of the input addresses belong to the ground-truth of the service and one or more output addresses do belong to the service. A transaction is an internal transaction if all the input and output addresses are in the ground-truth. A transaction is a withdrawal transaction if one or more of the input addresses belong to the ground-truth and at least one of the output addresses does not, leaving room for one or more output addresses to be change addresses that belong to the ground-truth. Based on the roles we give transactions, we define the addresses’ roles. These are a combination of roles based on incoming transactions and outgoing transactions. An address, for instance, receives money (i.e., all incoming transactions are of the type ‘deposit’) and then sends it to a central wallet (i.e., all outgoing transactions are of the type ‘internal’). Then, we apply co-spending heuristics to the addresses to build clusters of these addresses. We do so because tracing tools also use co-spending. With these datasets, where the private keys were not exposed and no CoinJoins could appear with other addresses, we can safely state that this heuristic holds. We calculate the address role coverage of the tracing tools in terms of the percentage of clusters of a specific role they have labeled.

Incoming Transactions   Outgoing Transactions   Addresses   Clusters   Cluster Coverage Chainalysis
deposit                 withdrawal                 62,606     56,110   11.22 %
withdrawal              withdrawal                 57,462     45,146   24.10 %
internal                withdrawal                  3,465      2,658   14.94 %
deposit                 internal                   11,608        636    2.36 %
deposit, withdrawal     withdrawal                      3        439   92.03 %
withdrawal              internal                   46,300        276    6.88 %
Other                   Other                       1,168        172    1.74 %
Total                                                                  17.55 %

Table 2: Address roles of BestMixer
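The transaction types and address roles defined in the text can be sketched in a few lines. This is an illustrative reading of the definitions, with hypothetical function names and toy data; transactions that match none of the three definitions (for example, mixed inputs with only ground-truth outputs) are labeled ‘other’, which is an assumption of this sketch.

```python
from collections import defaultdict


def transaction_type(inputs, outputs, ground_truth):
    """Classify a transaction relative to a service's ground-truth addresses."""
    ins = [a in ground_truth for a in inputs]
    outs = [a in ground_truth for a in outputs]
    if not any(ins) and any(outs):
        return "deposit"      # no service inputs, at least one service output
    if all(ins) and all(outs):
        return "internal"     # everything stays inside the service
    if any(ins) and not all(outs):
        return "withdrawal"   # service input, at least one external output
    return "other"


def address_roles(transactions, ground_truth):
    """Map each ground-truth address to (incoming types, outgoing types)."""
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for inputs, outputs in transactions:
        kind = transaction_type(inputs, outputs, ground_truth)
        for addr in outputs:
            if addr in ground_truth:
                incoming[addr].add(kind)   # address receives in this transaction
        for addr in inputs:
            if addr in ground_truth:
                outgoing[addr].add(kind)   # address spends in this transaction
    return {
        addr: (tuple(sorted(incoming[addr])), tuple(sorted(outgoing[addr])))
        for addr in ground_truth
    }


# Toy wallet: s1 and s2 belong to the service, u1 and u2 are external.
gt = {"s1", "s2"}
txs = [
    (["u1"], ["s1"]),        # deposit
    (["s1"], ["s2"]),        # internal
    (["s2"], ["u2", "s2"]),  # withdrawal with change back to s2
]
roles = address_roles(txs, gt)
print(roles)
```

In the toy wallet, `s2` ends up with ‘withdrawal’ among its incoming types because it receives the change of a withdrawal.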

As might be expected, address role coverage is not random.

In the case of BestMixer, we notice that Chainalysis is better

at covering deposits immediately withdrawn again (see Table

2). We see that an incoming transaction can be of the type

‘withdrawal’, which seems counterintuitive, but it is, in fact,

a change from an earlier withdrawal of another user. When

we compare the results of BestMixer with those of darknet

markets, we see a difference in how they work and, therefore,

how they handle funds. Here, one might expect a more precise differentiation between user funds, and that is also what we see, for instance, in the address roles of Wall Street Market (see Table 4). The majority of the addresses are deposits and

withdrawals. These correspond to the order structure of the

marketplace, where users generate a payment address, and

the vendor withdraws the proﬁt. This withdrawal is batched

and grouped with other transactions to save money for the

marketplace on the blockchain.

Hansa Market employs a different structure than Wall Street

Market, producing many more internal transactions. This

is because users at Hansa had a wallet at the marketplace,

whereas for Wall Street Market, they paid for an order at an

order-speciﬁc address. This greatly affects the wallet’s inter-

nal workings. For Hansa, many more internal transactions are

necessary to move funds to the right (escrow) wallets. One

might expect, also based on the overlap analysis, that these

internal transfers make it much easier to label Hansa. This is

indeed the case regarding address roles covered, but looking

back at the address overlap in Figure 1, we see the contrary.

This means that more internal (and maybe also relevant) ad-

dresses are labeled in Hansa than in Wall Street Market.

The difference in coverage between address roles gives

clear indicators of BestMixer’s decentralized nature. On a

higher level, BestMixer consists of large chains of deposits

and withdrawals like a peel chain (a term also used by [25]).

This can be deduced from the many roles ‘deposit → withdrawal’ and ‘withdrawal → withdrawal’. Although Wall Street

Market also shows that pattern, what probably makes up for

that is that address reuse is more prevalent there, linking more

addresses. Hansa Market and Wall Street Market work dif-

ferently compared to BestMixer. This makes for different on-

chain behavior. Most importantly, they facilitate transactions

between buyers and sellers on the market. These transactions

occur on-chain to secure them in an escrow system using

multi-signature wallets. Hansa Market worked slightly dif-

ferently than Wall Street Market, as Hansa had a wallet per

account, whereas Wall Street Market generated an address for

a buyer to pay speciﬁcally for an order.

Incoming Transactions   Outgoing Transactions   Addresses   Clusters   Cluster Coverage Chainalysis
deposit                 withdrawal                494,535     78,972   75.69 %
deposit                 internal                   52,943      4,335   24.57 %
deposit, withdrawal     withdrawal                    632      1,051   95.34 %
internal                withdrawal                    901        677   59.53 %
withdrawal              withdrawal                  1,273        624   94.07 %
deposit, internal       withdrawal                     61        174   85.06 %
Other                   Other                       2,285         92    6.52 %
Total                                                                  73.35 %

Table 4: Address roles of Wall Street Market

Incoming Transactions   Outgoing Transactions   Addresses   Clusters   Cluster Coverage Chainalysis
internal                internal                  383,353    146,210   95.51 %
deposit                 internal                  371,249    124,672   96.03 %
deposit, internal       internal                       32     83,663   97.34 %
internal                withdrawal                 71,249     24,459    1.77 %
deposit                 withdrawal                 69,831     24,392    4.49 %
withdrawal              internal                   66,401     19,086   93.91 %
Other                   Other                     122,009     39,264    0.20 %
Total                                                                  86.84 %

Table 5: Address roles of Hansa Market

We can see the results of the difference in on-chain behavior for the most frequently occurring roles in Table 5. Hansa Market has relatively many internal-related roles. The account-based wallet management likely causes that. First, one has to deposit money into the market, adding to an account balance. That balance can then be used to order something on the market. When an order occurs, an escrow wallet must be created between the buyer, seller, and market – which means another transaction on the blockchain. This behavior causes many internal transactions compared to Wall Street Market. The different roles in a wallet clearly show what to focus on to get good address role coverage. One needs a ‘vantage point’ from which to start expanding attribution. For BestMixer, focusing on identifying peel chains would lead to greater address role coverage. Another ‘vantage point’ was the internal liquidity provider system in BestMixer, which they used until February 2019. When that system was abandoned, a strong ‘vantage point’ was lost.

Regarding Hansa Market and Wall Street Market: focus-

ing on market transactions causes high address role coverage.

Also, for Hansa Market, it is important to focus on identifying the addresses belonging to account wallets. Those are the first to be discovered when doing transactions with Hansa Market. To

identify market transactions, one can order something on the

market. A common denominator is to ﬁnd a shared ﬁngerprint

between transactions. That way, one can apply knowledge of

transactions to ﬁnd more transactions. This can for instance be

done based on roles. Identifying how the market fee is calcu-

lated in an order allows an analyst to identify more rules one

can apply to already identiﬁed wallets to ﬁnd new addresses.

Overall, we see a difference in address role coverage be-

tween BestMixer on the one hand and Hansa Market and Wall

Street Market on the other. We suspected that addresses are reused more often in marketplaces. Therefore, we calculated the average number of transactions per address, which turned out to be 2.00 for BestMixer, 2.32 for Hansa Market, and 2.27 for Wall Street Market. The key observation here is, however, the variance – which is 0.42 for BestMixer, 1 for Hansa Market, and 4 for Wall Street Market. Certain addresses occur much more frequently, and from

closely inspecting those addresses, we found that they belong

to pivotal roles in the darknet markets, such as the addresses

that receive the fee for the market.
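The reuse statistics can be reproduced in principle with a per-address transaction count. The sketch below is an illustration on toy data with hypothetical names; it assumes that an address is counted once per transaction in which it appears (as input or output) and that the population variance is meant, neither of which is specified in the text.

```python
from collections import Counter
from statistics import mean, pvariance


def transactions_per_address(transactions, service_addresses):
    """Count, per service address, the transactions it appears in."""
    counts = Counter()
    for inputs, outputs in transactions:
        for addr in (set(inputs) | set(outputs)) & service_addresses:
            counts[addr] += 1
    return counts


# Toy data: 'fee' is reused across orders, the other addresses are not.
service = {"b", "c", "d", "e", "fee"}
txs = [
    (["a"], ["b"]),
    (["b"], ["c"]),
    (["c"], ["fee", "d"]),
    (["d"], ["fee", "e"]),
    (["e"], ["fee", "f"]),
]
counts = transactions_per_address(txs, service)
values = list(counts.values())
avg, var = mean(values), pvariance(values)
print(avg, var)
```

A single frequently reused address, such as the fee address in the toy data, barely moves the average but raises the variance, which is why the variance is the more telling figure.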

Unlike in transaction-based tracing, picking related wallets is more of a gamble for users of these tools. What is relevant depends more on the case context, which means this metric has less control over which of the related entities is picked. In transaction-based tracing, by contrast, one often has the guarantee that core components such as internal mixing pools or the escrow service are covered, as the transaction trace is followed step by step. The real-world impact, therefore, is more diffuse. On the other hand, role-based attribution is more closely related to the internal business operations of a service and, therefore, leads to better insight into the internal workings, which is valuable for those wanting to improve attribution.

8 Generalization

To understand how our evaluation of attribution can be generalized based on our findings, we take two perspectives: first, we add another blockchain intelligence provider, and second, we reason about the generalization of clustering heuristics and the circumstances under which these can be generalized.

Blockchain intelligence provider generalization. With the

use of a second data provider, we will attempt to general-

ize our results and pinpoint what efforts have to be put into

attribution manually or through proprietary techniques and

algorithms by blockchain intelligence providers. For this, we

use data from Arkham Intelligence. Arkham Intelligence is

a blockchain intelligence provider that aims to deanonymize

cryptocurrency ﬂows by linking blockchain addresses to real-

world entities – just like the other providers such as Chainaly-

sis. A distinctive feature of the platform is its Intel-to-Earn

program, which incentivizes users to contribute attribution

data through a marketplace model. In this study, we use

Arkham’s publicly available labels. Arkham has data about

only one ground truth illicit service used in this study: Wall

Street Market. We will use that to reason about generalization.

In Arkham, Wall Street Market’s wallet consists of only two addresses. Both addresses are present in the ground truth data, giving a false positive rate of exactly 0. However, given there are only two, the false negative rate is significant, as these two represent only 0.0003% of all addresses in the ground truth data. Yet, these addresses are the core addresses of Wall Street Market: they handle the fee the market receives in every order ever placed on the market. These two addresses alone are responsible for a significant part of all transactions during the lifetime of the market (10.44%). This again shows the importance of metrics such as money flows and address roles rather than simply counting addresses. Given the low number of addresses present in Arkham and the nature of Arkham as a crowdsourced and OSINT-based provider, it is to be expected that one or both addresses were submitted to Arkham and no additional fingerprinting or manual analysis was added by Arkham. Manual inspection shows these addresses were merged based on just the co-spending heuristic. This leads us to exclude the option that additional research to improve attribution, as is known to be done by blockchain intelligence providers, went into these addresses.

When evaluating attribution of Arkham in terms of money flow coverage or address role coverage, we consider these two addresses – for this paper – as starting points for attribution. When applying the knowledge that these are fee addresses, we can expand attribution to a total of 48,368 addresses without any false positives. We do so by adding two domain knowledge rules we can only apply to fee addresses in Wall Street Market: (1) when a transaction deposits money into these addresses, it comes from escrow or the buyer, and (2) the other output is the payout to the vendor. Then, when calculating money flows (see Figure 5), the proportion of Wall Street Market which is (partially) attributed is small (6.44% partially and 0.42% fully covered). This highlights the efforts blockchain intelligence providers take manually or through in-house algorithms to improve attribution accuracy, and that this has to be done on a service-specific level shows the difficulty of generalizing. Arkham does not add additional clustering, which we performed ourselves. Looking at the address roles in the enhanced cluster of Wall Street Market (see Table 6), we observe that attribution primarily covers addresses and clusters used for internal transactions from the service. Apparently, the market’s fee addresses together with the clustering are good at attributing the ‘backend’ of the service, but not good at attributing the deposit side of the market.

Figure 5: Money flows of Wall Street Market by Arkham - after clustering (proportion of flows covered, 0.0 to 1.0, per category: missed, partial, full; x-axis ticks from 2016-01 to 2019-07)
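The two domain knowledge rules for fee addresses can be expressed as a simple labeling step. The sketch below is a hedged illustration: the function name, the label strings, and the toy addresses are hypothetical, and which of the labeled addresses are subsequently added to the service’s cluster is not specified in the text, so the sketch only assigns labels.

```python
def label_fee_transaction(inputs, outputs, fee_addresses):
    """Apply the two fee-address rules to one transaction.

    Returns an empty dict if the transaction does not pay a fee address.
    """
    if not any(addr in fee_addresses for addr in outputs):
        return {}
    labels = {}
    for addr in inputs:
        labels[addr] = "escrow_or_buyer"    # rule (1): source of the fee payment
    for addr in outputs:
        if addr not in fee_addresses:
            labels[addr] = "vendor_payout"  # rule (2): the other output
    return labels


# Toy order settlement (hypothetical addresses).
fees = {"fee1", "fee2"}
labels = label_fee_transaction(["escrow1"], ["fee1", "vendor1"], fees)
print(labels)
```

Rule (2) speaks of ‘the other output’, which suggests transactions with exactly two outputs; how transactions with more outputs should be handled is left open here.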

Heuristics generalization. Apart from generalization across

blockchain intelligence providers, we can also look at gener-

alization of clustering heuristics. Using our empirical analysis

we can clarify under which circumstances these heuristics

can be generalized without the chance of false positives. It

is to be expected that two heuristics play a major role across

all services: co-spending and change-address. Both can be

generalized, assuming certain conditions are met.

Incoming Transactions   Outgoing Transactions   Addresses   Clusters   Cluster Coverage
deposit                 withdrawal                494,535     78,972    0.01 %
deposit                 internal                   52,943      4,335    0.00 %
deposit, withdrawal     withdrawal                    632      1,051   87.06 %
internal                withdrawal                    901        677   81.09 %
withdrawal              withdrawal                  1,273        624   99.36 %
deposit, internal       withdrawal                     61        174    2.87 %
Other                   Other                       2,285         92    4.35 %
Total                                                                   2.48 %

Table 6: Address roles of Wall Street Market by Arkham - after clustering

For co-spending, the major threat is that a CoinJoin occurs

in which the service participates. When a service does not use

this feature and no external factors – like users of the service

– can exploit a feature that allows for a CoinJoin to occur,

the heuristic holds. For example, MtGox [23,28] allowed for

a private key to be added to an account. If that private key

was associated with a CoinJoin transaction, it might lead to

false positives. In all other cases, where the service’s wallet

cannot be extended with external private keys, it does hold,

and in those cases, it must be considered conservative – i.e.,

it might create multiple clusters of the same wallet. Therefore

the heuristic creates a tendency towards false negatives rather

than false positives.
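The conservative behavior of co-spending can be seen in a minimal union-find sketch. It assumes each transaction is reduced to the list of its input addresses. The function name `cluster_by_cospending` and the toy data are hypothetical, and this is not the implementation of any provider.

```python
def cluster_by_cospending(transactions):
    """Group addresses that appear together as inputs of a transaction.

    `transactions` is an iterable of input-address lists. Addresses that
    are never co-spent stay in their own cluster, which is why the
    heuristic leans towards false negatives.
    """
    parent = {}

    def find(addr):
        # Register unseen addresses as their own root, then walk up
        # to the root while shortening the path (path halving).
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)
        for other in inputs[1:]:
            union(first, other)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical inputs: "d" is never co-spent, so it stays alone even
    # if it belongs to the same wallet as "a", "b", and "c".
    txs = [["a", "b"], ["b", "c"], ["d"], ["e", "f"]]
    for cluster in cluster_by_cospending(txs):
        print(sorted(cluster))
```

If one of these transactions were a CoinJoin, the same merge step would join addresses of unrelated parties, which is the false-positive case described above.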

For the change-address heuristic, we have also seen signs

that it has been employed in clustering ground truth services

by Chainalysis, especially BestMixer, whose wallet consists

of large peeling chains, which can only be attributed on this

scale with some form of change address clustering. However,

this heuristic is more complex to evaluate and we have, in

fact, seen some signs of false positives on Chainalysis related

to change address detection, for instance when

child-pays-for-parent (CPFP) was active. PayJoin is a technique

specifically built to break this heuristic. Again, just like with

co-spending, one can assess if a service supports PayJoin, for

instance by checking whether a PayJoin endpoint is present for deposits. If so,

both co-spending and change-address heuristics should be

treated carefully, as it can no longer be guaranteed that these

heuristics hold. They might lead to false positives.
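To make the link between peeling chains and change-address clustering concrete, the sketch below follows a chain under a deliberately naive rule: in each two-output transaction, the larger output is assumed to be the change. This rule is an assumption chosen for illustration and is not the heuristic used by Chainalysis or described in the paper. The data layout `spends` and the name `follow_peeling_chain` are hypothetical.

```python
def follow_peeling_chain(start, spends, max_hops=1000):
    """Follow a peeling chain from `start`.

    `spends` maps an address to the outputs of the transaction that
    spends it, as a list of (address, amount) pairs.
    Naive assumption: in a two-output transaction the larger output is
    the change that stays in the wallet. Transactions with any other
    shape end the chain.
    """
    chain = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        outputs = spends.get(current)
        if outputs is None or len(outputs) != 2:
            break
        change_addr, _amount = max(outputs, key=lambda out: out[1])
        if change_addr in seen:
            break  # guard against loops caused by address reuse
        chain.append(change_addr)
        seen.add(change_addr)
        current = change_addr
    return chain


if __name__ == "__main__":
    # Hypothetical chain: small payouts (v1, v2) are peeled off while
    # the remainder moves along w0 -> w1 -> w2.
    spends = {
        "w0": [("v1", 1.0), ("w1", 9.0)],
        "w1": [("w2", 8.5), ("v2", 0.5)],
        "w2": [("v3", 8.5)],
    }
    print(follow_peeling_chain("w0", spends))
```

Whenever the larger output is in fact the payment, this rule labels an external address as change, which is one way false positives of the kind mentioned above can arise.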

Both heuristics can be applied generally at any service after

thorough assessment of those services: do they interact with

a CoinJoin service – e.g., for internal liquidity management –

or do they allow external wallets to be imported that might

have interacted with a CoinJoin service? And likewise

for PayJoin: does it support that? If so, then similarly to

CoinJoin, one should treat the heuristics carefully as they

might generate false positives. But in the three ground truth

datasets, none of those conditions were met, meaning those

heuristics were accurate but conservative, resulting in false

negatives over false positives by design.
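The assessment questions above can be written down as a small checklist. The sketch below is one possible reading of the text, assuming that CoinJoin participation and importable external keys affect co-spending and that PayJoin support affects both heuristics. `ServiceProfile` and `heuristic_caveats` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServiceProfile:
    """Answers to the assessment questions for one service (hypothetical)."""
    uses_coinjoin: bool
    allows_key_import: bool
    supports_payjoin: bool


def heuristic_caveats(profile):
    """List the reasons why each heuristic might yield false positives.

    An empty list means none of the conditions discussed in the text
    apply, so the heuristic is expected to be accurate but conservative.
    """
    caveats = {"co-spending": [], "change-address": []}
    if profile.uses_coinjoin:
        caveats["co-spending"].append("service participates in CoinJoin")
    if profile.allows_key_import:
        caveats["co-spending"].append("external private keys can be imported")
    if profile.supports_payjoin:
        caveats["co-spending"].append("service supports PayJoin")
        caveats["change-address"].append("service supports PayJoin")
    return caveats


if __name__ == "__main__":
    # Profile matching the three ground-truth services as described:
    # none of the conditions were met.
    print(heuristic_caveats(ServiceProfile(False, False, False)))
```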

Mapping to law enforcement practice. The insights gained

suggest that generalization is highly case-speciﬁc. Most inter-

viewees admitted to having a limited understanding of how

blockchain intelligence providers operate. This highlights the

importance of critically examining the source of attribution

by the law enforcement professional to determine if and how

it can be expanded. Additionally, it is essential for blockchain

intelligence providers to offer more transparency about their

methods. In terms of generalization across providers, in the

real world, this generalization question has limited impact,

since LEAs rely almost exclusively on one tool. The in-

terviews conﬁrmed that all law enforcement agencies use

Chainalysis. Only one agency was using a second commercial

provider. In fact, our own use of a second, public provider

mirrors what we found in the interview study: law

enforcement using a public tool for verification purposes.

9 Related Work

Our paper builds on and beneﬁts from recent advancements in

three topics. First, we build and expand on blockchain-based

heuristics – like co-spending and change address heuristics.

Second, we add to studies that evaluate measurement metrics.

Third and last, we advance work on leveraging ground-truth

data analysis, in our case to evaluate attribution.

Blockchain-based heuristics. We build upon the work first

introduced by [26], later enhanced by [25] and others. They

define state-of-the-art clustering and attribution practices

used to this day. Clustering is an important factor for

scaling attribution. As said before, it propagates a label from

one address to many, grouping addresses on UTXO-based

blockchains. Many more studies apply different variations to

these heuristics, but the core concepts are always the same

and based on [26]. According to our research, attribution

practices described by [26], such as interacting with services,

are still one of the main ways to identify with a high degree of

certainty that an address belongs to a particular service. Apart

from interacting with services, [43] applied ﬁngerprint-based

heuristics on the entire blockchain to attribute transactions

belonging to speciﬁc services (i.e., coinjoin mixers). We

cannot deﬁnitively prove that blockchain-wide ﬁngerprints

were used for attribution, but since funds in the development

phase are rarely labeled, this suggests these types of heuristics

were not apparent in our cases, as otherwise, they would also

have been labeled. However, their work shows a different

type of attribution effort that we cannot ignore when testing

attribution.

Evaluating measurement metrics. The work from [25] is

one of the few attempts to work with ground-truth data to

validate heuristics broadly used in cryptocurrency tracing.

It, therefore, shows the pros and cons of using the change

address heuristic. However, the work has limitations as it

assumes the data from only one commercial data provider

(Chainalysis in this study) to be the ground-truth. That data is

limited in size, and this study shows that Chainalysis data

is not perfect compared to ground-truth from illicit services.

Next, the works of [22] and [10] use the graph-based nature

of UTXO blockchains for measurement heuristics. They

use it for expanding their search space or candidate set

– i.e., clustering – and therefore look ‘outwards’. We on

the other hand, look ‘inwards’ into a wallet and use it for

labeling rather than clustering. In a slightly different ﬁeld, [5]

compared threat intelligence data from various providers.

Even though they solely focus on commercial intelligence

data, the overlap analysis is similar. Also, they interviewed

participants on their perception of how accurate the data they

use daily is. The results by [5], and especially those on the

perception of the data, differ from ours. Where they report

a greater lack of trust in the data among participants, our

interviewees express such distrust less clearly. However, in their

case, the ﬁeld is much more diverse, with many more parties

providing attribution than in blockchain intelligence.

Ground-truth data analysis. We build upon earlier work

that analyses ground-truth data seized by law enforcement

such as [13] and [27]. Both use seized data from law en-

forcement to analyze user and market behavior. We, however,

use such data access for evaluating the attribution of external

systems used by such services – blockchains, to be speciﬁc.

One could argue that it is a different type of market (e.g.,

cryptocurrency attribution data vs user security practices).

Additionally, [25] uses a relatively small set of cryptocur-

rency addresses as ground truth for veriﬁcation. However,

they also reason about false positives in the change address

heuristic. Yet, their number of addresses is smaller, our overall

wallet coverage is larger, and we focus on illicit services.

10 Discussion

In this section, we reﬂect on the implications of our ﬁndings,

discuss limitations, and explore potential future work.

Implications. We found that attribution is performed

conservatively – meaning Chainalysis attributes a reliable

lower bound. This makes perfect sense when you look

at who uses these tools. Many investigators, compliance

departments, and law enforcement agencies rely on the

attribution data provided by commercial data providers in

their day-to-day activities to prevent or detect crime. In turn,

this means they are primarily interested in attributing illicit

services. We empirically evaluate attribution of illicit services

using seized ground-truth data. This is highly relevant, as

in many court cases, evidence is generated by tools these

companies provide. In many cases, the internal process

of how blockchain intelligence is generated – i.e., how

attribution is achieved – remains a ‘black box’ [30,33,41].

Our work provides at least partial insights into the inner

workings of that ‘black box’. It is good to know that these

tools provide estimates for tracing cryptocurrencies. In

most cases, tools are conservative and prefer not to show a

label if uncertainty is a factor. Our study shows that being

aware of this is vital. One of the interviewees stated that

he guessed (or expected) that precision was 100% before

being confronted with the preliminary results, which we

interpret as no false positives and no false negatives. Being

aware that attribution might not be perfect might help the

workflow, though visual indicators in tracing tools might also

help, possibly together with newly developed heuristics that

indicate a change of ownership between clusters of addresses

(i.e., because different fingerprints are being used). Here,

attribution focuses on forensic accuracy: stating only what

you know for certain. In the use case of law enforcement, this

is of great importance. However, the case of BestMixer shows

that, at least for transaction-based tracing, the tools still miss a

large number of flows. In the case of calculating the

taint of an address indirectly, like in the algorithm developed

by [2], one has to take this into account.
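The difference between a reliable lower bound and perfect attribution can be made explicit with standard set-based metrics. The sketch below assumes that both the ground truth and the provider's attribution are available as sets of addresses. The function name `address_overlap` and the toy addresses are hypothetical. In the standard definitions, precision only reflects false positives, while recall reflects false negatives.

```python
def address_overlap(ground_truth, attributed):
    """Compare attributed addresses against ground-truth addresses.

    Returns true positives, false positives, false negatives, and the
    standard precision and recall (None when undefined).
    """
    ground_truth = set(ground_truth)
    attributed = set(attributed)
    tp = len(ground_truth & attributed)
    fp = len(attributed - ground_truth)
    fn = len(ground_truth - attributed)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        "precision": precision,
        "recall": recall,
    }


if __name__ == "__main__":
    # Hypothetical toy sets: conservative attribution with no false
    # positives still misses half of the service's addresses.
    truth = {"a", "b", "c", "d"}
    labeled = {"a", "b"}
    print(address_overlap(truth, labeled))
```

In this toy case precision is 1.0 while recall is 0.5, which is the pattern a conservative lower bound produces.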

Limitations. First, we acknowledge that having access to

data from only one blockchain intelligence company is a

limitation of our work. We pursued the inclusion of more

data sources and reached out to several commercial providers.

Through our collaboration agreement with law enforcement,

we had access to another provider. We also analyzed their

attribution. Prior to submission, we performed responsible disclosure

to Chainalysis and to the second ﬁrm. Where the former

responded very positively, thanking us for the insights, the

second provider immediately responded with legal objections

and threats. In subsequent conversations, we made extensive

efforts to resolve the concerns of the second provider. This

included proposing to include its data anonymously in the

manuscript but this offer was declined. The provider said that

any disclosure, even anonymous, would be treated as grounds

for legal action. This left us with no option but to include only

Chainalysis and to explore generalization via the comparisons

with the public explorer of Arkham Intelligence.

In this work, we took snapshots of the current attribution of

Chainalysis of the illicit services where we have ground-truth

available. This means we analyzed the current state of

blockchain intelligence. We know this intelligence changes

over time – both in positive and negative ways. This also

means that by looking at the current state, we do not know

when an address was ﬁrst labeled and if there was a delay

in doing so. This is partially inevitable because clustering

heuristics cannot identify new deposits until they are spent.

However, we do not know the latency between when an

address was ﬁrst identiﬁed by interactions versus when it was

ingested into the software – assuming that interactions are

the primary way new addresses are identiﬁed. Additionally,

attribution – in terms of address roles and money ﬂow

coverage – differs signiﬁcantly between various types of

services. Therefore, we have to mention that these are

just three cases – as this is a case study – and cannot be

generalized easily since attribution relies on many factors

– such as how the developers of such services decide on

the internal workings of their wallets, attribution intensity,

and maybe also how users send money to these services.

We show, by our generalization efforts using Arkham, that

service-speciﬁc clustering is the main factor inﬂuencing

attribution. Next, we ﬁnd that methods such as PayJoin and

CoinJoin pose no risk of leading to false positives under the

condition that the attributed service does not use them. If they

do, then in theory it might result in false positives, but we

do not have data to support this. However, especially with

address roles, we show that certain patterns within a wallet

represent it better as a whole. Therefore, if you focus on that

pattern, you also cover most of the wallet. If you interact with services and

look at which addresses you deposited or where those coins

you received back came from, it is possible to identify key

addresses of a service’s wallet. Labeling those addresses is

critical, but we might imagine this is difﬁcult to generalize

and automate.

Future work. This research evaluates the current state of the

art in attribution but leaves various questions unanswered that

might be worth exploring. As with most studies on clustering

and labeling, we assume that we know about the existence

of a service. Only then can one interact with it (assuming

interactions are vital to gathering attribution in that case).

Our money ﬂow evaluation technique highlights that it is

difficult to gather historic attribution, meaning that learning

about the existence of a service early is critical. Additionally,

when interaction with services is necessary, we leave out how

this is done most effectively. In other words, what strategies

must one apply to get the best attribution? These strategies

might also depend on the type of service, for instance, address

reuse, internal asset management (i.e., does it send money

to some form of hot wallet), and the daily balance in the

wallet of that service. Identifying and testing good strategies

might also be the subject of future research. Lastly, we only

looked at the label put on an address, not at how this label

propagates through clustering heuristics. This was done to scope

this research and focus on the thing most important to those

using the data. However, this leaves the area open to ﬁnding

out which clustering heuristics work well, which should be

avoided, and how to ﬁnd new ones. We believe that there is

still a lot of research that can be done on this topic, given

what we saw in the data we analyzed. This is also a subject

for future research.

11 Conclusion

Leveraging insights from interviewing law enforcement pro-

fessionals who trace cryptocurrencies on a daily basis, we

evaluated illicit service attribution in this case study on three

viable cases by commercial blockchain intelligence provider

Chainalysis. We measured attribution in three ways: address

overlap, money ﬂows, and address roles. The latter two evalua-

tions align with the types of tracing that we identiﬁed by inter-

viewing experienced law enforcement ofﬁcers. We contrasted

ground-truth data against blockchain intelligence on three

seized illicit services: BestMixer, Hansa Market, and Wall

Street Market. Chainalysis underestimates the total number of

addresses of these illicit services by providing a reliable lower

bound with very few false positives. Coverage depends on

the kind of service and their wallet activity. The two darknet

markets receive better address role and money ﬂow coverage

than the mixing service.

Overall, we evaluated the attribution of illicit services by

Chainalysis against ground truth data and learned about the

factors inﬂuencing attribution. We evaluated attribution in

three ways: by counting the number of addresses, measuring

transaction-based money flows, and calculating the accuracy of

the different roles of clusters within a service. In sum, we

can state that Chainalysis provides a reliable, lower bound –

up to 95% – with false positives being extremely rare. Yet,

coverage of BestMixer is lower than that of Wall Street Market

and Hansa Market. Last, we identiﬁed four important factors

that inﬂuence attribution: the knowledge about a service’s

existence such that one interacts with it, the centrality of the

service, address reuse, and the format of the transactions of

the service. Those inﬂuence how well the heuristics work,

and together with knowledge about the internal workings of

a service, this leads to better attribution. We demonstrate that

generalization of attribution ﬁndings is possible but highly

context-dependent, requiring an understanding of both data

availability and heuristic limitations. By comparing Arkham

Intelligence to ground truth data and examining heuristic

behavior, we show that while some attribution can generalize

through domain knowledge and conservative heuristics, such

generalization must be critically assessed case by case.

Ethics considerations

In line with applicable laws and regulations, the relevant au-

thorities have seized the infrastructure of BestMixer, Hansa

Market, and Wall Street Market. Using this legally seized data

for empirical research raises certain ethical questions, which

we discuss below. While the seizures were lawful, one should

not assume that all analyzed transactions concern illegal be-

havior.

Before seized data was made accessible to us for academic

research purposes, public prosecutors weighed, among other

things, the impact of the work on the rights and privacy of

all parties. A Dutch law enforcement privacy ofﬁcer vetted

our data subset to ensure that it was minimized to only data

vital to our research and contained no personally identiﬁable

information. We only had access to data we required for our

analyses: addresses, transaction hashes, timestamps, and con-

textual information of a deposit, such as the output index

and amount, to deduce the deposit address from a transaction

hash. Similar to earlier work on seized datasets [13,27,32,45],

all of our analyses were conducted on-site at Dutch law en-

forcement agencies, where the data was stored and protected

under their safety and security guidelines. We conferred with

our IRB beforehand. They viewed this work as outside their

jurisdiction, yet were satisﬁed with the assessments and pro-

cedures, outlined above, as set by the public prosecutors and

law enforcement privacy ofﬁcers. Also, after the study was

completed, per our agreement with law enforcement, they

were informed to check for any further ethical considerations

according to predetermined guidelines; no further ethical con-

siderations have been raised since. The data minimization

procedure ensures that no harm was done to individuals in-

cluded in the dataset: no personally identiﬁable information,

such as usernames, was used in this study. Please note that

bitcoin address and wallet information was already publicly

available, and no attribution data – i.e., bitcoin wallet, cluster,

or address information of any of the seized services – is pub-

lished in this paper. Last, as part of our responsible disclosure,

we informed Chainalysis early on about our ﬁndings. That is,

we provided them with addresses labeled as an illicit service,

which were not (false positives), and the reason we believe

this occurred.

Open science policy

We strongly support the use of open data. Yet, access to data used

in this paper is restricted by a) licensing with regard

to commercial data, and b) the protection of seized data under

criminal law, which permits only authorized access for designated

researchers. Therefore, it is not legally possible to make

the data in this paper public. However, given the public nature

of the blockchain, a wealth of raw transaction data is available

and has fueled academic work into illicit services in the past

years.

References

[1]

Yara Abdel Samad. Case study: Dark web markets. Dark Web Investi-

gation, pages 237–247, 2021.

[2]

Ross Anderson. Making bitcoin legal (transcript of discussion). In

Security Protocols XXVI: 26th International Workshop, Cambridge,

UK, March 19–21, 2018, Revised Selected Papers 26, pages 254–265.

Springer, 2018.

[3]

Andreas M Antonopoulos. Mastering Bitcoin: unlocking digital cryptocurrencies. O’Reilly Media, Inc., 2014.

[4]

Rainer Böhme, Nicolas Christin, Benjamin Edelman, and Tyler Moore.

Bitcoin: Economics, technology, and governance. Journal of economic

Perspectives, 29(2):213–238, 2015.

[5]

Xander Bouwman, Harm Griffioen, Jelle Egbers, Christian Doerr, Bram Klievink, and Michel Van Eeten. A different cup of TI? The added value of commercial threat intelligence. In 29th USENIX Security Symposium (USENIX Security 20), pages 433–450, 2020.

[6]

Roderic Broadhurst, David Lord, Donald Maxim, Hannah Woodford-

Smith, Corey Johnston, Ho Woon Chung, Samara Carroll, Harshit

Trivedi, and Bianca Sabol. Malware trends on ‘darknet’crypto-markets:

Research review. Available at SSRN 3226758, 2018.

[7]

Lars Brünjes and Murdoch J Gabbay. Utxo-vs account-based smart

contract blockchain programming paradigms. In Leveraging Applica-

tions of Formal Methods, Veriﬁcation and Validation: Applications: 9th

International Symposium on Leveraging Applications of Formal Meth-

ods, ISoLA 2020, Rhodes, Greece, October 20–30, 2020, Proceedings,

Part III 9, pages 73–88. Springer, 2020.

[8]

Chainalysis. 270 service deposit addresses drive 55% of money laundering in cryptocurrency. https://www.chainalysis.com/blog/cryptocurrency-money-laundering-2021/, Feb 2021.

[9]

Chainalysis Team. Is bitcoin traceable? https://blog.chainalysis.com/reports/is-bitcoin-traceable/, 2022.

[10]

Ling Cheng, Feida Zhu, Yong Wang, Ruicheng Liang, and Huiwen

Liu. Evolve path tracer: Early detection of malicious addresses in

cryptocurrency. In Proceedings of the 29th ACM SIGKDD Conference

on Knowledge Discovery and Data Mining, pages 3889–3900, 2023.

[11]

Nicolas Christin. Traveling the silk road: A measurement analysis of

a large anonymous online marketplace. In Proceedings of the 22nd

international conference on World Wide Web, pages 213–224, 2013.

[12]

Nicolas Christin. Measuring and analyzing online anonymous (’dark-

net’) marketplaces. CARNEGIE-MELLON UNIV PITTSBURGH PA,

2022.

[13]

Alejandro Cuevas, Fieke Miedema, Kyle Soska, Nicolas Christin, and

Rolf van Wegberg. Measurement by proxy: On the accuracy of online

marketplace measurements. In 31st USENIX Security Symposium

(USENIX Security 22), pages 2153–2170, 2022.

[14]

Department of Justice. Ohio resident pleads guilty to operat-

ing darknet-based bitcoin mixer that laundered over 300 million.

https://www.justice.gov/opa/pr/ohio-resident-pleads-guilty-operating-

darknet-based-bitcoin-mixer-laundered-over-300-million, Aug

2021.

[15]

Dmitry Ermilov, Maxim Panov, and Yury Yanovich. Automatic bitcoin

address clustering. In 2017 16th IEEE International Conference on

Machine Learning and Applications (ICMLA), pages 461–466. IEEE,

2017.

[16]

Europol. International sting against dark web vendors leads to 179 arrests. https://www.europol.europa.eu/media-press/newsroom/news/international-sting-against-dark-web-vendors-leads-to-179-arrests, Sep 2020.

[17]

Europol. 150 arrested in dark web drug bust as police seize €26 million. https://www.europol.europa.eu/media-press/newsroom/news/150-arrested-in-dark-web-drug-bust-police-seize-%E2%82%AC26-million, Oct 2021.

[18]

Europol. One of the darkweb’s largest cryptocurrency laundromats washed out. https://www.europol.europa.eu/media-press/newsroom/news/one-of-darkwebs-largest-cryptocurrency-laundromats-washed-out, Mar 2023.

[19]

FinCEN. First bitcoin mixer penalized by fincen for violating anti-money laundering laws. https://www.fincen.gov/news/news-releases/first-bitcoin-mixer-penalized-fincen-violating-anti-money-laundering-laws, Oct 2020.

[20]

FIOD. The fiod and the public prosecution service take money laundering machine for cryptocurrencies offline. https://www.fiod.nl/the-fiod-and-the-public-prosecution-service-take-money-laundering-machine-for-cryptocurrencies-offline/, May 2019.

[21]

FIOD. Arrest of suspected developer of tornado cash. https://www.fiod.nl/arrest-of-suspected-developer-of-tornado-cash/, Aug 2022.

[22]

Gibran Gomez, Pedro Moreno-Sanchez, and Juan Caballero. Watch

your back: identifying cybercrime ﬁnancial relationships in bitcoin

through back-and-forth exploration. In Proceedings of the 2022 ACM

SIGSAC conference on computer and communications security, pages

1291–1305, 2022.

[23]

Martin Harrigan and Christoph Fretter. The unreasonable effec-

tiveness of address clustering. In 2016 intl ieee conferences

on ubiquitous intelligence & computing, advanced and trusted

computing, scalable computing and communications, cloud and

big data computing, internet of people, and smart world congress

(uic/atc/scalcom/cbdcom/iop/smartworld), pages 368–373. IEEE, 2016.

[24]

Harry Kalodner, Malte Möser, Kevin Lee, Steven Goldfeder, Martin Plattner, Alishah Chator, and Arvind Narayanan. BlockSci: Design and applications of a blockchain analysis platform. In 29th USENIX Security Symposium (USENIX Security 20), pages 2721–2738, 2020.

[25]

George Kappos, Haaroon Yousaf, Rainer Stütz, Soﬁa Rollet, Bernhard

Haslhofer, and Sarah Meiklejohn. How to peel a million: Validating

and expanding bitcoin clusters. In 31st USENIX Security Symposium

(USENIX Security 22), pages 2207–2223, 2022.

[26]

Sarah Meiklejohn, Marjori Pomarole, Grant Jordan, Kirill Levchenko,

Damon McCoy, Geoffrey M Voelker, and Stefan Savage. A ﬁstful of

bitcoins: characterizing payments among men with no names. In Pro-

ceedings of the 2013 conference on Internet measurement conference,

pages 127–140, 2013.

[27]

Fieke Miedema, Kelvin Lubbertsen, Verena Schrama, and Rolf van Wegberg. Mixed signals: Analyzing ground-truth data on the users and economics of a bitcoin mixing service. In 32nd USENIX Security Symposium (USENIX Security 23), pages 751–768, 2023.

[28]

Malte Möser and Arvind Narayanan. Resurrecting address clustering

in bitcoin. In International Conference on Financial Cryptography and

Data Security, pages 386–403. Springer, 2022.

[29]

Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system.

Decentralized business review, page 21260, 2008.

[30]

Lily Hay Newman and Andy Greenberg. Bitcoin fog case could put cryptocurrency tracing on trial. https://www.wired.com/story/bitcoin-fog-roman-sterlingov-blockchain-analysis/, Aug 2022.

[31]

Jonas David Nick. Data-driven de-anonymization in bitcoin. Master’s

thesis, ETH-Zürich, 2015.

[32]

A. Noroozian, J. Koenders, E. van Veldhuizen, C.H. Ganan, S. Alrwais, D. McCoy, and M. van Eeten. Platforms in everything: analyzing ground-truth data on the anatomy and economics of bullet-proof hosting. In 28th USENIX Security Symposium (USENIX Security 19), pages 1341–1356, 2019.

[33]

Jan-Jaap Oerlemans, KMT Helwegen, et al. Annotatie hof den haag 1

februari 2022, ecli: Nl: Ghdha: 2022: 104. Computerrecht, 2022.

[34]

Department of Justice. Alphabay, the largest online “dark market,” shut down. https://www.justice.gov/opa/pr/alphabay-largest-online-dark-market-shut-down, Jul 2017.

[35]

Department of Justice. Binance and ceo plead guilty to federal charges

in $4b resolution. Nov 2023.

[36]

Office of Foreign Asset Control. U.S. treasury issues first-ever sanctions on a virtual currency mixer, targets DPRK cyber threats. https://home.treasury.gov/news/press-releases/jy0768, May 2022.

[37]

Office of Foreign Asset Control. U.S. treasury sanctions notorious virtual currency mixer tornado cash. https://home.treasury.gov/news/press-releases/jy0916, Aug 2022.

[38]

Marek Palatinus and Pavol Rusnak. Multi-account hierarchy for deterministic wallets. https://github.com/bitcoin/bips/blob/master/bip-0044.mediawiki, Apr 2014.

[39]

Fergal Reid and Martin Harrigan. An analysis of anonymity in the

bitcoin system. Springer, 2013.

[40]

Jesus Rodriguez. 10 patterns of centralized crypto exchanges explained using machine learning and data visualizations. https://medium.com/intotheblock/10-patterns-of-centralized-crypto-exchanges-explained-using-machine-learning-and-data-b386d913832, Oct 2019.

[41]

Jack Schickler. Tornado cash dev facing dutch charges

to question chainalysis data alleging criminal links.

https://www.coindesk.com/policy/2023/05/24/tornado-cash-dev-

facing-dutch-charges-to-question-chainalysis-data-alleging-criminal-

links/, May 2023.

[42]

Kyle Soska and Nicolas Christin. Measuring the longitudinal evolution

of the online anonymous marketplace ecosystem. In 24th USENIX

security symposium (USENIX security 15), pages 33–48, 2015.

[43]

Rainer Stütz, Johann Stockinger, Pedro Moreno-Sanchez, Bernhard

Haslhofer, and Matteo Maffei. Adoption and actual privacy of decen-

tralized coinjoin implementations in bitcoin. In Proceedings of the

4th ACM Conference on Advances in Financial Technologies, pages

254–267, 2022.

[44]

Jacob Swambo, Spencer Hommel, Bob McElrath, and Bryan Bishop.

Bitcoin covenants: Three ways to control the future. arXiv preprint

arXiv:2006.16714, 2020.

[45]

Jochem van de Laarschot and Rolf van Wegberg. Risky business?

investigating the security practices of vendors on an online anonymous

market using ground-truth data. In USENIX Security Symposium, pages

4079–4095, 2021.

[46]

Rolf van Wegberg, Samaneh Tajalizadehkhoob, Kyle Soska, Ugur

Akyazi, Carlos Hernandez Ganan, Bram Klievink, Nicolas Christin,

and Michel Van Eeten. Plug and prey? measuring the commoditization

of cybercrime via online anonymous markets. In 27th USENIX security

symposium (USENIX security 18), pages 1009–1026, 2018.

[47]

Rolf van Wegberg and Thijmen Verburgh. Lost in the dream? measuring

the effects of operation bayonet on vendors migrating to dream market.

In Proceedings of the Evolution of the Darknet Workshop, volume 9,

2018.

A. Interview Protocol

The interviews were conducted using the following interview

protocol. These questions were asked in a semi-structured

setting with the interviewee, so the order in which they were

asked might vary between participants, depending on the di-

rection the conversation went.

• Background interviewee

– What is your role during an investigation?

– What crypto-related training did you follow?

– How do you keep your knowledge on tracing up to date?

• Tracing

– What tools do you use for tracing (i.e., commercial and non-commercial, block explorers, etc.)?

– What is the starting point based on which you start tracing, and what is the goal / final product?

– Can you describe how you trace money flows?

– Do you know the source of information of attribution of the tools you use?

– Did you ever look at your own wallets/transactions in the tools? And what did you find?

– Have you ever checked or revisited your analysis after a seizure? And what did you find?

– How confident do you feel about the accuracy of the tools you use? / What are your experiences with the accuracy of the tools you use? (separated per category of service. Asked about mixing services, exchanges, ransomware, and darknet market clusters)

• Preliminary results (we showed an early version of Figure 1, showing one service at a time, starting with BestMixer)

– Firstly: what do you expect the results to be? (underestimation/overestimation and by how much)

– Then: showed the results

– Per seized service: what do you get from these results?

– How does this influence your day-to-day work when tracing?

B. Interview Codebook

Code Group: Interviewee Characteristics

Description: This category captures the interviewee’s role, tracing background, and approach to staying updated on cryptocurrency tracing developments. It reflects their level of experience and proactive engagement with crypto-related tasks.

– Role: Detective, Analyst

– Crypto Training: Self-taught, Vendor-specific training, External training

– Crypto Updates: News, Podcasts, Colleagues

– Proactiveness: Self-motivated, Organization-mandated

Code Group: Tracing Tools and Techniques

Description: This category includes the types of tools (both commercial and public) and techniques used by interviewees in tracing cryptocurrency transactions. It reflects the resources available to them and their tracing methodologies.

– Commercial Tracing Tools: Chainalysis, TRM

– Public Tools: Block explorers, Breadcrumbs

– Tracing Techniques: Transaction-based analysis, Relationship-based analysis

Code Group: Investigation Goals and Approaches

Description: This section captures the objectives and methodologies guiding the interviewees’ investigations, as well as their awareness of the data sources that support attribution in tracing tools.

– Investigation Goals: Identification, Turnover analysis

– Knowledge of Attribution Sources: OSINT, Self-attribution of tools, Customer data

– Revisiting Analysis: Always, Sometimes, Never

Code Group: Personal Transactions

Description: This section records whether the interviewees have checked their own transactions using tracing tools, and if so, whether these were commercial or public transactions.

– Use of Own Transactions: Yes, commercial; Yes, public; No

Code Group: Implication of Results

Description: This category explores how tracing results influence the interviewees’ daily work, including changes in workflow, increased oversight, or relocation of resources based on tracing insights.

– Influence on Day-to-Day Work: Workflow adjustment, Increased oversight, Resource allocation

C. Code Saturation

[Figure: new codes per interview. x-axis: Interview number (in order), 1 to 6; y-axis: New codes.]
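
A code saturation curve of this kind can be derived by counting, for each interview in order, how many codes appear for the first time. The following is a minimal sketch of that counting step, not the authors' procedure: the function name `new_codes_per_interview` and the three example interviews are hypothetical and do not reproduce the study's data.

```python
from typing import Iterable


def new_codes_per_interview(coded_interviews: Iterable[set[str]]) -> list[int]:
    """For each interview, in order, count codes not seen in any earlier interview."""
    seen: set[str] = set()
    counts: list[int] = []
    for codes in coded_interviews:
        fresh = set(codes) - seen   # codes that first appear in this interview
        counts.append(len(fresh))
        seen |= fresh               # remember them for later interviews
    return counts


# Hypothetical coding of three interviews (illustrative only, not the study's data).
example = [
    {"Detective", "Chainalysis", "Block explorers"},
    {"Analyst", "Chainalysis", "TRM"},
    {"Analyst", "TRM", "Block explorers"},
]
saturation_curve = new_codes_per_interview(example)
```

A count that drops to zero and stays there across later interviews is the usual signal that additional interviews are no longer producing new codes.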
