
applied to a wide range of similarly structured, semi-structured, and unstructured
historical sources. The datasets are open-access and available to other researchers via our
website and in long-term repositories.1
This is not the first large-N, data-centric effort to examine aspects of modern
engineering history. In the mining sector, for example, Kathleen Ochs collected career
data on engineering graduates from the Colorado School of Mines, Duncan Money has
built a dataset using membership lists for the leading professional association in the
copper sector, and Marco Bertilorenzi has compiled a dataset of graduates of France’s
Saint-Étienne School of Mines. In addition, William Maloney and Felipe Valencia Caicedo
have collated data on the number of engineers in countries across the Western Hemisphere
during the twentieth century (Maloney & Valencia Caicedo, 2022; Money, 2022; Ochs, 1992). Other
scholars have noted the importance of international networks among professional mining
engineers (and other technical experts). See, for example, Stephen Tuffnell on networks in
the gold mining sector, and David Pretel and Lino Camprubí on global networks of experts
in the history of technology (Pretel & Camprubí, 2018; Tuffnell, 2018). More recently,
Bamboo Ren et al. have used student and university employee records to
assess trends in Chinese academia in the first half of the twentieth century (Campbell &
Lee, 2020; Ren et al., 2020). The sources and database construction methods described in
this article depart significantly from these efforts in (a) the scale of the data
collection, (b) the techniques used to collect, code, clean, and validate the data, and
(c) our ability to link data across historical sources.
Our datasets have several unique features. First, we utilize sources produced by
different types of organizations, with different objectives, and exhibiting very different
internal characteristics. These include structured, semi-structured, and unstructured
qualitative historical evidence. Second, we present methods used to extract, code, clean,
and validate data across these different types of sources. Our methods can be utilized by
researchers working with similar types of historical evidence on other topics. Third and
most importantly, we present our methods for linking individuals across time, between
organizations, and across space. The methods we describe below include simplifying and
standardizing the names of individuals and firms to maximize
opportunities for cross-source and cross-temporal linking, and disambiguation
algorithms to avoid false links. These linkages reveal a degree of mobility and
relationship that is typically invisible (or at best anecdotal) in conventional studies based
on organization-centered archival data. Creating and then linking large datasets opens
substantial new opportunities for research, interpretation, and the identification of large-
1 The datasets can be located and downloaded through the Notre Dame digital repository, CurateND, and
cited as follows: Israel G. Solares and Edward Beatty (2025). Engineering History Project Dataset (Version
v.1) [Dataset]. CurateND. https://doi.org/10.7274/30108082.