
physicsworld.com
Physics World October 2013 19
News & Analysis
experiments described.
While this vision has yet to be fully realized, a number of organizations are trying to bring it about (see box). These include the commercial company Figshare set up by Mark Hahnel, who says he became frustrated at not being able to publish all of his research data while doing a PhD in stem-cell biology at Imperial College in London. He envisaged sharing his data by breaking them down into their constituent parts, such as single graphs or figures, for research, he says, “where the results were null, or didn’t fit into a larger publishable story for whatever reason”. Figshare provides a public, permanently available repository for individual researchers as well as universities and publishers to share their data with the wider world – with annual institutional licences covering the costs of the free service provided to individuals. Having been in operation since 2012, Figshare currently hosts around 1 million publicly available data units, says Hahnel.
Going public
While services such as Figshare simplify the process of uploading data to the Internet, researchers will in many cases still have much to do in making the fruits of their labour available to others. As the Royal Society report points out, there is a big difference between simple disclosure of data and what it calls “intelligent openness”. Much of that difference lies in providing the “metadata” that allow others to interpret the output of specific experiments. These metadata include basic details such as the name of the person who created the research data, when those data were created and who paid for the research, as well as more subtle information such as how the data were acquired, how they were treated and analysed, and how they should be used.
One field that is at the forefront of open data is astronomy, with the Sloan Digital Sky Survey, for example, having provided images of hundreds of thousands of galaxies online. However, other areas of physics are less well suited to public scrutiny. Particle physics involves sharing data in large networks of distributed computers. Those data are not at all user-friendly either, since their interpretation requires the results of complex computer models that characterize the efficiency of the particle detectors.
“Unlike astronomy, which is accessible to everyone, here the metadata come in the form of a dirty great simulation,” says Tony Hey, who trained in particle physics and is now in charge of Microsoft’s collaboration with universities and other research organizations.
Indeed, Ginsparg thinks physicists are likely to find it tough going to provide open data. In addition to the time needed to make data available and understandable, many researchers will probably fear being scooped if they release their data too early. Ginsparg points out that the arXiv server already permits any kind of data to be uploaded alongside research papers, but that there has been relatively little demand for this service so far. And then there is the question of privacy. The possibility, he says, that ostensibly anonymized data actually contain systematic patterns that reveal subjects’ identity “will give researchers pause”.
Making data pay
For many, the key to stimulating open data is to put suitable rewards in place. Alex Wade, director of scholarly communication at Microsoft and a contributor to a recent report by information providers Thomson Reuters on open data, points out that many decisions regarding university hiring and promotion and the allocation of grants are based on researchers’ records of publication in high-profile journals. He would like to see such decisions also being made on the basis of data dissemination, by recording and recognizing the number of times specific data are reused by other researchers. “It would be progress to see a more diverse set of research outputs and metrics considered in measuring scholarly impact,” he says. “Data must be recognized as contributing to the body of scientific knowledge.”
However, such credit is likely to become meaningful to researchers only if it results in tangible benefits. Such benefits will be discussed as part of a “road map” on research data that the League of European Research Universities is due to release by around the end of the year, according to Paul Ayris, director of library services at University College London, who adds that the road map will recommend that researchers who share data should get career recognition. “I can’t say what criteria academic appointment committees will use in the future,” he adds, “but in my view data sharing will come to be seen as a mark of best practice within the next five to 10 years.”
However, it remains to be seen just how keen universities are on open data. Boulton says that in the UK, university vice-chancellors “see more costs than benefits” and are also worried that industry might not like the idea of collaborative data being made public. Boulton, though, believes that sharing data should be seen as part of the normal scientific process. “It is a false dichotomy to say that you either do science or you handle the data,” he says. “Very simply what we want is the greatest scientific bang per buck.”
Opening up data
A number of organizations are providing services to make research data publicly available and usable.
Figshare (http://figshare.com) is a general-purpose online repository where individual scientists and institutions can store and share research data, and make these data citeable.
Dryad (http://datadryad.org) shares many of the features of Figshare, accepting a wide range of data formats and allowing data to be cited.
DataCite (http://datacite.org) helps researchers find and cite datasets, and facilitates links between research articles and underlying data.
Labarchives (http://labarchives.com) provides electronic lab notebooks, allowing researchers to organize and preserve their data, view lab data remotely and publish data to specific individuals or to the public.
Scientific Data (www.nature.com/scientificdata), to be launched in spring 2014, will contain articles that describe experimental and observational data sets, allowing researchers to receive credit for publicly available data.