Chapter 13
Serialization of PREMIS
Thomas Habing
13.1 Introduction
Serialization typically describes how a data structure or data model is converted into
formatted bits that can be stored in some physical medium, such as disk, tape, or
computer memory, or transmitted across a network. The goal is to be able to
recreate a semantically equivalent data structure or data model by reading the
serialized, formatted bits from the storage media or from the network. This chapter
discusses the common serialization options XML [1], Linked data [2], and relational
databases [3] and applies them to possible implementations of the PREMIS
Data Dictionary [4].
In some situations, a serialization process may include transformations, for
example serialization into a similar but not equivalent data model. Formally, this
would be considered two different processes, a transformation or mapping followed
by a serialization (or vice versa), but the two operations often become conflated
in actual data management systems. This chapter uses a somewhat less formal
definition of serialization, which may include transformations as well as
marshaling to or from some storage medium or network.
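The round trip described above can be illustrated with a minimal sketch. It assumes a hypothetical, heavily simplified event record whose field names only echo PREMIS semantic units; it is not the PREMIS XML schema, and the `serialize` and `deserialize` helpers are illustrative names rather than part of any PREMIS tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified event record; the field names echo PREMIS
# semantic units but this is not the full PREMIS event structure.
event = {
    "eventType": "fixity check",
    "eventDateTime": "2016-01-15T10:30:00Z",
    "eventOutcome": "pass",
}


def serialize(record: dict) -> bytes:
    """Convert the in-memory record into UTF-8 encoded XML bytes."""
    root = ET.Element("event")
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="utf-8")


def deserialize(data: bytes) -> dict:
    """Rebuild an equivalent record from the serialized bytes."""
    root = ET.fromstring(data)
    return {child.tag: child.text for child in root}


data = serialize(event)        # formatted bits, ready to store or transmit
restored = deserialize(data)   # semantically equivalent data structure
```

The bytes in `data` could equally be written to disk or sent over a network; what matters is that `restored` carries the same information as `event`.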
13.2 Implementation Options
There are a number of factors to consider when weighing serialization options.
Historically the compactness of the serialization format was an important
consideration, reflecting the need to optimize scarce storage and network bandwidth.
T. Habing (✉)
Library, University of Illinois at Urbana-Champaign, 1408 W Gregory Dr, Urbana,
IL 61801, USA
e-mail: thabing@illinois.edu
© Springer International Publishing Switzerland 2016
A. Dappert et al. (eds.), Digital Preservation Metadata for Practitioners,
DOI 10.1007/978-3-319-43763-7_13
However, with current storage and network bandwidth capacities orders of magnitude
greater than they were even a few years ago, this is generally not a major concern for
the types of systems that would be dealing with PREMIS or similar metadata.
In some situations, though, it might still be a factor, such as the need to transmit
millions of PREMIS Events over the network on a regular basis, for example in a
continuous checksum validation scenario. In this kind of scenario, using the most
compact serialization format could be very important. An example of a compact
serialization format for PREMIS is serializing PREMIS XML using the
Efficient XML Interchange (EXI) Format 1.0 [5].
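To get a feel for how much room verbose XML leaves for a more compact encoding, the following sketch compares the size of a batch of event records as plain XML and after general-purpose compression. Note the assumptions: gzip is used only as a stand-in because the Python standard library has no EXI codec, so the numbers say nothing about what EXI itself would achieve, and the event structure and the `make_events` helper are hypothetical simplifications.

```python
import gzip
import xml.etree.ElementTree as ET


def make_events(count: int) -> bytes:
    """Build a batch of simplified, hypothetical fixity-check events as XML."""
    root = ET.Element("events")
    for i in range(count):
        ev = ET.SubElement(root, "event")
        ET.SubElement(ev, "eventType").text = "fixity check"
        ET.SubElement(ev, "eventDateTime").text = f"2016-01-15T10:{i % 60:02d}:00Z"
        ET.SubElement(ev, "eventOutcome").text = "pass"
    return ET.tostring(root, encoding="utf-8")


raw = make_events(1000)
packed = gzip.compress(raw)  # stand-in for a compact encoding; NOT EXI

ratio = len(packed) / len(raw)
print(f"plain XML: {len(raw)} bytes, gzip: {len(packed)} bytes, ratio: {ratio:.2f}")
```

Because the element names repeat in every event, highly regular data of this kind tends to shrink considerably, which is the same redundancy a schema-aware binary format can exploit.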
Another factor to consider is processing efficiency. This can include not just
efficiency of the serialization process itself, but also how efficiently the serialized
data may be queried, accessed, and manipulated in the underlying storage medium.
Performance considerations can influence decisions such as what type of database
to choose: a traditional SQL relational database, a SPARQL RDF database,
a native XML database, or some hybrid approach, or in some cases just storing
PREMIS XML files to a disk using a standardized file naming scheme.
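As a rough illustration of the relational option, the sketch below stores a few events in an in-memory SQLite database and queries them. The table layout, column names, and sample rows are assumptions made for this example only; they are not a recommended PREMIS relational schema.

```python
import sqlite3

# Hypothetical, flattened event table; not an official PREMIS mapping.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE event (
           event_id       TEXT PRIMARY KEY,
           object_id      TEXT NOT NULL,
           event_type     TEXT NOT NULL,
           event_datetime TEXT NOT NULL,
           outcome        TEXT
       )"""
)
# An index on the column used for lookups keeps per-object queries cheap.
conn.execute("CREATE INDEX idx_event_object ON event (object_id)")

rows = [
    ("e1", "obj-1", "ingestion", "2016-01-10T09:00:00Z", "success"),
    ("e2", "obj-1", "fixity check", "2016-01-15T10:30:00Z", "pass"),
    ("e3", "obj-2", "fixity check", "2016-01-15T10:31:00Z", "fail"),
]
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?, ?)", rows)

# Which objects failed a fixity check?
failed = conn.execute(
    "SELECT object_id FROM event WHERE event_type = ? AND outcome = ?",
    ("fixity check", "fail"),
).fetchall()

# Event history of one object, in chronological order.
history = conn.execute(
    "SELECT event_type FROM event WHERE object_id = ? ORDER BY event_datetime",
    ("obj-1",),
).fetchall()
```

Queries like these are harder to answer efficiently when each event lives in its own XML file on disk, which is one reason the choice of storage affects more than the serialization step itself.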
There are also human factors to be considered. For example, the original design
goals for XML [1] included “human legible and reasonably clear,” “easy to create,”
and “easy to write programs which process XML.” These are among the reasons
that XML has become a popular serialization format for various metadata schemas.
(It is interesting to note that compactness was explicitly not a goal for XML.)
However, the human factor should extend not just to the creators and maintainers of
the data themselves, but also to the developers and maintainers of systems that must
manipulate the serialized data. XML strikes a good balance between human
readability and editing and computer processing, especially with its large suite of
developer and editing tools. However, serialization formats such as JSON [6] and
YAML [7] have become popular with software developers because they make it
easier to manipulate the data in various programming languages. There are
numerous conventions and tools that can convert between these formats, for
example BadgerFish [8] which has some support in different programming APIs
and tools [9] as a convention for translating XML to JSON. There are also other
JSON/XML conversion tools with different conventions, both open source and
commercial [10–12].
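The core of the BadgerFish convention can be sketched in a few lines: element text goes into a `$` property, attributes into properties prefixed with `@`, and repeated sibling elements are collected into an array. The function below is a simplified illustration under those rules only. It ignores namespaces and mixed content, `element_to_badgerfish` is a hypothetical helper name, and the sample XML is an invented fragment rather than valid PREMIS.

```python
import json
import xml.etree.ElementTree as ET


def element_to_badgerfish(elem: ET.Element) -> dict:
    """Convert an element to a BadgerFish-style dict (namespaces not handled)."""
    body = {}
    # Attributes become properties whose names start with "@".
    for name, value in elem.attrib.items():
        body["@" + name] = value
    # Text content goes into the "$" property.
    text = (elem.text or "").strip()
    if text:
        body["$"] = text
    # Child elements become nested properties; repeated names become lists.
    for child in elem:
        converted = element_to_badgerfish(child)[child.tag]
        if child.tag not in body:
            body[child.tag] = converted
        elif isinstance(body[child.tag], list):
            body[child.tag].append(converted)
        else:
            body[child.tag] = [body[child.tag], converted]
    return {elem.tag: body}


# Invented sample fragment, used only to exercise the conversion rules.
xml_text = (
    '<object type="file">'
    "<identifier>obj-1</identifier>"
    "<fixity>MD5</fixity>"
    "<fixity>SHA-256</fixity>"
    "</object>"
)
result = element_to_badgerfish(ET.fromstring(xml_text))
print(json.dumps(result, indent=2))
```

A production converter would also need to handle namespace declarations, which BadgerFish represents under an `@xmlns` property, so an existing library is usually a better choice than hand-rolled code.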
Because of their standard query languages, APIs and performance characteristics,
various databases are also used to serialize PREMIS data. This could include
not just relational databases, but also RDF SPARQL triple stores, or index and
search engines such as Apache Solr/Lucene [13]. The Resource Description
Framework (RDF) model and one of its serialization formats, such as Turtle, N3,
JSON-LD, or RDF/XML [2], along with an RDF SPARQL [14] triple store, would
be a good choice when there is a requirement to easily merge different datasets even
if they do not share a common underlying schema.
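The reason RDF data merges so easily is that a graph is just a set of subject, predicate, object statements, so combining two datasets amounts to a set union. The sketch below models this with plain Python tuples rather than an RDF library or triple store. The `ex:` prefix, the property names, and the sample values are hypothetical and do not come from the PREMIS OWL ontology.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Dataset A: preservation events, using a hypothetical "ex:" vocabulary.
repository_a: Set[Triple] = {
    ("ex:obj-1", "ex:hasEvent", "ex:event-1"),
    ("ex:event-1", "ex:eventType", "fixity check"),
}

# Dataset B: descriptive metadata about the same object, different vocabulary.
repository_b: Set[Triple] = {
    ("ex:obj-1", "dcterms:title", "Annual report 2015"),
    ("ex:obj-1", "ex:hasEvent", "ex:event-1"),  # duplicate statement
}

# Merging needs no schema alignment: it is a set union, and the duplicate
# statement collapses automatically.
merged = repository_a | repository_b


def describe(graph: Set[Triple], subject: str) -> List[Tuple[str, str]]:
    """Return all (predicate, object) pairs stated about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


about_obj_1 = describe(merged, "ex:obj-1")
```

After the merge, statements from both sources about `ex:obj-1` can be retrieved together because they share the same subject identifier, even though the two datasets use different vocabularies.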
Finally, the structures inherent in the data model itself can often lead toward a
specific serialization format. For example, hierarchical data models are well suited
to serialization as XML. Well-defined relational models lend themselves to SQL
databases. Models consisting of nodes connected by directed edges (directed