
Chapter 13

Serialization of PREMIS

Thomas Habing

13.1 Introduction

Serialization typically describes how a data structure or data model is converted into formatted bits that can be stored on some physical medium, such as disk, tape, or computer memory, or transmitted across a network. The goal is to be able to recreate a semantically equivalent data structure or data model by reading the serialized, formatted bits back from the storage medium or the network. This chapter discusses three common serialization options, XML [1], Linked Data [2], and relational databases [3], and applies them to possible implementations of the PREMIS Data Dictionary [4].
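To make the round-trip idea concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The event record, its field names, and the un-namespaced element names are illustrative assumptions loosely echoing PREMIS semantic units; this is not the official PREMIS XML schema. The record is serialized to XML bytes and then parsed back into a semantically equivalent dictionary.

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical event record. The keys loosely echo PREMIS
# semantic units, but this is NOT the official PREMIS XML schema.
event = {
    "eventType": "fixity check",
    "eventDateTime": "2016-01-01T12:00:00Z",
    "eventOutcome": "pass",
}

def serialize(record: dict) -> bytes:
    """Convert the in-memory record into formatted bits (UTF-8 encoded XML)."""
    root = ET.Element("event")
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="utf-8")

def deserialize(data: bytes) -> dict:
    """Rebuild a semantically equivalent record from the serialized bits."""
    root = ET.fromstring(data)
    return {child.tag: child.text for child in root}

bits = serialize(event)
restored = deserialize(bits)
assert restored == event  # the round trip preserves the meaning of the record
```

The equality check at the end is the practical test of a serialization: what is read back must mean the same thing as what was written, even if the bytes themselves are not the only possible encoding.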

In some situations, a serialization process may also include transformations, for example serialization into a similar but not equivalent data model. Formally, this would be considered two separate processes, a transformation (or mapping) followed by a serialization, or vice versa, but in actual data management systems the two operations often become conflated. This chapter therefore uses a somewhat less formal definition of serialization, one that may include transformations as well as marshaling to or from a storage medium or network.
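The formal distinction between the two processes can be sketched as two separate functions that a system then composes into one call. In this hypothetical example, an internal record (the `InternalEvent` class and all field names are illustrative assumptions) is first mapped to a similar but not equivalent model, a flat dictionary in which the agent list is collapsed into a single string, and only then serialized as JSON.

```python
import json
from dataclasses import dataclass, field

@dataclass
class InternalEvent:
    # Hypothetical internal data model used by some repository system.
    event_type: str
    date_time: str
    agents: list = field(default_factory=list)

def transform(ev: InternalEvent) -> dict:
    """Map to a similar but not equivalent model (a flat dictionary).

    Joining the agent list into one string is lossy: agent names that
    themselves contain '; ' could no longer be split apart unambiguously.
    """
    return {
        "eventType": ev.event_type,
        "eventDateTime": ev.date_time,
        "agents": "; ".join(ev.agents),
    }

def serialize(model: dict) -> str:
    """Marshal the already-transformed model to JSON text."""
    return json.dumps(model, sort_keys=True)

def transform_and_serialize(ev: InternalEvent) -> str:
    # In practice the two steps are often fused into a single call like this.
    return serialize(transform(ev))

ev = InternalEvent("ingestion", "2016-03-01T09:30:00Z", ["agentA", "agentB"])
print(transform_and_serialize(ev))
```

Keeping `transform` and `serialize` as separate functions makes the lossy mapping step visible and testable on its own, even when callers only ever use the fused version.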

13.2 Implementation Options

There are a number of factors to consider when weighing serialization options. Historically the compactness of the serialization format was an important consideration, reflecting the need to optimize scarce storage and network bandwidth.

T. Habing

Library, University of Illinois at Urbana-Champaign, 1408 W Gregory Dr, Urbana, IL 61801, USA

e-mail: thabing@illinois.edu

© Springer International Publishing Switzerland 2016

A. Dappert et al. (eds.), Digital Preservation Metadata for Practitioners, DOI 10.1007/978-3-319-43763-7_13


However, with current storage and network bandwidth capacities orders of magnitude greater than they were even a few years ago, this is generally not a major concern for the types of systems that would be dealing with PREMIS or similar metadata. Nevertheless, in some situations compactness might still be a factor, such as when millions of PREMIS Events must be transmitted over the network on a regular basis, for example in a continuous checksum validation scenario. In such a scenario, using the most compact serialization format could be very important. An example of a compact serialization for PREMIS is to serialize PREMIS XML using the Efficient XML Interchange (EXI) Format 1.0 [5].
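EXI processors are not part of the Python standard library, so the sketch below uses gzip purely as a stand-in to show how much redundancy a batch of verbose XML events carries. It generates many near-identical fixity-check events in a hypothetical, simplified XML layout (not the PREMIS schema) and compares raw and compressed sizes. EXI is a dedicated binary encoding of XML rather than general-purpose compression, so these numbers say nothing about the ratios EXI itself would achieve.

```python
import gzip

def fixity_event_xml(i: int) -> str:
    # Hypothetical, simplified event layout used only to generate volume.
    return (
        "<event>"
        f"<eventIdentifier>evt-{i:07d}</eventIdentifier>"
        "<eventType>fixity check</eventType>"
        "<eventOutcome>pass</eventOutcome>"
        "</event>"
    )

# Simulate one batch of events from a continuous checksum-validation run.
batch = "<events>" + "".join(fixity_event_xml(i) for i in range(10_000)) + "</events>"
raw = batch.encode("utf-8")
packed = gzip.compress(raw)

print(f"raw XML: {len(raw):,} bytes")
print(f"gzipped: {len(packed):,} bytes")
print(f"ratio:   {len(raw) / len(packed):.1f}x")
```

Because almost every byte in each event is repeated markup, even a generic compressor shrinks the batch dramatically; this is the kind of redundancy a compact format is designed to avoid sending.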

Another factor to consider is processing efficiency. This includes not just the efficiency of the serialization process itself, but also how efficiently the serialized data can be queried, accessed, and manipulated in the underlying storage medium. Performance considerations can influence decisions such as which type of database to choose (a traditional SQL relational database, a SPARQL RDF database, a native XML database, or some hybrid approach), or whether, in some cases, simply to store PREMIS XML files on disk using a standardized file naming scheme.
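As one illustration of how the storage choice affects querying, the sketch below stores simplified event records in an in-memory SQLite database via Python's built-in sqlite3 module and adds an index, so that a common preservation question (which fixity checks failed?) can be answered without parsing any XML. The table name, column names, and identifiers are hypothetical and do not represent a standard PREMIS relational schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE event (
           event_id      TEXT PRIMARY KEY,
           event_type    TEXT NOT NULL,
           event_outcome TEXT NOT NULL,
           object_id     TEXT NOT NULL   -- identifier of the linked object
       )"""
)
# An index on the filtered columns helps keep this query fast as data grows.
conn.execute("CREATE INDEX idx_event_type_outcome ON event (event_type, event_outcome)")

rows = [
    ("evt-1", "fixity check", "pass", "obj-A"),
    ("evt-2", "fixity check", "fail", "obj-B"),
    ("evt-3", "ingestion",    "pass", "obj-B"),
    ("evt-4", "fixity check", "fail", "obj-C"),
]
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?)", rows)

failed = conn.execute(
    "SELECT object_id FROM event "
    "WHERE event_type = ? AND event_outcome = ? ORDER BY object_id",
    ("fixity check", "fail"),
).fetchall()
print(failed)  # [('obj-B',), ('obj-C',)]
conn.close()
```

The same question asked of a directory of PREMIS XML files would require opening and parsing every file, which is the kind of trade-off the choice of storage medium involves.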

There are also human factors to be considered. For example, the original design goals for XML [1] included “human legible and reasonably clear,” “easy to create,” and “easy to write programs which process XML.” These are among the reasons that XML has become a popular serialization format for various metadata schemas. (It is interesting to note that compactness was explicitly not a goal for XML.)

However, the human factor should extend not just to the creators and maintainers of the data themselves, but also to the developers and maintainers of systems that must manipulate the serialized data. XML strikes a good balance between human readability and editability on the one hand and machine processing on the other, especially given its large suite of developer and editing tools. Even so, serialization formats such as JSON [6] and YAML [7] have become popular with software developers because they make the data easier to manipulate in various programming languages. There are numerous conventions and tools that can convert between these formats, for example BadgerFish [8], a convention for translating XML to JSON that has some support in various programming APIs and tools [9]. There are also other JSON/XML conversion tools with different conventions, both open source and commercial [10–12].
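The BadgerFish convention [8] can be illustrated with a short, partial converter. This is a minimal sketch assuming the commonly documented BadgerFish rules: element text goes in a "$" property, attributes become properties prefixed with "@", nested elements become nested properties, and repeated sibling elements become arrays. Namespace handling (the "@xmlns" rules) and mixed content are deliberately omitted, whitespace around text is trimmed as a simplification, and the sample is a simplified, un-namespaced fragment rather than valid PREMIS XML.

```python
import json
import xml.etree.ElementTree as ET

def badgerfish(elem: ET.Element) -> dict:
    """Convert one element's attributes, text, and children (no namespaces)."""
    obj = {}
    for name, value in elem.attrib.items():
        obj["@" + name] = value                    # attributes get an "@" prefix
    if elem.text and elem.text.strip():
        obj["$"] = elem.text.strip()               # text content goes in "$"
    for child in elem:
        converted = badgerfish(child)
        if child.tag in obj:                       # repeated name becomes an array
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(converted)
        else:
            obj[child.tag] = converted             # nested element, nested property
    return obj

def xml_to_badgerfish(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    return {root.tag: badgerfish(root)}

sample = """<event>
  <eventType authority="local">fixity check</eventType>
  <linkingAgentIdentifier>agent-1</linkingAgentIdentifier>
  <linkingAgentIdentifier>agent-2</linkingAgentIdentifier>
</event>"""
print(json.dumps(xml_to_badgerfish(sample), indent=2))
```

One consequence of the array rule is visible here: a single `linkingAgentIdentifier` would become an object, while two become a list, so consuming code must handle both shapes. This is one reason different XML/JSON conventions exist.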

Because of their standard query languages, APIs, and performance characteristics, various databases are also used to serialize PREMIS data. These include not just relational databases, but also RDF SPARQL triple stores and index and search engines such as Apache Solr/Lucene [13]. The Resource Description Framework (RDF) model and one of its serialization formats, such as Turtle, N3, JSON-LD, or RDF/XML [2], along with an RDF SPARQL [14] triple store, would be a good choice when there is a requirement to easily merge different datasets, even if they do not share a common underlying schema.
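Why RDF makes such merging easy can be sketched without any RDF library: an RDF graph is a set of (subject, predicate, object) triples, so merging two graphs that reuse the same IRIs amounts to a set union, with no shared table schema required beforehand (graphs containing blank nodes would additionally need those renamed apart). The IRIs and predicates below are hypothetical placeholders under example.org, not terms from the PREMIS ontology, and the `match` helper is only a toy stand-in for a SPARQL triple pattern.

```python
# Hypothetical IRIs; these are not real PREMIS ontology terms.
EX = "http://example.org/"

# Dataset 1: preservation events recorded by one system.
events = {
    (EX + "event/1", EX + "eventType", "fixity check"),
    (EX + "event/1", EX + "linkedObject", EX + "object/42"),
}

# Dataset 2: descriptive metadata from another system with different fields.
catalog = {
    (EX + "object/42", EX + "title", "Example digitized item"),
    (EX + "object/42", EX + "format", "image/tiff"),
}

# Merging RDF graphs without blank nodes is simply a union of triple sets.
merged = events | catalog

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a SPARQL variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )

# Follow a link across the two original datasets.
for _, _, obj in match(merged, s=EX + "event/1", p=EX + "linkedObject"):
    for _, _, title in match(merged, s=obj, p=EX + "title"):
        print(title)  # Example digitized item
```

The join works only because both datasets happen to use the same IRI for the object; agreeing on identifiers, rather than on a schema, is what RDF-based merging depends on.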

Finally, the structures inherent in the data model itself can often lead toward a specific serialization format. For example, hierarchical data models are well suited to serialization as XML. Well-defined relational models lend themselves to SQL databases. Models consisting of nodes connected by directed edges (directed

