
However, with current storage and network bandwidth capacities orders of magnitude
greater than they were even a few years ago, this is generally not a major concern for
the types of systems that would be dealing with PREMIS or similar metadata.
Still, in some situations this might be a factor, such as the need to transmit
millions of PREMIS Events over the network on a regular basis, for example in a
continuous checksum validation scenario. In this kind of scenario, using the most
compact serialization format could be very important. One example of a compact
serialization format for PREMIS is PREMIS XML encoded with the
Efficient XML Interchange (EXI) Format 1.0 [5].
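To make the size trade-off concrete, the following minimal sketch compares the byte counts of the same event records serialized as XML and as JSON, with and without generic compression. It rests on several assumptions: the record layout is a simplified, hypothetical one that borrows PREMIS semantic unit names but is not schema-valid PREMIS, the function names are invented for illustration, and zlib is used only as a readily available stand-in for a compact encoding. It is not EXI and says nothing about how EXI itself would perform.

```python
import json
import zlib
import xml.etree.ElementTree as ET


def make_event(n):
    """Build a simplified, hypothetical fixity-check event record."""
    return {
        "eventIdentifierType": "local",
        "eventIdentifierValue": f"event-{n:07d}",
        "eventType": "fixity check",
        "eventDateTime": "2015-01-01T00:00:00Z",
        "eventOutcome": "pass",
    }


def to_xml(record):
    """Serialize one record as a flat XML element (not schema-valid PREMIS)."""
    root = ET.Element("event")
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="utf-8")


def to_json(record):
    """Serialize one record as compact JSON."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def serialized_sizes(count):
    """Return total byte counts for `count` events in each encoding."""
    records = [make_event(n) for n in range(count)]
    xml_bytes = b"\n".join(to_xml(r) for r in records)
    json_bytes = b"\n".join(to_json(r) for r in records)
    return {
        "xml": len(xml_bytes),
        "json": len(json_bytes),
        "xml+zlib": len(zlib.compress(xml_bytes)),
        "json+zlib": len(zlib.compress(json_bytes)),
    }


if __name__ == "__main__":
    for label, size in serialized_sizes(10_000).items():
        print(f"{label:>10}: {size:>10,} bytes")
```

Running the script prints one line per encoding. The absolute numbers depend entirely on the invented record layout, so only the relative differences are of interest.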
Another factor to consider is processing efficiency. This can include not just
efficiency of the serialization process itself, but also how efficiently the serialized
data may be queried, accessed, and manipulated in the underlying storage medium.
Performance considerations can influence decisions such as what type of database
to choose: a traditional SQL relational database, a SPARQL RDF database,
a native XML database, or some hybrid approach. In some cases it may be enough to
store PREMIS XML files on disk using a standardized file naming scheme.
There are also human factors to be considered. For example, the original design
goals for XML [1] included “human legible and reasonably clear,” “easy to create,”
and “easy to write programs which process XML.” These are among the reasons
that XML has become a popular serialization format for various metadata schemas.
(It is interesting to note that compactness was explicitly not a goal for XML.)
However, the human factor should extend not just to the creators and maintainers of
the data themselves, but also to the developers and maintainers of systems that must
manipulate the serialized data. XML strikes a good balance between ease of reading
and editing for humans and ease of processing for computers, especially given its
large suite of developer and editing tools. However, serialization formats such as
JSON [6] and
YAML [7] have become popular with software developers because they make it
easier to manipulate the data in various programming languages. Numerous
conventions and tools can convert between these formats. One example is
BadgerFish [8], a convention for translating XML to JSON that has some support in
different programming APIs and tools [9]. Other JSON/XML conversion tools, both
open source and commercial, follow different conventions [10–12].
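The core of the BadgerFish convention can be sketched in a few lines. The version below is a simplified illustration, not a complete implementation: it covers text content (stored under `$`), attributes (stored under `@name`), nested elements, and repeated sibling elements (collected into a list), but it does not handle namespaces or mixed content. The function name and the sample XML are assumptions made for this example, and the sample is not schema-valid PREMIS.

```python
import json
import xml.etree.ElementTree as ET


def badgerfish(element):
    """Convert an Element to a BadgerFish-style dict (namespaces not handled)."""
    body = {}
    # Attributes become properties prefixed with "@".
    for name, value in element.attrib.items():
        body["@" + name] = value
    # Text content goes into the "$" property.
    text = (element.text or "").strip()
    if text:
        body["$"] = text
    # Child elements become nested properties; repeated names become a list.
    for child in element:
        converted = badgerfish(child)[child.tag]
        if child.tag in body:
            if not isinstance(body[child.tag], list):
                body[child.tag] = [body[child.tag]]
            body[child.tag].append(converted)
        else:
            body[child.tag] = converted
    return {element.tag: body}


SAMPLE_XML = (
    "<event>"
    "<eventType>fixity check</eventType>"
    '<eventOutcome authority="local">pass</eventOutcome>'
    "<linkingObjectIdentifier>obj-1</linkingObjectIdentifier>"
    "<linkingObjectIdentifier>obj-2</linkingObjectIdentifier>"
    "</event>"
)

if __name__ == "__main__":
    print(json.dumps(badgerfish(ET.fromstring(SAMPLE_XML)), indent=2))
```

The `$` and `@` keys make the result more verbose than hand-written JSON would be, but they keep element text and attributes distinguishable, which is what allows the original XML to be rebuilt from the JSON.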
Because of their standard query languages, APIs, and performance characteristics,
various databases are also used to serialize PREMIS data. This could include
not just relational databases, but also RDF SPARQL triple stores, or index and
search engines such as Apache Solr/Lucene [13]. The Resource Description
Framework (RDF) model and one of its serialization formats, such as Turtle, N3,
JSON-LD, or RDF/XML [2], along with an RDF SPARQL [14] triple store, would
be a good choice when there is a requirement to easily merge different datasets even
if they do not share a common underlying schema.
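The reason merging is straightforward in RDF is that a graph is simply a set of subject, predicate, object triples, so combining two datasets amounts to a set union. The sketch below models this with plain Python sets rather than a real RDF library or triple store. The `http://example.org/` URIs, the property names, and the function names are hypothetical, and blank nodes, which need extra care when merging real RDF graphs, are not modeled.

```python
EX = "http://example.org/"  # hypothetical namespace used only for illustration

# Preservation metadata from one source.
preservation = {
    (EX + "object/1", EX + "hasEvent", EX + "event/1"),
    (EX + "event/1", EX + "eventType", "fixity check"),
}

# Descriptive metadata from another source, with different properties.
descriptive = {
    (EX + "object/1", EX + "title", "Annual report"),
    (EX + "object/1", EX + "hasEvent", EX + "event/1"),  # duplicate statement
}


def merge(*graphs):
    """Merge graphs by taking the union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def properties_of(graph, subject):
    """Return the sorted (predicate, object) pairs stated about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


if __name__ == "__main__":
    combined = merge(preservation, descriptive)
    for predicate, obj in properties_of(combined, EX + "object/1"):
        print(predicate, "->", obj)
```

Neither dataset had to know about the other's properties: statements about the same subject URI line up automatically, and duplicate statements collapse into one.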
Finally, the structures inherent in the data model itself can often lead toward a
specific serialization format. For example, hierarchical data models are well suited
to serialization as XML. Well-defined relational models lend themselves to SQL
databases. Models consisting of nodes connected by directed edges (directed
162 T. Habing