


Data Requirements

Page history last edited by Karen Coyle 14 years, 1 month ago

Requirements for a Data Format


Requirements (cooked)

Treat this section like a document, with full sentences and readable construction.

Macro-level requirements -- Bibliographic Data


  • A mechanism for publishing the data elements that can be used in bibliographic description (e.g., RDF, OWL, or other means)
  • A way to define application profiles using data elements maintained by the library community, and to extend the application profile to include any suitable data elements from any community
  • A decision-making process that will allow the community to add to its metadata scheme
  • Guidance rules (input rules) for recording bibliographic information (e.g., AACR2)
  • Systems that facilitate data creation (e.g., the cataloging functionality of an ILS)
  • A shared pool of bibliographic data and controlled vocabularies for use in catalog creation
  • The ability to insert and update individual elements without replacing the full record
  • Versioning of bibliographic data (perhaps wiki-like) without losing earlier edits
  • A way to share data between systems
  • Standard formats for the storage and manipulation of bibliographic information
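The element-level updating and versioning requirements above can be illustrated with a small data structure. This is a minimal sketch under stated assumptions. `Record`, `Edit`, `set_element`, `value_at` and `revert` are hypothetical names, not part of any library standard. A record is modeled as a dictionary of independent elements plus an append-only edit log. One element can change without replacing the whole record, and earlier edits are never lost, because a revert is logged as a new edit.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class Edit:
    """One logged change to one element: what changed, old/new value, who, and when."""
    element: str
    old: Optional[Any]
    new: Optional[Any]
    agent: str
    timestamp: datetime


@dataclass
class Record:
    """Hypothetical record: independent elements plus an append-only edit log."""
    elements: Dict[str, Any] = field(default_factory=dict)
    history: List[Edit] = field(default_factory=list)

    def _log(self, element: str, old: Optional[Any], new: Optional[Any], agent: str) -> None:
        self.history.append(Edit(element, old, new, agent, datetime.now(timezone.utc)))

    def set_element(self, element: str, value: Any, agent: str) -> None:
        """Insert or update one element; the rest of the record is not replaced."""
        old = self.elements.get(element)
        self.elements[element] = value
        self._log(element, old, value, agent)

    def value_at(self, element: str, version: int) -> Optional[Any]:
        """Recover an element's value as it stood after the first `version` edits."""
        value = None
        for edit in self.history[:version]:
            if edit.element == element:
                value = edit.new
        return value

    def revert(self, index: int, agent: str) -> None:
        """Undo edit number `index` by logging a new edit; earlier edits stay in the log."""
        edit = self.history[index]
        if edit.old is None:
            # The element did not exist before that edit, so remove it again.
            current = self.elements.pop(edit.element, None)
            self._log(edit.element, current, None, agent)
        else:
            self.set_element(edit.element, edit.old, agent)


record = Record()
record.set_element("title", "Moby Dick", agent="cataloger-a")
record.set_element("title", "Moby-Dick; or, The whale", agent="cataloger-b")
record.set_element("date", "1851", agent="cataloger-b")
record.revert(1, agent="reviewer")  # restores the first title; "date" is untouched
```

Because every edit records its agent and timestamp, the same log could also serve as a wiki-like page history.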


Macro-level requirements -- Systems Functional Data

The bibliographic record has been the core of library systems, but it is not the only data that those systems use. We need machine-readable and machine-processable data that supports and integrates these functions:

  • Bibliographic description
  • Name and subject (and other?) authorities
  • Discovery (subject and known item retrieval)
  • Holdings (including virtual holdings, e.g. what the library can provide access to)
  • Location (physical or virtual)
  • Acquisitions (purchasing and receipt)
  • User authentication
  • Circulation



Requirements (raw)

Add any thoughts here, raw, unedited. Use these to fill in the cooked section.



  • Data elements and metadata records must be designed to be reusable. Coding should be at a detailed level of granularity. The order of elements should not be fixed, but there must be a way to indicate a preferred order for a given application (possibly outside the record, e.g., in a profile).
  • ISBD: should it be supported?
  • FR1: Language neutrality, i.e., textual elements could be expressed in several languages, and the language of an element could be detected automatically.
  • FR2: Traceability of changes, i.e., modifications could be tracked, dated, and attributed (and thus reversed).
  • FR3: Opinion neutrality, i.e., different opinions could coexist in the metadata: elements could have alternative values, each with clearly assigned intellectual responsibility.
  • We should be thinking less about the format and more about the data elements and their relationships, so that our data can be expressed in whatever physical format is desired.
  • Support FRBR and RDA
  • Usability. To what extent should usability be a prime requirement, or should we accept a complex format that requires training for professional use? (This is in part the user vs. librarian issue.)
  • On the usability side: data creation starts with a human being. The 'transform' from the conceptual/linguistic level into whatever notation is carried forward therefore has to be learnable, memorable, and recognizable. The danger in this exercise is that an unreasonable burden may be placed on the human during data description and creation in order to make things easier for the machine.
  • Support for discovery. This raises the whole question of how to define the data format: is it just bibliographic description, or should it also include discovery tools, and to what extent?
  • Linking to other records, other web resources, and web services.
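FR1 and FR3, together with the idea of keeping element order outside the record, can be illustrated with a statement-style data structure. This is a hypothetical sketch. `Statement`, `values_for` and `in_profile_order` are illustrative names. The language tags are assigned explicitly here rather than detected automatically, which is a simplifying assumption. Each value carries a language tag and a source, loosely similar to language-tagged literals in RDF. Conflicting values coexist instead of overwriting each other, and a display order is imposed only by an external profile.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Statement:
    """One value of one element, with its language and the agent responsible for it."""
    element: str
    value: str
    lang: str    # language tag such as "en" or "it" (assigned here, not detected)
    source: str  # who asserted this value (its intellectual responsibility)


def values_for(statements: Sequence[Statement], element: str,
               lang: Optional[str] = None) -> List[Statement]:
    """All values of `element`, optionally limited to one language.

    Conflicting values are not merged or overwritten: each keeps its own source.
    """
    return [s for s in statements
            if s.element == element and (lang is None or s.lang == lang)]


def in_profile_order(statements: Sequence[Statement],
                     profile: Sequence[str]) -> List[Statement]:
    """Order statements by a profile's preferred element sequence.

    The record itself stays unordered; elements the profile does not list are left out.
    sorted() is stable, so values of the same element keep their relative order.
    """
    rank = {name: i for i, name in enumerate(profile)}
    return sorted((s for s in statements if s.element in rank),
                  key=lambda s: rank[s.element])


statements = [
    Statement("subject", "Poetry", "en", "library-a"),
    Statement("title", "The Divine Comedy", "en", "library-a"),
    Statement("subject", "Epic poetry", "en", "library-b"),
    Statement("title", "La Divina Commedia", "it", "library-b"),
]
```

Two applications could pass different profiles (for example `["title", "subject"]` or `["subject"]`) over the same statements without the record itself changing.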


  • Macro-level data types that are needed:
    • Bibliographic description
    • Name and subject (and other?) authorities
    • Discovery (subject and known item retrieval)
    • Holdings (including virtual holdings, e.g. what the library can provide access to)
    • Location (physical or virtual)
    • Acquisitions (purchasing and receipt)
    • User authentication
    • Services


  • Support the description of hierarchical structure (e.g., section / chapter / subchapter). EAD can show these relationships; MARC cannot.
    • Support whole/part linking.
    • Allow for a crosswalk between a journal and where it is indexed.
  • MARC is concise -- XML is not. Does this matter? Can a solution be found?
  • Ability to support MARC within more modern database practices. Semantic marks should not be stored in data fields (as MARC stores a semantic "/" in the 245). Murder MARC or not, we will need it in the short term for data transfer in our "legacy systems" environment.
  • Data normalization. Not all MARC fields are fully normalized. For example, subfield g of the 773 combines volume, issue, and pagination information with whatever punctuation or abbreviations the database developers deem appropriate. Parsing that information out to create OpenURLs or to render the record in a citation style is very difficult. The various author fields have the same problem, combining first and last names together.
    • Extensible. Arbitrary limits on expandability (e.g., the 36 subfields allowed by MARC) should be avoided whenever possible.
    • Portable. We should not presume that the data format that replaces MARC will be the last data format ever chosen, and should design the format so that the data stored in it can be losslessly migrated to the next format.
  • There is ambiguity and confusion among many fields. Common fields like the 245 and 100 are, of course, well known and more or less consistently used. But when you try to determine something like the format or genre of an item (e.g., book, article, book review, dissertation), things get messy quickly.
  • Unbundle the intellectual content of the work being described (whether using AACR, RDA, FRBR, or something else) from its container. Identify the virtual equivalent(s) of the physical description (e.g., the 300 field), and determine whether these can be embodied (and expanded as necessary) in a 'module' that plugs in to the intellectual content module (for lack of more refined terminology).
    • There should be a consistent core set of descriptive fields for books, articles, blogs, web sites, etc., so that a single interface can be used to browse them.
  • The "format for the storage and manipulation of bibliographic information" needs to be easily usable and processable outside of the library world.
  • A problem we face if we want to transition away from the MARC format is that MARC has become more than just a format -- it has taken on the roles of a data schema and a cataloging code. This problem is worsened because discussions about MARC sometimes conflate its various roles.
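The 773 $g normalization problem can be made concrete with a regular-expression sketch. This is a hypothetical heuristic, not a standard MARC parser. The patterns (`VOLUME`, `ISSUE`, `PAGES`, `YEAR`) and the functions `parse_773g` and `openurl_query` are illustrative names. The patterns cover only a few common English-language abbreviations and assume the year appears in parentheses. The output keys follow the OpenURL 1.0 (Z39.88-2004) journal KEV format.

```python
import re
from typing import Dict
from urllib.parse import urlencode

# Hypothetical heuristics covering only a few English-language $g conventions.
VOLUME = re.compile(r"\b(?:vol|v)\.?\s*(\d+)", re.IGNORECASE)
ISSUE = re.compile(r"\b(?:no|issue|iss)\.?\s*(\d+)", re.IGNORECASE)
PAGES = re.compile(r"\b(?:pp|p)\.?\s*(\d+)(?:\s*[-\u2013]\s*(\d+))?", re.IGNORECASE)
YEAR = re.compile(r"\([^)]*?(\d{4})[^)]*\)")  # a four-digit year inside parentheses


def parse_773g(g: str) -> Dict[str, str]:
    """Split a 773 $g string into volume, issue, pages, and year, where recognizable."""
    parts: Dict[str, str] = {}
    if m := VOLUME.search(g):
        parts["volume"] = m.group(1)
    if m := ISSUE.search(g):
        parts["issue"] = m.group(1)
    if m := PAGES.search(g):
        parts["spage"] = m.group(1)
        if m.group(2):
            parts["epage"] = m.group(2)
    if m := YEAR.search(g):
        parts["date"] = m.group(1)
    return parts


def openurl_query(parts: Dict[str, str]) -> str:
    """Encode parsed parts as an OpenURL 1.0 KEV query string for a journal article."""
    params = {"ctx_ver": "Z39.88-2004", "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal"}
    params.update({f"rft.{key}": value for key, value in parts.items()})
    return urlencode(params)


print(parse_773g("Vol. 24, no. 3 (Fall 2003), p. 123-145"))
# {'volume': '24', 'issue': '3', 'spage': '123', 'epage': '145', 'date': '2003'}
print(parse_773g("Bd. 7 (1999)"))
# {'date': '1999'}
```

The second call shows the fragility. A German volume designation ("Bd.", for Band) is not recognized, so only the year survives. Every new cataloging convention needs another pattern, which is exactly what a normalized format with separate volume, issue, and page elements would avoid.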

