DataFormatIssues


Data Format Issues and Ideas


Issues (cooked)

 

Problems with MARC21

Record format limitations

  1. Limitations on the size of records that can be created with Z39.2: a maximum of 9999 characters per field and 99999 characters per record. The latter effectively limits the number of fields that a record can contain.
  2. Inherent limitations in the MARC implementation of Z39.2: a maximum of 26 distinct content subfields and 10 control subfields can be defined per tag. (Note, numeric subfields have been designated as having a special function.)
  3. Limited hierarchical levels: tag and subfield.
  4. Having only two indicator positions means that each field can have only two attributes.
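
The size limits in point 1 follow from the fixed-width numbers in the Z39.2 (ISO 2709) record structure: the leader stores the record length in five digits, and each directory entry stores a field length in four digits. Below is a minimal sketch of the arithmetic, assuming the usual 24-character leader and 12-character directory entries. The function name and the sample sizes are illustrative only.

```python
LEADER_LENGTH = 24           # fixed-size leader
DIRECTORY_ENTRY_LENGTH = 12  # 3-char tag + 4-digit field length + 5-digit start position
MAX_FIELD_LENGTH = 9999      # largest value that fits in 4 digits
MAX_RECORD_LENGTH = 99999    # largest value that fits in 5 digits


def record_length(field_lengths):
    """Return the total record length for fields of the given lengths.

    Each length is assumed to already include the field terminator.
    """
    for length in field_lengths:
        if length > MAX_FIELD_LENGTH:
            raise ValueError(f"field of {length} characters exceeds {MAX_FIELD_LENGTH}")
    directory = DIRECTORY_ENTRY_LENGTH * len(field_lengths) + 1  # + directory terminator
    total = LEADER_LENGTH + directory + sum(field_lengths) + 1   # + record terminator
    if total > MAX_RECORD_LENGTH:
        raise ValueError(f"record of {total} characters exceeds {MAX_RECORD_LENGTH}")
    return total


print(record_length([100, 200]))
```

Under these assumptions, nine fields of the maximum length fit in one record and a tenth does not, which is how the record limit ends up capping the number of fields.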

Redundancy of data elements

  1. A large number of data elements with some degree of redundancy (X00 fields, X10 fields, title fields and subfields, etc.) Comment from Martha Yee: moved to comments.
  2. Inconsistency between the treatment of same or similar data elements across fields (e.g. author and title information in 77x linking fields does not have the same subfields as those data elements in other fields).

Fixed field data elements

  1. Fixed fields have values that are actually embedded in the standard. To add a new value means you have to modify the standard itself. They should all be external authoritative lists (if they should exist at all).
  2. Fixed fields that should be parallel to textual fields (a) are located separately from those fields and (b) may not have the same values, either because of input problems or because of limitations in the value list.
  3. Variable fields extend fixed fields (e.g. 041 extends the language code in 008) because the fixed fields lack flexibility. These data elements should be brought together.
  4. Fixed fields rely on defaults: every fixed position must carry a value, even if that value is blank, so many positions convey little information.
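
Point 3 can be illustrated with language coding. In MARC21 bibliographic records the 008 field holds a single three-character language code at positions 35-37, while field 041 carries further codes in repeatable subfields. Here is a minimal sketch of bringing the two together, assuming the record has already been parsed into plain Python values. The function name and sample data are illustrative.

```python
def language_codes(field_008, field_041_subfield_a):
    """Merge the 008 language code with the codes from 041 $a, without duplicates."""
    primary = field_008[35:38].strip()  # positions 35-37, counted from zero
    codes = [primary] if primary else []
    for code in field_041_subfield_a:
        if code not in codes:
            codes.append(code)
    return codes


# Build a 40-character 008 whose only meaningful content here is the language code.
sample_008 = " " * 35 + "eng" + " " * 2
print(language_codes(sample_008, ["eng", "fre"]))
```

A consumer of MARC21 has to do this merge itself, because the format keeps the two pieces in separate places.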

Levels and linking

  1. Record linking in MARC21 is awkward to use and is not implemented by many systems.
  2. A mixture of logical levels (from the work to the item level) in a single record with no structural differentiation. Comment from Martha Yee: moved to comments.
  3. No way to create a multi-level record for works within works.

 

The Field

The basis of any new metadata will be the definition of a field. The MARC21 field has four elements:

 

  1. Field tag
  2. Two indicator positions; the indicators carry general information about the content of the field: how to display it, which authority governs the data, and qualifiers for the data (type of heading, type of scale)
  3. General subfields (a-z); these carry the main data elements of description
  4. Control subfields (1-8); these perform functions within the record - linking between fields, identification of authority lists, role codes

 

Both indicators and control subfields perform multiple roles in relation to the field data, and some data (authority lists, as an example) can be coded either as indicators or as control subfields. The elements of the field, when sorted out, are:

 

  1. Field identifier (MARC tag)
  2. Data qualifiers (type, level)
  3. Data source (authority list)
  4. Display rules (display constants, note controllers)
  5. Data subfields
  6. Field linking (between fields in the same record)
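
The six elements can be written down as a data structure to make the separation of roles concrete. This is only a sketch: the class and attribute names are invented for illustration and are not part of MARC21 or of any proposed format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Field:
    identifier: str                                                 # 1. field identifier (MARC tag)
    qualifiers: Dict[str, str] = field(default_factory=dict)        # 2. data qualifiers (type, level)
    source: Optional[str] = None                                    # 3. data source (authority list)
    display: Dict[str, str] = field(default_factory=dict)           # 4. display rules
    subfields: List[Tuple[str, str]] = field(default_factory=list)  # 5. data subfields, in order
    links: List[str] = field(default_factory=list)                  # 6. links to fields in the same record


# Illustrative subject heading; the values are sample data only.
subject = Field(
    identifier="650",
    qualifiers={"type": "topical term"},
    source="lcsh",
    subfields=[("a", "Cataloging"), ("x", "Standards")],
)
print(subject.source)
```

In MARC21 the source of a heading like this one may be recorded in an indicator or in a control subfield, depending on the list. Here it has a single home.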

 

There is possibly also a need for metadata about the field itself:

 

  1. Field identifier
  2. Creation time and date
  3. Cataloging rules that govern the content, with version

 

This metadata will make it possible to update individual fields rather than having all updates be a full record update, as they are in MARC, and to mix data from different sources or sets of rules, if desired.
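
Below is a sketch of how such field-level metadata might support updating a single field, assuming each field carries its own identifier, timestamp, and rules. All names here (`FieldMeta`, `apply_update`, the identifier scheme, the rule labels) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FieldMeta:
    field_id: str       # identifier for this field instance
    created: datetime   # creation time and date
    rules: str          # cataloging rules governing the content
    rules_version: str  # version of those rules


def apply_update(record, incoming_meta, incoming_value):
    """Replace one field in `record` only if the incoming copy is newer.

    `record` maps field_id -> (FieldMeta, value). Returns True if the field changed.
    """
    current = record.get(incoming_meta.field_id)
    if current is not None and current[0].created >= incoming_meta.created:
        return False  # existing field is at least as recent; keep it
    record[incoming_meta.field_id] = (incoming_meta, incoming_value)
    return True


# Sample data: one field replaced by a newer version created under different rules.
record = {
    "title-1": (FieldMeta("title-1", datetime(2007, 1, 5), "AACR2", "2002"), "Old title"),
}
newer = FieldMeta("title-1", datetime(2007, 3, 30), "RDA", "draft")
changed = apply_update(record, newer, "New title")
print(changed, record["title-1"][1])
```

Only the one field is touched, and the record ends up holding fields created under different sets of rules, each labeled as such.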

 

In addition, there is a control subfield ($7) that provides a complex array of information about a linking entry field. As we have seen with MODS, this function is due to the limitations of the MARC format and can be overcome in other formats by allowing more levels of hierarchy in the data itself.

 

Ideas

Extensible controlled vocabulary lists

One problem with current library metadata is that some authority lists cannot easily be extended. For the lists in the MARC format, actual changes to the standard are needed in order to extend a list (with the exception of the large lists managed by Library of Congress for languages and place of publication). There needs to be an easy way to add terms to a controlled vocabulary list without breaking the standard.

 

There are many ways to make a set of terms extensible. The main requirements are that a newly minted term has a clear context (what list does it belong to?) and that people who encounter the term can get to its definition. Take a USB flash drive (thumb drive) as an example of a new term. The context is that it is a carrier of information, and specifically a new kind of computer carrier. It also extends an existing list, say, the RDA carrier list.

 

Let's pretend that we have a registry of terms. And let's pretend that the registry has some management mechanism, such as a small group of participants that oversees the various lists in the registry (so it's not total anarchy). Our thumb drive could be added such that:

 

http://authoritylists.info/RDA:carrier:computer_carrier:USB_flash_drive

 

returns this information in a machine-readable format:

 

owner: RDA

list: carrier

sublist: computer carrier

element: USB flash drive

status: provisional

date added: 2007-03-30

description: "USB flash drives are NAND-type flash memory data storage devices integrated with a USB (universal serial bus) interface." (quotes because I took that from Wikipedia, but generally the expert adding the term would write a suitable description.)

synonyms: thumb drive, jump drive, flash drive
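
The URL above already encodes the term's context, so a system could recover owner, list, sublist, and element without fetching anything. Here is a minimal sketch, assuming the colon-separated scheme shown in the example. Both the registry and the function name `parse_term_url` are hypothetical, and no network request is made.

```python
from urllib.parse import urlparse


def parse_term_url(url):
    """Split a registry URL of the form .../owner:list:sublist:element into its parts."""
    path = urlparse(url).path.lstrip("/")
    owner, list_name, sublist, element = path.split(":")
    return {
        "owner": owner,
        "list": list_name,
        "sublist": sublist.replace("_", " "),
        "element": element.replace("_", " "),
    }


term = parse_term_url(
    "http://authoritylists.info/RDA:carrier:computer_carrier:USB_flash_drive"
)
print(term)
```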

 

The vocabulary list could be used by anyone who finds it useful. Systems could make use of the registry to support the creation of new records and the reading of existing records. Periodically, these systems would check that their lists are up to date (like the automatic update of virus lists in anti-virus software). Such a system could decide that provisional entries would be flagged in some way (maybe they would show up as red on the screen). Or a system receiving a record with a previously unknown item in an authority list, or a previously unknown list, could quickly grab the description from the registry and use that to provide services, like definitions and synonyms, to its users.
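
The lookup behavior described here can be sketched as follows. The registry is simulated with a dictionary, and the function name, the term identifier format, and the `fetch_from_registry` callback are all assumptions made for illustration. A real system would replace the callback with a request to the registry.

```python
def describe_term(term_id, local_list, fetch_from_registry):
    """Look up a term locally, falling back to the registry for unknown terms.

    Returns (entry, flagged), where `flagged` is True for provisional entries.
    """
    entry = local_list.get(term_id)
    if entry is None:
        entry = fetch_from_registry(term_id)  # stand-in for a network request
        local_list[term_id] = entry           # keep a local copy for next time
    return entry, entry["status"] == "provisional"


# Stand-in for the registry, holding the example term from the text.
registry = {
    "RDA:carrier:computer_carrier:USB_flash_drive": {
        "element": "USB flash drive",
        "status": "provisional",
        "synonyms": ["thumb drive", "jump drive", "flash drive"],
    }
}

local_list = {}
entry, flagged = describe_term(
    "RDA:carrier:computer_carrier:USB_flash_drive", local_list, registry.__getitem__
)
print(entry["synonyms"], flagged)
```

The caller decides what to do with the flag, for example showing provisional terms in red.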

 

 


Issues (raw)