
DataFormatIssues

Saved by PBworks
on November 17, 2006 at 9:03:05 am
 

Data Format Issues and Ideas

Issues (cooked)


Issues (raw)

  • Relationship to FRBR and RDA. Is this a time of change that requires a new format, or should these be treated as separate needs?
  • Can we tackle just MARC bibliographic, or do we also need to include at least Authorities and Holdings in our analysis? Comment from Martha Yee: The FRBR entities work, expression, person, corporate body, topic, etc. are represented by authority records, not bibliographic records, so I would say it is essential to include Authorities. At the UCLA Film & Television Archive, we have noticed that the only MARC 21 hierarchy supported by current systems is the one created by an authority record linked to a bibliographic record linked to a holdings record. We have therefore used the authority record to represent the work, the bibliographic record to represent the expression, and the holdings record to represent the manifestation; that would argue for considering the holdings record as well.
  • MARC causes library data to be marginalized (but MODS, a friendlier XML format, hasn't had much more success at crossing over to other fields)
  • How do you express the current table of contents RSS feed for a journal title from CiteULike?
  • There are a lot of advantages to MARC. We have a lot of data in MARC. It would be expensive to move whole hog off of it. Maybe we keep MARC for some bibliographic data.
  • MARC has many fields and fixed-field data elements that can no longer be expanded, so new data elements cannot be added to them.
  • MARC Bibliographic is not just bibliographic -- there is info for ordering, URLs to related web items, holdings info...
  • What fixed fields are used by systems? What need is there to carry this information into a new data format?
  • Social tagging -- can it work? (kc: I'm copying this to the catalog discussion)
  • Browsing -- a requirement of the data format, or a system feature, or not needed (because it would be better to use topic maps)? (kc: Ditto, copying to catalog discussion). Comment from Martha Yee: Could we define browsing here, please? Does it refer to a search of headings as opposed to a search of bibliographic records? If so, I would say it is essential, since it is the heading that represents the FRBR entity of interest to the user, not the bibliographic record (currently defined as a particular manifestation of a particular expression of a particular work). Does it refer to a left-to-right match of a heading? If so, I would say systems would provide better service if they used a keyword-in-heading search as the default FRBR entity search, rather than the current left-to-right match, which is the only possibility offered by most current systems and which requires users to know the entry terms.
  • Need to list problem data elements, e.g. dates (some ambiguity) and genres (see MODS)
  • Examine the results of the data analysis done by Moen, for the 007 field in particular. We suspect that few vendors do anything with many of the elements defined in the 007. This could be an interesting place to experiment: first, there may be very little legacy data there in the first place (check Moen's results); and second, if no one is using the majority of the data elements in the legacy data that IS there (i.e., for end-user retrieval or browsing in a meaningful way, not simply encoding on the creator side), then this may be fertile ground for 'lifting' (liberating) this part of the construct and remodeling it elsewhere.
  • 007 needs attention, but include staff reporting (e.g. for preservation or other purposes) as a meaningful use of the data.
  • Hierarchy... as noted, MARC supports hierarchical description very little and poorly, and that's been a constraint. Given the scale and complexity with which we communicate records among systems, better support for linked or hierarchical descriptions will require sophistication in handling inheritance, identifiers, and update dates. Maintaining metadata that flows between hierarchical and flat environments is ugly. (R. Wendler) Comment from Martha Yee: Is it possible that the major constraint on demonstrating hierarchy here is not MARC, but rather the requirement that we be able to communicate records among thousands of different systems? Will we need to do that into the indefinite future (see my article called "One catalog or no catalog")? And another question for people who know more about systems than I do: What exactly is it that prevents current systems from making the cross reference from FBI on the authority record for 'United States. Federal Bureau of Investigation' available to users who search on subdivisions of the FBI? Could current systems, if properly programmed, recognize that any cross reference that refers to the parent organization should refer also to any subdivision of the parent organization? Or are there underlying hardware or software constraints outside of MARC that make this currently impossible?
  • How well can the data format support resource discovery?
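Martha Yee's contrast between a left-to-right heading match and a keyword-in-heading search can be illustrated with a minimal Python sketch. The function names and sample headings are hypothetical, and the normalization (lowercasing and stripping punctuation) is far cruder than what real systems do with headings.

```python
import re

def normalize(heading: str) -> str:
    # Crude normalization: lowercase, turn punctuation into spaces, collapse whitespace.
    return " ".join(re.sub(r"[^\w\s]", " ", heading.lower()).split())

def left_anchored_browse(headings: list[str], query: str) -> list[str]:
    # Left-to-right match: the heading must begin with the query,
    # so the user has to know the entry term (here, "United States").
    q = normalize(query)
    return [h for h in headings if normalize(h).startswith(q)]

def keyword_in_heading(headings: list[str], query: str) -> list[str]:
    # Keyword-in-heading: every query word must appear somewhere in the heading.
    words = set(normalize(query).split())
    return [h for h in headings if words <= set(normalize(h).split())]

# Illustrative headings only, not taken from a real authority file.
headings = [
    "United States. Federal Bureau of Investigation",
    "United States. Federal Bureau of Investigation. Laboratory Division",
    "Investigation (Law)",
]

print(left_anchored_browse(headings, "Federal Bureau of Investigation"))  # []
print(keyword_in_heading(headings, "Federal Bureau of Investigation"))    # the first two headings
```

With the left-anchored match, a user who does not know that the entry term is "United States" finds nothing; the keyword search finds both the body and its subdivision.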
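The FBI question in the hierarchy item asks whether a system could apply a parent body's cross references to its subdivisions. A minimal Python sketch of one way to do this follows, assuming a hypothetical in-memory `authority` mapping from authorized headings to their see-from variants; it is not how any particular system works.

```python
def inherited_references(authority: dict[str, list[str]], heading: str) -> list[str]:
    """Return see-from references for `heading`, including ones derived from its parent bodies.

    Assumes subordinate bodies are written 'Parent. Subunit' and that '. ' never
    occurs inside a single name, which real headings do not guarantee.
    """
    refs = list(authority.get(heading, []))
    parts = heading.split(". ")
    # Walk up the hierarchy: 'A. B. C' has parents 'A. B' and then 'A'.
    for i in range(len(parts) - 1, 0, -1):
        parent = ". ".join(parts[:i])
        remainder = ". ".join(parts[i:])
        # Re-express the subordinate heading under each variant form of the parent.
        for variant in authority.get(parent, []):
            refs.append(f"{variant}. {remainder}")
    return refs

# Hypothetical authority data: only the parent body carries the 'FBI' reference.
authority = {"United States. Federal Bureau of Investigation": ["FBI"]}

print(inherited_references(authority, "United States. Federal Bureau of Investigation. Laboratory Division"))
# ['FBI. Laboratory Division']
```

The splitting on ". " is the weak point of this sketch; a format that recorded the parent/subordinate relationship explicitly would make the inheritance unambiguous.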

 

 

Problems with MARC21 (K Coyle, lifted from a previous document with more explanation)

  1. Limitations on the size of records that can be created with Z39.2: a maximum of 9999 characters per field and 99999 characters per record (the directory records each field's length in four digits, and the leader records the total record length in five). The record limit effectively caps the number of fields that a record can contain.
  2. Inherent limitations in the MARC implementation of Z39.2: a maximum of 26 distinct content subfields (codes a-z) and 10 control subfields (codes 0-9) can be defined per tag. (Note: numeric subfield codes have been designated as having a special function.)
  3. A large number of data elements with some degree of redundancy (X00 fields, X10 fields, title fields and subfields, etc.) Comment from Martha Yee: Does redundancy refer here to the current necessity to record both a transcribed form of a name (how it appeared on the document cataloged) and a normalized form of a name? If so, I don't think the need to record a transcribed form goes away until every document ever created by humanity is digitized and linked to its description; that may never happen, and as long as we are describing things that are not digital, an important part of determining the names by which entities are commonly known is the recording of the way they actually appear on every document we collect...
  4. Inconsistency between the treatment of same or similar data elements across fields.
  5. Fixed fields have values that are embedded in the standard itself, so adding a new value means modifying the standard. These values should all be maintained as external authoritative lists (if they should exist at all).
  6. Fixed fields that should be parallel to textual fields a) are located separately from those fields and b) may not have the same values, either because of input problems or because of limitations in the value list.
  7. Variable fields that extend fixed fields (e.g. 041 extending the language code in the 008) because of the fixed fields' lack of flexibility. These data elements should be brought together.
  8. The use of defaults in fixed fields: because each fixed position must carry a value, even if that value is blank, the defaults convey little information.
  9. Record linking in MARC21 is awkward to use and is not implemented by many systems.
  10. A mixture of logical levels (from the work to the item level) in a single record with no structural differentiation. Comment from Martha Yee: This could be pretty tricky to tease out. Consider the fact that the title, for example, could be the title of the work (if it coincides with the uniform title), the title of the expression (but only after at least one different expression has been published with a different title), AND the title of the manifestation (but only after at least one different manifestation (identical underlying content) has been published with a different title on the chief source of information); in other words, which level(s) the title "belongs to" will change over time as more and more manifestations and expressions are published, and as this change occurs, the title will be associated with more and more of the FRBR levels (not just one).
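The size limits in item 1 follow from the fixed-width numbers in the Z39.2 leader and directory. The Python sketch below builds directory entries and computes a record's length to show where the limits bite. It assumes single-byte (ASCII) data so that characters and bytes coincide, and it treats each field's content (indicators and subfields included) as an opaque string; the function names are hypothetical.

```python
FIELD_TERMINATOR = "\x1e"
RECORD_TERMINATOR = "\x1d"
LEADER_LENGTH = 24
MAX_FIELD_LENGTH = 9999     # four digits in each directory entry
MAX_START = 99999           # five-digit starting position in each directory entry
MAX_RECORD_LENGTH = 99999   # five digits in leader positions 00-04

def directory_entry(tag: str, data: str, start: int) -> str:
    """Build one 12-character directory entry: tag (3) + field length (4) + start (5)."""
    length = len(data) + len(FIELD_TERMINATOR)  # the terminator counts toward the field length
    if length > MAX_FIELD_LENGTH:
        raise ValueError(f"field {tag} needs {length} characters; the limit is {MAX_FIELD_LENGTH}")
    if start > MAX_START:
        raise ValueError(f"field {tag} starts at {start}; the limit is {MAX_START}")
    return f"{tag}{length:04d}{start:05d}"

def record_length(fields: list[tuple[str, str]]) -> int:
    """Return the total record length, raising ValueError if a Z39.2 limit is exceeded."""
    directory = ""
    offset = 0  # starting positions are relative to the beginning of the field data
    for tag, data in fields:
        directory += directory_entry(tag, data, offset)
        offset += len(data) + len(FIELD_TERMINATOR)
    # leader + directory + the directory's own terminator + field data + record terminator
    total = LEADER_LENGTH + len(directory) + len(FIELD_TERMINATOR) + offset + len(RECORD_TERMINATOR)
    if total > MAX_RECORD_LENGTH:
        raise ValueError(f"record needs {total} characters; the limit is {MAX_RECORD_LENGTH}")
    return total

print(record_length([("001", "ocm123"), ("245", "Title")]))  # 63
```

In this sketch, ten fields at the maximum field size already exceed the five-digit record length, which is why the record limit, rather than any explicit count, caps the number of fields.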
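Item 2's ceiling comes from subfield codes being single characters. The following Python sketch parses a field into subfields and classifies each code, assuming lowercase letters for content and digits for control subfields as described above. The field is assumed to have had its field terminator removed, and the helper names are hypothetical.

```python
import string

SUBFIELD_DELIMITER = "\x1f"
CONTENT_CODES = frozenset(string.ascii_lowercase)  # a-z: 26 possible content subfields
CONTROL_CODES = frozenset(string.digits)           # 0-9: 10 subfields with special functions

def parse_subfields(field_data: str) -> list[tuple[str, str]]:
    """Split a variable data field into (code, value) pairs.

    Text before the first delimiter (the two indicators) is skipped.
    """
    pairs = []
    for chunk in field_data.split(SUBFIELD_DELIMITER)[1:]:
        if chunk:  # ignore empty chunks from doubled delimiters
            pairs.append((chunk[0], chunk[1:]))
    return pairs

def code_kind(code: str) -> str:
    # Classify a subfield code; anything outside a-z and 0-9 has no place in this scheme.
    if code in CONTENT_CODES:
        return "content"
    if code in CONTROL_CODES:
        return "control"
    return "undefined"

field_245 = "10" + SUBFIELD_DELIMITER + "aTitle :" + SUBFIELD_DELIMITER + "bsubtitle"
for code, value in parse_subfields(field_245):
    print(code, code_kind(code), value)
```

Once a tag has used most of its 26 letters, there is simply no code left to assign to a new data element.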
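Item 7's suggestion that the 008 language code and the 041 field be brought together can be sketched in Python as a merge of the two sources into one list. It assumes positions 35-37 of the 008 hold the language code and that each 041 $a holds a single three-letter code; the function name is hypothetical.

```python
def record_languages(field_008: str, codes_041a: list[str]) -> list[str]:
    """Merge the 008/35-37 language code with 041 $a codes into one ordered, de-duplicated list."""
    merged: list[str] = []
    for code in [field_008[35:38], *codes_041a]:
        # Skip blank or fill-character ('|') positions; keep the first occurrence of each code.
        if code.strip(" |") and code not in merged:
            merged.append(code)
    return merged

# Hypothetical 40-character 008 with 'eng' at positions 35-37 (other positions left blank).
field_008 = " " * 35 + "eng" + "  "
print(record_languages(field_008, ["eng", "fre"]))  # ['eng', 'fre']
```

A format that kept a single repeatable language element would make this merge step unnecessary.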
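To illustrate why record linking (item 9) depends on system support, here is a minimal Python sketch that resolves $w control numbers in 76X-78X linking entry fields against a set of records keyed by their 001. The record structure and function names are hypothetical, and dropping the parenthesized organization code is a simplifying assumption that only holds within a single system's records.

```python
import re
from typing import Optional

# 76X-78X linking entry fields (760 through 787).
LINKING_TAGS = {str(tag) for tag in range(760, 788)}

def strip_org_prefix(control_number: str) -> str:
    # $w may begin with an organization code in parentheses, e.g. "(OCoLC)1001".
    # This sketch simply drops it, which is only safe within one system's records.
    return re.sub(r"^\(.*?\)", "", control_number).strip()

def resolve_links(
    records: dict[str, list[tuple[str, str]]],
) -> dict[str, list[tuple[str, Optional[str]]]]:
    """Map each record's linking fields to the target 001, or None when the target is missing.

    `records` maps a record's 001 to its (tag, $w) pairs.
    """
    resolved: dict[str, list[tuple[str, Optional[str]]]] = {}
    for control_number, links in records.items():
        out: list[tuple[str, Optional[str]]] = []
        for tag, w in links:
            if tag not in LINKING_TAGS:
                continue
            target = strip_org_prefix(w)
            out.append((tag, target if target in records else None))
        resolved[control_number] = out
    return resolved

# Hypothetical records: an article with a 773 host item entry, plus one dangling 787 link.
records = {
    "1001": [],
    "2002": [("773", "(OCoLC)1001"), ("787", "9999")],
}
print(resolve_links(records))
# {'1001': [], '2002': [('773', '1001'), ('787', None)]}
```

A link whose target is not loaded resolves to None, which is the kind of gap a system has to handle before linking becomes usable.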
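Martha Yee's comment on item 10 describes a rule under which a title's FRBR level changes as more expressions and manifestations are published. The Python sketch below encodes one reading of that rule; the `Manifestation` type, identifiers, and titles are hypothetical illustrations, not a proposed data model.

```python
from typing import NamedTuple

class Manifestation(NamedTuple):
    work: str        # hypothetical work identifier
    expression: str  # hypothetical expression identifier, assumed unique across works
    title: str       # title as it appears on the manifestation

def title_levels(m: Manifestation, catalog: list[Manifestation], uniform_title: str) -> list[str]:
    """Return the FRBR levels m.title currently belongs to, per one reading of the rule above."""
    levels = []
    # Work level: the title coincides with the uniform title.
    if m.title == uniform_title:
        levels.append("work")
    # Expression level: another expression of the same work has been published under a different title.
    if any(o.work == m.work and o.expression != m.expression and o.title != m.title for o in catalog):
        levels.append("expression")
    # Manifestation level: another manifestation of the same expression carries a different title.
    if any(o.expression == m.expression and o.title != m.title for o in catalog):
        levels.append("manifestation")
    return levels

a = Manifestation("w1", "e1", "Hamlet")
b = Manifestation("w1", "e2", "Amleto")                 # a translation
c = Manifestation("w1", "e1", "The Tragedy of Hamlet")  # same content, different title
print(title_levels(a, [a], "Hamlet"))        # ['work']
print(title_levels(a, [a, b], "Hamlet"))     # ['work', 'expression']
print(title_levels(a, [a, b, c], "Hamlet"))  # ['work', 'expression', 'manifestation']
```

The same title string accumulates levels as the catalog grows, which is why assigning it to a single structural level in the record is hard.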
