Bibliographic Fields (100-899)
Fields by Frequency
As a way to prioritize the analysis, here are the top fields by frequency, according to the UT (Moen) statistical analysis:
| In 25%-100% of records | Next 15% |
|---|---|
| 100, 245, 260, 300, 500, 504, 650, 651, 700, 710, 880 | 110, 246, 250, 440, 490, 600, 740 |
Issues
The bibliographic fields are particularly difficult to render as data. Most fields have numerous subfields as well as indicator values. The fields often correspond to display segments rather than data elements, although in some cases those two are the same.
Fields and subfields
We take for granted that MARC is made up of fields and subfields, but the exact relationships between fields and subfields aren't always clear. In some cases the field provides a particular context, as in the 1XX, which designates the subfields that follow as part of the main entry and in addition codes the entry as a personal name, corporate name, or conference. (The indicator can change a personal name to a family name.) Looking at it from a linked data point of view, the field provides a relationship between the content and the primary focus of the record. It may also identify the type of data in the field (personal name).
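To make the point about context concrete, here is a minimal sketch of how the tag and first indicator of a 1XX field together determine the type of name. The function name and the dictionary are illustrative only, not part of any MARC library; the mapping assumes the usual reading of 100, 110, and 111, and of first indicator 3 in field 100 as a family name.

```python
# Illustrative mapping from 1XX tag to the kind of main entry it codes.
MAIN_ENTRY_TYPES = {
    "100": "personal name",
    "110": "corporate name",
    "111": "conference",
}


def main_entry_type(tag: str, indicator1: str) -> str:
    """Return the type of name implied by a 1XX tag and its first indicator."""
    kind = MAIN_ENTRY_TYPES.get(tag)
    if kind is None:
        raise ValueError(f"not a name main entry tag: {tag}")
    # In field 100, a first indicator of 3 turns the personal name
    # into a family name.
    if tag == "100" and indicator1 == "3":
        return "family name"
    return kind
```

The same subfield $a therefore carries a different kind of value depending on information that lives outside the subfield itself, which is what makes a one-to-one mapping from subfields to data elements difficult.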
Administrative data
Another analysis that needs to be done is to determine the focus of each data element in the record. There are at least two foci that I see right away:
- The primary resource (the book, the DVD, the map, etc.)
- The record (administrative data)
It isn't clear to me if the linking fields (77X) fit into category 1 or if they are a different category altogether.
Alternate Graphical Representation
These are the 880 fields, which carry an alternate character representation of another field in the record. For example:
100 1_ |6 880-01 |a Nagradov, I. S. |q (Ilʹi︠a︡ Sergeevich)
880 1_ |6 100-01/(N |a Наградов, И. С. |q (Илья Сергеевич)
It isn't clear to me what to do with these, in particular how to link them to the particular field that they should be connected to.
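One possible starting point is the subfield 6 linkage itself: in the example, the 100 field points to "880-01" and the 880 field points back to "100-01", so the tag and occurrence number form a shared key. The sketch below pairs fields on that key. It assumes fields have already been reduced to (tag, subfield 6, text) tuples; the names `Linkage`, `parse_linkage`, and `pair_alternates` are made up for illustration.

```python
from typing import NamedTuple, Optional


class Linkage(NamedTuple):
    linked_tag: str        # tag of the field this one is linked to
    occurrence: str        # two-digit occurrence number, e.g. "01"
    script: Optional[str]  # script code such as "(N", if present


def parse_linkage(value: str) -> Linkage:
    """Parse a subfield 6 value such as '880-01' or '100-01/(N'."""
    parts = value.strip().split("/")
    linked_tag, _, occurrence = parts[0].partition("-")
    script = parts[1] if len(parts) > 1 else None
    return Linkage(linked_tag, occurrence, script)


def pair_alternates(fields):
    """Pair each regular field with its 880 alternate.

    `fields` is a list of (tag, subfield_6_value, text) tuples.
    Returns a list of (original_text, alternate_text) pairs.
    """
    originals = {}
    alternates = {}
    for tag, sub6, text in fields:
        link = parse_linkage(sub6)
        if link.occurrence == "00":
            # Occurrence 00 means the 880 has no associated field.
            continue
        if tag == "880":
            # Key an 880 by the tag it points back to.
            alternates[(link.linked_tag, link.occurrence)] = text
        else:
            originals[(tag, link.occurrence)] = text
    return [(originals[k], alternates[k]) for k in originals if k in alternates]


example = [
    ("100", "880-01", "Nagradov, I. S."),
    ("880", "100-01/(N", "Наградов, И. С."),
]
pairs = pair_alternates(example)
```

This only recovers the pairing; it does not settle how the two forms should be related in the resulting data (for instance, as two labels of the same entity).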
Level of Detail
To what extent is it valuable to retain the exact level of detail of the MARC record? For each field it will be necessary to ask whether information separated into subfields is useful as individual data elements. Some notes, for example, have subfielding that may not result in separately usable statements:
506 1#$aRestricted: Material extremely fragile;$cAccess by appointment only.
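As a rough illustration of the choice, the sketch below splits the 506 example into its subfields and then shows the alternative of keeping the note as a single statement. The helper `split_subfields` is hypothetical and assumes "$" never occurs inside a subfield value.

```python
def split_subfields(content: str, delimiter: str = "$"):
    """Split '$a...$c...' content into (code, value) pairs."""
    pairs = []
    # Text before the first delimiter (if any) is not a subfield; skip it.
    for chunk in content.split(delimiter)[1:]:
        if chunk:
            pairs.append((chunk[0], chunk[1:].strip()))
    return pairs


content = "$aRestricted: Material extremely fragile;$cAccess by appointment only."

# Option 1: keep each subfield as its own data element.
subfields = split_subfields(content)

# Option 2: collapse the field into one note statement.
note = " ".join(value for _, value in subfields)
```

Here the $c value only makes sense when read after $a, which suggests that the second option loses little for a note like this one.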
Redundancy
The same data can appear in multiple places in the MARC record; there are numerous fields with title subfields.
Ambiguous Coding
Coding of the MARC fields and subfields is often not at a sufficient level of granularity to eliminate ambiguity. Although some ambiguity is to be expected, this has a particularly detrimental effect when there is not enough clarity to support desired functionality.
The uniform title is an interesting example of what programmers might call "overloading." Although identically coded, these 240 fields have totally different meanings:
24010 $a Selections
-- For an item ..."consisting of three or more works in various forms..." The title of the work is NOT "Selections"
24010 $a Pendolo di Foucault. $l English
-- For a translation of a Work. "Pendolo di Foucault" IS the title of the work (in the FRBR sense of Work) that was translated. The subfield $l tells you what it was translated to.
24010 $a Concertos, $m harpsichord, string orchestra, $n BWV 1052, $r D minor
-- I'm not sure how to describe this, except that it is a coded description of music that has little or nothing to do with the actual title of the thing being described. This is to music as the hierarchical place name (752) is to newspapers. It's great data, and undoubtedly very useful, but it really needs its own data element.
It gets even worse when you start looking at 700 $t's. The music people always want to have an index that includes the 100/240 and the 700 $a $t. Unfortunately, not all 700 $t's are equivalent to a 240, not even in music records. So you either throw every 700 with a $t into a uniform title field (and most of the time they won't look like uniform titles, so the value of that diminishes), or you do a title index that includes all of the titles, and the music folks are unhappy that they can't search only on THEIR uniform titles.
Clearly, if instead of throwing every kind of title into 700 $t we could have a data element for that valuable, constructed music title, then we could serve music library patrons much better.
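To show what a program has to do today to tell the three 240 examples apart, here is a heuristic sketch. It is a guess at a workable rule, not an established practice: the list of collective titles is an assumed, incomplete sample, and the use of $m and $r as a signal for music is an assumption that would need checking against real records.

```python
# Assumed, incomplete sample of collective uniform titles.
COLLECTIVE_TITLES = {"selections", "works"}

# Subfields treated here as a signal for a constructed music title:
# $m (medium of performance) and $r (key).
MUSIC_CODES = {"m", "r"}


def classify_240(subfields: dict) -> str:
    """Guess which of the overloaded meanings a 240 field carries."""
    codes = set(subfields)
    if codes & MUSIC_CODES:
        return "constructed music title"
    title = subfields.get("a", "").strip(" .,").lower()
    if title in COLLECTIVE_TITLES:
        return "collective title"
    if "l" in codes:
        return "translated work title"
    return "work title"
```

That the distinction has to be inferred from subfield patterns and string matching, rather than read from the coding, is the ambiguity described above.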
Tags
100/110 - Primary agents
The complexity here is that these are not simple descriptive fields but are headings that, in E-R parlance, represent entities with relationships to the primary focus. To put that in clearer terms, if the record describes a particular bibliographic thing, these fields represent other things that have a key creative relationship to that bibliographic thing. This is where one records the author or primary creator of a resource.
700/710, et al.
While the 1XX's are fairly complex, the 7XX's in this range are even more so. Where 1XX's represent creators in some sense of the word, the "added entries" in the 7XX range have a variety of roles. Unfortunately these roles are often not explicit in the MARC instance data.
Possible solutions
Things and Strings
A first analysis could separate the variable fields into "things" and "strings." Things are those fields (or portions of fields) that can be represented by an authority-controlled entity. Conceptually, those things could be replaced by an identifier for the authority record. Strings are everything else. Strings themselves can be broken into categories, primarily transcribed text and supplied text.
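A first pass at this separation could be as simple as the sketch below. The set of "thing" tags is an assumption made for illustration, limited to heading fields from the frequency list above; a real analysis would have to decide this field by field, and in some cases subfield by subfield.

```python
# Assumed set of tags whose content could be replaced by an
# authority identifier. Illustrative only, not a complete list.
THING_TAGS = {"100", "110", "600", "650", "651", "700", "710"}


def partition_fields(fields):
    """Split (tag, value) pairs into things and strings."""
    things, strings = [], []
    for tag, value in fields:
        if tag in THING_TAGS:
            things.append((tag, value))
        else:
            strings.append((tag, value))
    return things, strings
```

The strings could then be divided further into transcribed and supplied text, which this sketch does not attempt.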
Data v. Markup
Some of the variable fields could be treated as structured data, such as the structured contents notes field (505 using subfields). Another option is to treat textual fields as text with markup where needed, as in an unstructured contents note that uses ISBD punctuation to differentiate entries, authors and titles. Using markup could speed up the process of translating some of the textual fields that do not have a data equivalent, primarily the notes fields.
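As a sketch of the markup approach, the function below splits an unstructured contents note on ISBD punctuation, assuming " -- " separates entries and " / " separates a title from its statement of responsibility. The function name and the sample note are invented for illustration, and real notes will be less regular than this.

```python
def parse_contents_note(note: str):
    """Split an unstructured contents note into title/responsibility entries."""
    entries = []
    # Drop the final full stop, then split entries on the ISBD " -- ".
    for part in note.rstrip(". ").split(" -- "):
        title, _, responsibility = part.partition(" / ")
        entries.append({
            "title": title.strip(),
            "responsibility": responsibility.strip() or None,
        })
    return entries


# Invented sample note, not taken from a real record.
sample = "The first story / A. Author -- The second story / B. Writer."
entries = parse_contents_note(sample)
```

The output could be serialized either as separate data elements or as the original text with markup around each title and name; the parsing step is the same in both cases.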
Connecting to MARC21 in ISO 2709
There is a need to make clear the connection between elements in MARC21 RDF and the MARC21 elements stored in ISO 2709 format. At the same time, it is not desirable to use the MARC21 2709 field/subfield designators as the identities of MARC21 in RDF because of the unfriendliness of the tag/subfield conventions to non-librarians. For this reason, it seems best to embed a link to MARC21 2709 in the description for each MARC21 in RDF element. One possibility is to use a MARC-centric URI for the MARC21 in 2709 elements:
http://marc21.info/element/506a
To encode this, it may be suitable to use OWL sameAs. However, there is the disadvantage that these URIs do not resolve (at least, not on their own). Ideas about this would be appreciated!
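For discussion, this is roughly what the link might look like when generated. The MARC-centric base URI is the one proposed above; the RDF element namespace `http://example.org/marc21rdf/` and the element name `restrictionNote` are placeholders invented for the example, and the output is a single N-Triples statement using owl:sameAs.

```python
MARC_2709_BASE = "http://marc21.info/element/"  # proposed above
RDF_BASE = "http://example.org/marc21rdf/"      # hypothetical namespace
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def marc_2709_uri(tag: str, subfield: str = "") -> str:
    """Build the MARC-centric URI for a field or field/subfield."""
    return f"{MARC_2709_BASE}{tag}{subfield}"


def same_as_triple(rdf_name: str, tag: str, subfield: str = "") -> str:
    """Return one N-Triples line linking an RDF element to its 2709 source."""
    subject = f"<{RDF_BASE}{rdf_name}>"
    obj = f"<{marc_2709_uri(tag, subfield)}>"
    return f"{subject} <{OWL_SAME_AS}> {obj} ."


# 'restrictionNote' is a made-up element name for 506 $a.
triple = same_as_triple("restrictionNote", "506", "a")
```

This keeps the friendlier name as the identity of the RDF element while still recording where it came from in the 2709 record.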