Framework

Page history last edited by Steve Casburn 14 years, 4 months ago

Framework for a Bibliographic Future

 

Draft for discussion, by Karen Coyle, Diane Hillmann, Jonathan Rochkind, Paul Weiss

(May 2007 revision)

 

 

How to Comment: You must log in to comment. After you log in you will see the Comments tab on this page. To log in you must provide the wiki-wide password. The password for this wiki is the last name of the man who invented the Dewey Decimal System, plus the last two digits of the famous year of the founding of the United States. If this isn't enough of a hint, contact Karen Coyle for the actual password.

 

See other frameworks from other authors

 

 

Introduction

 

Metadata is a generic term for the data that we create about persons, places, things, documents, and anything else about which we wish to communicate or wish to operate on in an electronic environment. Although it is common to hear that "all data is metadata," it is certainly the case that not all metadata is well designed. Good design increases the potential success of a metadata standard.

 

The design components proposed in this model are not new. Similar components are used to some degree in standards such as the OpenURL Framework (Z39.88), the Semantic Web, and the Dublin Core Metadata Initiative. However, the theory and practice of designing layered components like this for metadata is still a work in progress; it is not a solved problem. Some aspects of the framework we suggest here are fairly well established as good practices in data modelling (the separation of data models from guidance), while others are newer, are still being investigated and worked on by communities, and are thus subject to more change (the idea of an Abstract Metadata Model).

 

A framework such as this serves many purposes. In particular, we are interested in producing metadata that is both highly extensible and that will promote compatibility between communities and applications that extend the metadata.

 

We propose four components that each serve as a layer in the overall metadata system: an Abstract Metadata Model of basic structures and relationships; a Domain Model that defines the basic structures and relationships of our bibliographic/information resource domain, along with an extensible set of properties; Guidance for application of the properties; and Encoding. An Abstract Metadata Model can underlie one or more Domain Models, and any Domain Model can be expressed using one or more encodings. The Guidance document is a key element that both provides direction to creators and describes the semantics of the data elements in a human-understandable way. These four components provide a basis for creation of machine-manipulable metadata that has meaning to a community yet can be defined in a rigorous way to communicate clearly to any users of the data.

 

Abstract Metadata Model

 

The idea that an Abstract Metadata Model is necessary is fairly new. The Dublin Core Abstract Model (DCAM) is a model of _metadata itself_, essentially a non-domain-specific framework for structured metadata. We list this layer first because it is the 'bottom' foundational layer, but in addition to being the most abstract layer, it is also fairly new and still in formation as a concept, and for both those reasons it is in many ways the hardest to understand. The reader should feel free to skip down to the other layers if she likes.

 

The DCAM is a way of thinking about the structure of metadata in a very abstract and domain-independent way. For instance, it defines the ideas of 'description sets', 'descriptions', and 'statements'. "The abstract model describes an abstract information structure." (Nilsson et al., n.d.)

 

OpenURL 1.0 is perhaps another example of an Abstract Metadata Model, to the extent that OpenURL 1.0 doesn't tie itself to any particular domain or application, but is intended as a general framework for the structure of metadata in a domain-independent way. Nilsson et al. suggest that "RDF Concepts and Abstract Syntax" is another example of what we call here an Abstract Metadata Model.

 

It may not seem obvious why an Abstract Metadata Model is necessary; that it is necessary is a fairly recent 'discovery'. One of the authors found the paper by Nilsson et al. helpful in understanding the necessity and role of an Abstract Metadata Model, although this draft does not use exactly the same terminology as Nilsson et al. Now, on to the slightly less abstract layers of our metadata framework.

 

Domain Model

 

A Domain Model is a model of the structures and relationships in the particular domain or application that the metadata will address. In the library community, the entity-relationship structure provided by the FRBR model is an attempt to define a formal domain model for bibliographic or information resources. It includes basic aspects of the information universe that will eventually be defined by metadata (works, expressions, manifestations, and items, plus the entities such as person and concept that will have a relationship with the primary four). We need to consider carefully how the FRBR model works in the context of other models that may be used for bibliographic data, both at the Abstract Metadata Model level and among other Domain Models. Another example of a domain model would be that of the OpenURL, which defines its universe as a Context Object with the following entities: Referent, Referring Entity, Requester, Service Type, Resolver, Referrer. The resulting Domain Model should be independent of any particular encoding of bibliographic metadata, but will provide a structure that all implementations of metadata derived from the model will have in common.

 

Functional Requirements

 

Metadata must serve a purpose. To make metadata useful for a community, that purpose must be made clear. This is generally done through a set of functional requirements that explain what one wishes to accomplish with the metadata. The "Functional Requirements for Bibliographic Records" (FRBR) is a high level set of requirements for the library community. This sets a general direction for library metadata, but additional specific requirements may be needed to provide guidance in the creation of the metadata set.

 

Domain Vocabulary

 

A Domain Vocabulary is the set of terms that will be used in the metadata, often called an 'Element Set', 'Element Vocabulary', or 'Data Dictionary'. These very rigorously and unambiguously define actual properties that will carry values in the data set, as well as the relationships between those properties. Data elements can be defined at any relevant level of granularity. They can have hierarchical relationships between them or non-hierarchical relationships. The Dublin Core Element Set is an example of a set of data elements.

 

In addition to the metadata vocabulary, a domain may define Value Vocabularies for use in its metadata. These are generally finite lists of terms (called "values") from which the input data will be chosen. They are also called "authoritative lists" in some environments. Examples are: a list of languages that will be used in the element "Language of Text"; a list of countries or country codes that will be used in geographic data elements; a list of terms representing types of documents, such as "book," "journal," "article."
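The relationship between a data element and a Value Vocabulary can be sketched in a few lines of Python. This is a minimal illustration, not part of any standard: the class name `Element`, the method `accepts`, and the element names are all hypothetical, and the list of document types is just the one given in the paragraph above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical value vocabulary: a finite list of allowed terms.
DOCUMENT_TYPES = frozenset({"book", "journal", "article"})


@dataclass(frozen=True)
class Element:
    """A data element, optionally restricted to a value vocabulary."""
    name: str
    vocabulary: Optional[FrozenSet[str]] = None  # None means free text

    def accepts(self, value: str) -> bool:
        # A value is acceptable if the element is free text,
        # or if the value is one of the vocabulary's terms.
        return self.vocabulary is None or value in self.vocabulary


# Illustrative element declarations (names are assumptions).
document_type = Element("Type of Document", DOCUMENT_TYPES)
title = Element("Title")
```

Keeping the vocabulary as data separate from the element definition means a community could swap in a different list without redefining the element itself.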

 

Guidance

 

Guidance is often desired to aid in the creation or assignment of values to data elements in a consistent way. Guidance may be general or specific, but it usually attempts to address circumstances that users will encounter in the creation of the metadata. Different communities making use of the same data elements may define their own specific best practices that attempt to produce the metadata that is most useful for their purposes, but to support interoperability they must not re-define the elements in order to address those needs. The library community has traditionally received its guidance from cataloging rules (such as AACR) and from practices published as part of the encoding of library data using MARC21. Increasingly, specialized guidance for specific communities has been developed that reflects the differences in materials or approach inherent in their tasks: examples are Cataloging Cultural Objects (CCO) for the museum community and Describing Archives: A Content Standard (DACS) from the archival community.

 

Encoding

 

We can assume that any metadata being created today will be expressed and exchanged in a machine-readable encoding. The primary requirement for metadata encoding is that it must be able to encode the full detail of the semantics and relationships intended by the metadata creators; and it must expand as the metadata schema grows and changes. The same metadata can be encoded in different data formats and still be fully shareable, as long as the encoding is true to the data elements and to the overall structure of the metadata model.

 

Discussion

 

FRBR

 

FRBR's entity-relationship model (as defined in Chapters 3-5 of the FRBR Report) is a useful, if not complete or even wholly accurate, analysis of our bibliographic universe. The delineation of the four Group 1 entities illuminates an important issue of our legacy: we have been putting metadata about different bibliographic entities into single descriptions. As just one example, the FRBR report provides an explanation for the ambiguity of dates in bibliographic records: there are at least four dates of creation that apply to each bibliographic resource--those of its work, expression, manifestation, and item. For many resources all these are the same, so there is no need to delve further, but some resources are more complex, and that complexity has led to confusion about dates used in brief displays and search limits.

 

FRBR, together with work by Barbara Tillett, Richard Smiraglia, and others, has contributed to an increasingly formalized notion of relationships among bibliographic resources, and between bibliographic resources and associated entities. Those associated entities include, for instance, FRBR's Group 2 entities (persons and corporate bodies), the families added in the draft FRAR, and subject entities. Examining current practices from the perspective of this work on relationships shows great inadequacies in the identification, recording, and utility of relationships.

 

FRBR does an admirable job of providing one way to analyze the bibliographic universe, though as has been noted by others, it doesn't extend well to museum or archival collections. Although FRBR covers attributes of bibliographic entities, it does not model the metadata itself (that is, none of the entities represents metadata per se).

 

DCAM

 

The Dublin Core Abstract Model from the Dublin Core Metadata Initiative (DCMI), on the other hand, takes the next logical step, and models metadata. Its purpose is "to gain a better understanding of the kinds of descriptions that we are trying to encode and facilitates the development of better mappings and translations between different syntaxes."

 

The FRBR model and the Dublin Core Abstract Model are not contradictory; in fact, they are complementary. FRBR provides a start at defining properties for RDA and allows the description of resources using specific relationships that can be assigned at the proper level as well as aggregated for better expression to the user. The DCAM helps us to envision the FRBR entities as a package, allowing the discussion about issues like identity and linking to be posed and discussed in a more useful manner.

 

Domain Models and Metadata Schemas

 

Even as we validate the use of FRBR as a model, we take issue with its embedded attributes. One of the things the DCAM and the Dublin Core experience generally tells us is that we need to develop our attributes/properties/elements separately from the model as well as from the values used. Separating elements and their definitions from guidance on determining their values (controlled vocabularies, transcription, etc.) is crucial in order to achieve interoperability and extensibility.

 

As a first step, the FRBR attributes must be carefully generalized. For example, instead of defining separate elements (including their names, definitions, examples, etc.) for title of the work, title of the expression, and title of the manifestation, there should be one title element reused at multiple levels. The declaration of these elements should include clear specification of where in the FRBR Group 1 entities they may be used. This increased generalization promotes interoperability, minimizes a tendency toward complexity, and eases machine manipulation and extensibility. It also requires more rigorous consideration of whether attributes at the various levels are really the same thing, and can point out inconsistencies that can be rectified. Along with the development of the generalized elements, there should be rules for extension or refinement of those elements, to ensure that appropriate extensions can be made and managed.
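One way to picture a generalized element is as a single declaration that carries its own list of permitted levels. The Python sketch below is only an illustration under stated assumptions: the names `Level`, `ElementDeclaration`, `TITLE`, and `can_use` are hypothetical, and the definition text is a placeholder rather than wording from FRBR or any element set.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Level(Enum):
    """The four FRBR Group 1 entities."""
    WORK = "work"
    EXPRESSION = "expression"
    MANIFESTATION = "manifestation"
    ITEM = "item"


@dataclass(frozen=True)
class ElementDeclaration:
    """One element, declared once, with the levels where it may be used."""
    name: str
    definition: str
    allowed_levels: FrozenSet[Level]


# A single title element reused at three levels, instead of three
# separately defined elements. The definition text is a placeholder.
TITLE = ElementDeclaration(
    name="title",
    definition="A name given to the entity being described.",
    allowed_levels=frozenset(
        {Level.WORK, Level.EXPRESSION, Level.MANIFESTATION}
    ),
)


def can_use(element: ElementDeclaration, level: Level) -> bool:
    """Check whether an element is declared for use at a given level."""
    return level in element.allowed_levels
```

A rule for extension or refinement could then be expressed as a new declaration that points back to `TITLE`, though how that should work is exactly the kind of question the rules mentioned above would need to settle.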

 

Crucial to the proper development of a metadata schema is a clear notion of requirements for technical expression of the attributes, and a plan for maintenance and growth. We have learned much in the library community about the importance of community consensus and how to maintain important standards over time. MARBI is a good example of doing it correctly, and in fact the Dublin Core Usage Board process is based loosely on MARBI.

 

Guidance for Application

 

It is critically important that we develop good usage guidance based first on the Domain Model attributes in their most generalized form. We must provide this usage guidance in a manner that allows communities of practice to use the general guidance as they extend the basic structure for their own purposes. Traditional library cataloging is just such a community of practice, and should extend the schema and guidance to fit their needs, without the necessity of bringing their special library colleagues along with them. If the general elements, and the guidance attached specifically to them, can be approached as an extensible set, other communities will be encouraged to incorporate them specifically in their metadata and to extend in ways that provide a sound basis for interoperable use and re-use. In this scenario, mapping between library metadata schemas and others, as well as the mix/match capabilities of application profiles, can be made easier. This approach will tend to minimize data loss when information is crosswalked, and improve the ability of machines to act upon the data regardless of its origin.

 

As part of this development of extended guidance material along specialist lines, we need to recognize that different communities will apply FRBR Group 1 boundaries differently. Much of the discussion about how decisions will be made about works, expressions and manifestations indicates clearly that specialized communities will tend to make different decisions about where these boundaries lie. This has been seen as a problem, and an impediment to the integration of FRBR principles into actual practice. Part of the rationale for separating traditional library-specific instruction from the general RDA, and enabling specific communities to extend from that general base, is that the assumptions and instructions for where these boundaries lie can be made explicit by community, and librarians can get out of the trap of trying to herd everyone else into the same decisions. This will make it easier for the communities as a whole to use each other's work--when differences are not exceptions but can be explicitly expressed as policies and appropriately supported with more detailed extensions to the general framework, everyone enjoys easier and more cost-effective machine manipulation of data. So long as the determination of what is a work can be ascribed to the community that made the decision, other communities can predict and cope with the variations.

 

Encoding

 

It seems unlikely that MARC21 can be sufficiently remodeled to serve as an encoding for a modern metadata schema, but certainly some of the accumulated wisdom and experience embedded in the MARC21 documentation can be repurposed. One issue is that insofar as it supplies definitions, labels and relationships not necessarily explicit in AACR2, MARC21 itself represents a combination of functions that requires significant attention, and perhaps deconstruction, to prise out what should be included in the metadata schema and what should remain as encoding.

 

It should also be recognized that MARC21 encodes more than bibliographic information, and the formats for classification, authorities and holdings might well be more appropriate for future use, given that they operate where competing data structures are sparse. Where they tend to be problematic is in the area of distinctions at the statement level, where specification of language of statement, source, and community of origin may well be necessary.

 

Encoding for the future must support statement level identification and attribution. Although to a certain extent this is a 'packaging issue,' it seems important to assert it as a guiding principle, as it supports the notion that the way records will be built in future will be much more iterative, and catalogers are just as likely to start with a re-used description as with one created newly for the purpose. These catalog records of the future are likely to be aggregations of the work of many catalogers--somewhat like CONSER records are now--and the source and age of particular statements will be critical as we develop applications to make 'decisions' about what statements they will display. Central to this assumption is that, in the shared environment of the future, information may be added, but not subtracted--just ignored if not needed or desired in a particular context.
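Statement level attribution, and the "added but not subtracted" principle, might be sketched as follows. This Python fragment is a sketch under assumptions, not a proposed encoding: the class names, the source identifiers (`library-a`, `library-b`), the sample values, and the "most recent statement from a trusted source" display policy are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Statement:
    """One property/value pair, with its own source and date."""
    prop: str
    value: str
    source: str        # who made the statement (hypothetical identifier)
    asserted: date     # when the statement was made
    language: str = "en"


class Description:
    """An append-only aggregation of statements from many catalogers."""

    def __init__(self):
        self._statements = []

    def add(self, statement):
        # Statements are only ever added; nothing is removed or overwritten.
        self._statements.append(statement)

    def __len__(self):
        return len(self._statements)

    def select(self, prop, trusted_sources):
        """Return the most recent statement for `prop` made by a trusted
        source, or None. Other statements are ignored, not deleted."""
        candidates = [
            s for s in self._statements
            if s.prop == prop and s.source in trusted_sources
        ]
        return max(candidates, key=lambda s: s.asserted, default=None)


# Illustrative data only.
record = Description()
record.add(Statement("title", "My life", "library-a", date(2005, 3, 1)))
record.add(Statement("title", "My life: a memoir", "library-b", date(2007, 5, 15)))
```

Two applications with different trust policies would call `select` with different source sets and display different titles from the same shared record.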

 

An Example

 

 

The figure above illustrates some possibilities for a description set based on DCAM that also includes some of the FRBR entities and shows how they would relate. On the left side are the four Group 1 entities, with a small assortment of generic properties. In the cases where the value of the properties is contained in another description, the relationship between them is conveyed with an identifier, and the identified Group 2 or 3 description is included in full with the description set. Thus, an application using this description set could presumably pick and choose among the available display values, for the one that suits its goals best. For instance, in the description of the author, there are two identified possibilities for display text for that particular person, one using direct order, and the other surname first.

 

Note that the linking techniques are the same regardless of what kind of description, whether author, publisher or subject is related to a particular Group 1 entity. There is both a title in the Work description and another in the Expression--the differences between them and their different functions are conveyed not in the property name, but in where it appears, allowing an application to determine how to display either or both. Grouping of expressions and manifestations can be supported using simple linking and naming strategies, without unnecessary complexity.
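The linking pattern described in the two paragraphs above can be sketched in Python. This is only an illustration: the identifiers, property names, and all values (including the author name) are hypothetical placeholders, not values taken from the figure, and a plain dictionary stands in for whatever encoding would actually carry the description set.

```python
# A description set: identifier -> description (property -> value).
# Identifiers, property names, and values are hypothetical placeholders.
description_set = {
    "work:1": {
        "title": "My life",
        "author": "person:1",     # value is the identifier of another description
        "subject": "concept:1",   # linked in exactly the same way
    },
    "expression:1": {"title": "Mi vida", "language": "es"},
    "person:1": {"display": {"direct": "Jane Doe", "inverted": "Doe, Jane"}},
    "concept:1": {"display": {"label": "Autobiography"}},
}


def display_value(descriptions, entity_id, prop, form):
    """Follow the identifier stored in `prop` and return one of the
    display values offered by the linked description."""
    linked_id = descriptions[entity_id][prop]
    return descriptions[linked_id]["display"][form]


def titles(descriptions):
    """Collect every title together with the description it appears in.
    The property name is the same everywhere; only its location differs."""
    return {
        entity_id: description["title"]
        for entity_id, description in descriptions.items()
        if "title" in description
    }
```

The same `display_value` function serves author and subject alike, and an application can choose the direct or surname-first form of the name without the description set changing.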

 

Using only the descriptions in this simple example, the following display could be supported:

 

 

Note that the link to an English version is implied by the presence of another description set (not illustrated here) with the same work description and an expression description in English.

 

Additional resources

 

What is a Dublin Core Application Profile, really?, by Pete Johnston.

Towards an Interoperability Framework for Metadata Standards, by Nilsson et al.

 

Comments (15)

Anonymous said

at 10:08 am on Mar 12, 2007

This is the place to comment on the Framework. My main comment is that this is NOT SET IN STONE. None of us have a clear idea of where we'll end up. So consider this a stab at clarity and feel free to question everything.

Anonymous said

at 3:06 pm on Mar 12, 2007

In a roundabout way (through Lorcan Dempsey) I ran into this from Gunter Waibel of RLG/OCLC:
http://hangingtogether.org/?p=152
It's another possible view of a framework.

Anonymous said

at 11:21 pm on Mar 12, 2007

It seems to me the mountain has laboured and brought forth a mouse. The sample descriptive set would hardly serve in a children's collection. Who translated it? Where is Ego Press? Is this item illustrated, and does it have a portrait? Does it have a bibliography and index? Why is its genre called a subject? It is, one assumes, an autobiography, not a book about autobiographies as the descriptive set says. Where is the ISBN for ordering if a replacement copy is needed? Even a publisher's catalogue entry would tell me more. An ISBD display without the labels would tell me more in less space.

I realize it is a simple example intended to demonstrate a thesis. But certainly MARC21, which the authors declare cannot be made to work, provides superior language neutral coding. Even with fixed fields junked, MARC21 would be a better tool.


Anonymous said

at 8:45 am on Mar 13, 2007

I can't parse this sentence:
"FRBR, and work by Barbara Tillett, Richard Smiraglia, and others has contributed to an increasingly formalized notion of relationships among bibliographic resources, and between bibliographic resources and associated entities (for instance, FRBR's group 2 entities--persons, corporate bodies--and draft FRBR's families, as well as subject entities)."

The parenthetical phrase is throwing me off. Can you explain what you mean? -Jodi

Anonymous said

at 8:46 am on Mar 13, 2007

This makes sense but would benefit from an example:
"One of the things the DCAM and the Dublin Core experience generally tells us is that we need to develop our attributes/properties/elements separately from the model as well as from the values used. Separating elements and their definitions from guidance on determining their values (controlled vocabularies, transcription, etc.) is crucial in order to achieve interoperability and extensibility."
-Jodi

Anonymous said

at 8:49 am on Mar 13, 2007

"Traditional library cataloging is just such a community of practice, and should extend the schema and guidance to fit their needs, without the necessity of bringing their special library colleagues along with them."

I would emphasize that it is just ONE community of practice.
What's important, I think, is to create standards that are sufficiently generic to be universal, and which can be adapted, extended, and constrained to support individual communities' needs. To me, that's what works well about DC. -Jodi

Anonymous said

at 9:04 am on Mar 13, 2007

Certainly the item in the sample set would be of no use to a patron seeking works about autobiographies, the stated "Subject". But more importantly, how is this work to be located by the patron seeking information *about* the person tagged "Author"? In the early days of the dictionary card catalogue, subject cards for autobiography were omitted because they duplicated the main entry card. We soon got over that as catalogues grew in size. In the early days of MARC, an indicator was used in 100 (2nd indicator 1) to say that the main entry is also subject. We soon got over that as well, and now code a 600 for the author's name. In this descriptive set, there is *no* way for the item to be searched and found by the patron wanting information about that person. Why do we seem determined to repeat our earlier mistakes? Am I the only one around old enough to *remember* our earlier mistakes? This reinvented metadata wheel is missing some important spokes.

Anonymous said

at 11:17 am on Mar 13, 2007

I guess we need to make clear that the examples here are very schematic and are not intended to look like full records or even like actual cataloging. I'll try to add some text that explains what the examples are illustrating, which is primarily about how fields relate to their identifications.

Anonymous said

at 10:01 am on Mar 14, 2007

Since the links between Group 1 and Group 2 are shown explicitly in the example, I'd suggest adding explicit links between the Group 1 levels, assuming that's what is intended. But then I get into FRBR problems. Suppose "Mi vida" is actually an aggregated text with a new introduction and a translator's commentary. Would that aggregate be a separate Work? If so, would the "Mi vida" expression need to link to (at least) two Work level records, one for the aggregate and one for the core text? Also, for the original "My life," would Language be an Expression level property or a Work level property? Having the Group 1 set for "My life" as well as "Mi vida" in the example and showing links between them would make the design decisions in the schema clearer.

Anonymous said

at 11:12 am on Mar 15, 2007

I realize this is a point of absolutely no controversy in the FRBR community, but I have never been happy with the title attribute in association with the abstract entities, work and expression. It seems contrary to the spirit of an abstract entity, not to mention creating practical problems (e.g., for serials). There obviously could be many titles associated with a work in its manifestations. Libraries may want to select one over another for the "work," or display, title. Works and expressions only need identifier attributes: for the work, author and subject. With identifiers also associated with manifestations and items a virtual authority "record" is created that libraries could use to select the display value for that work, just as they will/should be able to for the author (the internationalization philosophy guiding VIAF, to the extent I understand their objectives). Why would we want to mandate a single term to identify a work in all possible displays? Especially since it's reality that those terms are in some cases subjectively designated.

The desire to assign title to the abstract entities seems to reflect two things, the first being the tendency to see each FRBR entity as needing to be able to stand alone. The document itself reflects this in duplicating so many attributes. There is also the understandable desire for a uniform title, but this approach perpetuates the current problem where bibliographic records repeat textual authority data rather than simply linking to authority records. Flexibility in system development, as well as internationalization, should push us in the direction of embracing the true potential of abstraction at the work and expression levels.

Anonymous said

at 11:29 am on Mar 15, 2007

Kristin - thanks for the thoughtful post. You said: "The document itself reflects this in duplicating so many attributes." I've almost finished a table with all of the FRBR attributes, and surprisingly I have found very few that appear in more than one level. Title is the main one. There are dates at various levels, but they represent different things. I've been looking at the attributes to try to understand better the meaning of the levels. The Work level seems to be a general identity. The Item level is very concrete (of course). I'm still puzzling over the two in the middle, however. I'll post the table somewhere (will link from this wiki if I can't post such a large table), and will probably blog this when my thinking is clearer. Another interesting point (which is known to FRBR experts but I had to discover it) is that subjects only relate to the Work Entity. At least, that's the way it is in the FRBR document's diagram. I'm beginning to think that the only Entity that could stand alone is the Work. No, I don't know what this means.

Anonymous said

at 4:00 am on Mar 17, 2007

Philip Davis comment.
It is helpful to have set down the sequence of stages in bibliographic control as seen by the Dublin Core community, namely Model, Schema, Guidance, Encoding. To mirror this, I should tend to express the sequence as Statement of International Cataloguing Principles, International Standard Bibliographic Descriptions, Resource Description and Access, MARC.

Anonymous said

at 3:00 pm on May 15, 2007

Both the "manifestation" and "expression" entities together basically make up the traditional/colloquial idea of 'edition'. If two different editions have exactly the same (or same enough for contextual purposes) text or content, then they are two manifestations of the same expression. For instance, the 1972 edition and a 1981 edition or whatever. Might have different ISBNs even. But contain the same text, then different manifestations of the same expression. If, on the other hand, a new edition is seriously revised, then it's also a new expression.

You can come up with all sorts of grey areas easily, and the FRBR model does not try to precisely instruct you how to decide these grey areas. That's the role of 'Guidance', not 'model'. The FRBR Model decides it's useful/necessary to have this distinction; it's up to implementing communities to specify how to make it.

Whether there's a need for expression AND manifestation, instead of just one 'edition' entity is one of the more controversial parts of the FRBR ontology. I think they chose right though.

I also like thinking of this with Elaine Svenonius' set theory approach, so from the bottom up:
An item, is an actual individual concrete book in your hand.

A manifestation is the set of all items that are identical (or close enough) in _physical form_ as well as content.

An expression is the set of all manifestations that are identical in _textual_ or _information_ content (or close enough for our purposes; an archeologist would consider the coffee stain on the back to be distinguishing information content; we do not).

And a work is the set of all expressions that well, consist of the same intellectual work. This is definitely a cultural concept, but it's one we have and find useful. We consider the audio book version of a book to be the same _book_, just a different version. That's work.

Anonymous said

at 6:57 pm on Jul 7, 2007

I think our "model" section is doing two different things. I agree that FRBR acts as a "domain model". But we also need a formal model of our data elements. I don't know if this is one thing or two different things.

Susan Roby Berdinka said

at 8:02 am on Nov 11, 2009

Excellent article - my editing was only to remove embedded spam links.
