Metadata & Access

GSLIS 609 (McGill) - Winter 2005

 

paper by: Tomasz Neugebauer

 

Resource Description Framework (RDF)

http://www.w3.org/RDF

 

Table of Contents

Organization/creator in charge of managing and administering

Users/target audience

Purpose and Content

Domain

Structure of metadata elements

Special features/popularity

Similar in area

Overall evaluation

 

 

 


 Organization/creator in charge of managing and administering

 

From the RDF Primer we learn that RDF is a World Wide Web Consortium (W3C) Recommendation endorsed by the director (Tim Berners-Lee) and reviewed by its members and “other interested parties”. [1]  The director of W3C is credited with the invention of the World Wide Web and with establishing the W3C,

“an international consortium where Member organizations, a full-time staff, and the public work together to develop Web standards. W3C's mission is:  To lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web.” [2]

 

 

Users/target audience

 

The audience is limited only by the fact that RDF is a W3C Recommendation and so applicable to the World Wide Web, and of special interest to the web development community.  RDF provides “a lightweight ontology system to support the exchange of knowledge on the Web”[3]; it is an integrating technology for the exchange and processing of metadata in general.  RDF statements are expressed in Extensible Markup Language (XML), and like XML, RDF is context independent: “RDF does not stipulate semantics for each resource description community, but rather provides the ability for these communities to define metadata elements as needed.” [4]  The target audience includes all of the user communities interested in the development and exchange of metadata on the web.

 


Purpose and Content 

 

RDF is a common framework (parsers and processing tools) intended to facilitate the interoperability of metadata between applications. [5]  RDF is a language for expressing informative statements about resources in the World Wide Web (e.g., web pages, online catalogues and databases).  It is intended for encoding metadata “such as the title, author, and modification date of a Web page, copyright and licensing information about a Web document, or the availability schedule for some shared resource.” [6]  The content includes all of the resources that can be identified on the web with Uniform Resource Identifiers (URIs), “short strings that identify resources in the web: documents, images, downloadable files, services, electronic mailboxes, and other resources.” [7]  A Uniform Resource Locator (URL) is a kind of URI that identifies a web resource by specifying a network location where the resource can be accessed.  However, “URIs are not limited to identifying things that have network locations, or use other computer access mechanisms” [8] and they can be used to identify human beings, corporate bodies, and abstract concepts such as authorship.

RDF is a web technology, but since Web pages describe and identify resources that are not necessarily online (for example: the web pages of a library describe and identify physical books and periodicals, web pages of an archive describe special collections, a human-resource database describes employees, etc.), the purpose of the metadata exchange and processing of RDF can extend beyond the Web. 

 

Domain

 

As a W3C Recommendation, RDF remains within the web development domain.  It is a language for expressing metadata on the World Wide Web that is attracting research interest from computer science and from information science-related fields such as knowledge management[9].

 

Structure of metadata elements

 

All RDF statements are meant to be processed by computers, and the underlying structure of each statement is composed of three parts: subject, predicate (i.e., property) and object.  RDF consists of statements expressed in XML about the properties of resources identified with URIs.  The properties should also be identified with URIs, so that parsing applications can process the information accordingly. 

For example, the following RDF statement about the resource Photomedia.ca states that Tomasz Neugebauer is its ‘creator’ in the sense defined by the Dublin Core vocabulary (http://purl.org/dc/elements/1.1/).  We begin with the XML declaration and the rdf:RDF root element, which signal to parsing applications that RDF statements follow.  Then we identify the subject resource by setting the rdf:about attribute to http://www.photomedia.ca/, and we set the dc:creator property (from the Dublin Core vocabulary at URI http://purl.org/dc/elements/1.1/) to the object “Tomasz Neugebauer”:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.photomedia.ca/">
         <dc:creator>Tomasz Neugebauer</dc:creator>
</rdf:Description>
</rdf:RDF>
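To make the subject-predicate-object structure concrete, the following is a minimal sketch of how a program might read the statement above back out as a triple.  It uses only the XML parser in the Python standard library rather than a full RDF toolkit; the function name literal_triples is my own, and the sketch assumes the simple form used here (rdf:Description elements whose properties hold plain text values).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The RDF/XML example from the text, reproduced unchanged.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.photomedia.ca/">
         <dc:creator>Tomasz Neugebauer</dc:creator>
</rdf:Description>
</rdf:RDF>"""


def literal_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for text-valued properties.

    Hypothetical helper: it covers only rdf:Description elements with an
    rdf:about attribute, not the full RDF/XML syntax.
    """
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree names elements as "{namespace}local"; joining the
            # two parts gives the full URI of the property.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, (prop.text or "").strip()))
    return triples


for triple in literal_triples(RDF_XML):
    print(triple)
```

The predicate comes out as the full URI http://purl.org/dc/elements/1.1/creator, which is what allows a parsing application to tell the Dublin Core sense of ‘creator’ apart from any other property that happens to share the name.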

 

This next example makes RDF/XML statements about the resource Tomasz Neugebauer (identified by the URI http://tom.biodome.org), using the predicate vocabulary of the contact namespace (which defines the meaning of properties such as fullName and mailbox) at URI (http://www.w3.org/2000/10/swap/pim/contact#)[10] :

 

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://tom.biodome.org">
    <contact:fullName>Tomasz Neugebauer</contact:fullName>
    <contact:mailbox rdf:resource="mailto:tomasz@biodome.org"/>
  </contact:Person>
</rdf:RDF>
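This second example contains more than one statement.  In RDF/XML, writing contact:Person in place of rdf:Description is shorthand for an additional rdf:type statement, and an rdf:resource attribute makes the object a resource rather than a text value.  The sketch below, again limited to the Python standard library and to the constructs that appear in this example, lists the resulting triples; the helper names tag_to_uri and node_triples are my own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The contact example from the text, reproduced unchanged.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://tom.biodome.org">
    <contact:fullName>Tomasz Neugebauer</contact:fullName>
    <contact:mailbox rdf:resource="mailto:tomasz@biodome.org"/>
  </contact:Person>
</rdf:RDF>"""


def tag_to_uri(tag):
    """Turn ElementTree's "{namespace}local" tag into a full URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def node_triples(rdf_xml):
    """Hypothetical helper: list the triples stated by each top-level node."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        node_type = tag_to_uri(node.tag)
        # A typed node such as contact:Person implies an rdf:type statement.
        if node_type != RDF_NS + "Description":
            triples.append((subject, RDF_NS + "type", node_type))
        for prop in node:
            # rdf:resource points at another resource; otherwise the object
            # is the text content of the property element.
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, tag_to_uri(prop.tag), obj))
    return triples


for triple in node_triples(RDF_XML):
    print(triple)
```

A complete RDF parser handles many more cases (nested descriptions, blank nodes, language tags and so on), so this should be read as an illustration of the model and not as a substitute for one of the RDF toolkits.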
        

The examples above illustrate that RDF itself defines only a few constructs, such as the rdf:Description element and the rdf:about and rdf:resource attributes.  The utility of this standard comes from the fact that it allows the use of vocabularies from other XML Namespaces, such as the Dublin Core namespace (http://purl.org/dc/elements/1.1/).

 

Special features/popularity

 

Bray lists the design principles of RDF: independence, interchange, scalability, and the use of URIs to identify and treat subject, property and object as resources.[11]  Independence comes in the form of RDF's acceptance of independently invented predicate vocabularies.  Interchange is a consequence of the use of XML as the language of expression for RDF statements and predicate vocabularies.  Scalability is achieved with the use of a semantic model (subject, predicate, object) that is computationally efficient and highly expressive.  Bray argues that RDF gives XML the additional scalability it needs to become an enabling technology for making the web more like an organized library.

 

RDF is open to all kinds of property vocabularies (XML Namespaces), and as such offers a platform for integrating large collections of metadata on the web through web applications into searchable environments.  Kennedy concludes that “although some feel that it still shows promise, RDF has yet to attract widespread adoption in the metadata community.”[12]  Nevertheless, the projects and applications section of the W3C RDF page[13] links to an impressive list of tools and applications based on RDF, including the Dublin Core Metadata Initiative’s integration of its elements as an RDF namespace, the Open Directory Project offering RDF dumps of its categories and resources (http://rdf.dmoz.org/), the MIT Libraries Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) project (http://simile.mit.edu/) enhancing DSpace (http://www.dspace.org/) support for arbitrary schemata and metadata through the use of RDF, and many more, including RDF developer resources (e.g., parsers, Application Programming Interfaces, etc.) in C, C#, Python, Java, Perl, Lisp and other languages.

 

Similar in area

 

RDF Site Summary (RSS) 1.0 is “a lightweight multipurpose extensible metadata description and syndication format. […] conforming to the W3C's RDF Specification.” [14]  Kennedy notes that “you will sometimes see an RSS 1.0 feed identified as RDF.” [15]  RSS has become a popular metadata exchange format on the web, and RSS 1.0 is actually an RDF/XML implementation.

RDF provides a semantic platform (model) for the expression and interchange of metadata statements using various vocabularies.  RDF is open to many different predicate sets (referenced by various URIs); however, it does not provide any property (i.e., predicate) vocabularies of its own.  The OWL Web Ontology Language provides “greater machine interpretability of Web content [than RDF/XML] by providing additional vocabulary along with a formal semantics.” [16]  RDF and OWL are the foundation standards for the semantic web activity of the W3C, providing “a framework for asset management, enterprise integration and the sharing and reuse of data on the Web.” [17]

 

Overall evaluation

 

In my opinion, the RDF standard is a necessary step in the evolution of the World Wide Web.  The number of resources available online is growing fast, and search engines are mainly using natural language searching to locate resources.  Consequently, the amount of computer processing of information remains low, with much human-level processing necessary to structure and derive meaning from online sources.  The RDF standard aims to tap into that unused potential for computer processing of metadata by providing a semantic model of expression that results in interoperable and interchangeable XML-based information that can be used by web applications to structure responses to human queries.

 

Let me illustrate with an example: there are many online venues available for photographers to display their work, such as NikonNet (http://www.nikonnet.com), Fotolog.net (http://www.fotolog.net/), Photo.net (http://www.photo.net/), Flickr (http://www.flickr.com/) and many others.  A single photographer is likely to have images on many different sites.  Without RDF, the only option for searching is to visit all of these different sites, perform searches for the photographer, and compile a list.  RDF is an enabling technology for a meta-search application that will visit all of these various sites, extract the information relevant to that particular photographer (image titles, locations, criticisms, etc.), and process and organize the output in a meaningful way for the searcher.  This processing could include the elimination of duplicate images, the collation of critiques of the same image from various sites, as well as links to resources and people closely related to the photographer, and more.  All of this is made possible by a semantic model that represents information along with its meaning using RDF.  Plain HTML (with elements such as paragraph, table, font) cannot deliver these kinds of meta-search applications because its element set is limited to displaying the information (photographer and image name, etc.) for human processing, whereas RDF offers a way to turn the statements inside paragraphs and tables of images into structures that contain semantic information that can be further processed algorithmically.  RDF statements are predictably expressed as subject-predicate-object triples in which each of the three is clearly defined by an XML Namespace.  Consequently, computation can be used to process the statements in order to derive, summarize and interchange data from various sources.
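As a rough sketch of the collation step in such a meta-search application, suppose two sites have already been harvested and each yields a set of triples.  Everything in the data below is invented for illustration: the example.org image URIs, the image title, the critiques and the ‘critique’ property are hypothetical, and the sketch assumes that both sites identify the same image by the same URI, which real sites would have to agree on.

```python
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/terms#"  # hypothetical vocabulary, for illustration only

# Hypothetical triples harvested from two different photo sites.
site_a = {
    ("http://example.org/images/42", DC + "creator", "Tomasz Neugebauer"),
    ("http://example.org/images/42", DC + "title", "Harbour at Dusk"),
    ("http://example.org/images/42", EX + "critique", "Strong composition."),
}
site_b = {
    ("http://example.org/images/42", DC + "creator", "Tomasz Neugebauer"),
    ("http://example.org/images/42", DC + "title", "Harbour at Dusk"),
    ("http://example.org/images/42", EX + "critique", "Lovely light."),
    ("http://example.org/images/77", DC + "creator", "Someone Else"),
}


def images_by(creator, *sources):
    """Merge triple sets and group the properties of one creator's images."""
    # Set union removes statements that several sites repeat verbatim.
    merged = set().union(*sources)
    wanted = {s for s, p, o in merged if p == DC + "creator" and o == creator}
    grouped = defaultdict(lambda: defaultdict(set))
    for s, p, o in merged:
        if s in wanted:
            grouped[s][p].add(o)
    return {subject: dict(props) for subject, props in grouped.items()}


report = images_by("Tomasz Neugebauer", site_a, site_b)
for image, props in report.items():
    print(image, sorted(props[EX + "critique"]))
```

Because every statement has the same three-part shape, the duplicate title and creator statements collapse into one and the critiques from both sites end up side by side under the same image, without any site-specific parsing of paragraphs or tables.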

 

RDF has been criticized for being overly abstract for the majority of web developers[18], but in my opinion this is only because the need for the kind of applications that RDF enables is only now emerging.  As the perils of simple keyword searching through an ever-increasing ocean of information become apparent, so will the interest in enabling technologies for metadata interoperability.  Only ten years ago, knowing HTML was considered a technological challenge reserved for the tech-savvy; today, students in the lower grades of elementary school learn to use complex WYSIWYG tools to produce HTML.  Similarly, RDF seems today like an abstract challenge reserved for the few, but with time and through further development of RDF-based tools and technologies it will become a standard for expressing metadata online.

 



[1] <http://www.w3.org/TR/rdf-primer/>

[2] About the World Wide Web Consortium (W3C)  <http://www.w3.org/Consortium/>

[3] <http://www.w3.org/RDF/>

[4] <http://www.dlib.org/dlib/may98/miller/05miller.html>

[5] <http://www.w3.org/TR/rdf-primer/>

[6] <http://www.w3.org/TR/rdf-primer/>

[7] <http://www.w3.org/Addressing/>

[8] <http://www.w3.org/TR/rdf-primer/>

[9] For an interesting example of the use of RDF in knowledge management, see:

Davies, J., Fensel, D., van Harmelen, F.  Towards the Semantic Web: Ontology Driven Knowledge Management.  Wiley Interscience, 2002.

[10] Adapted from the example in the RDF Primer <http://www.w3.org/TR/rdf-primer/>

[11] <http://www.xml.com/pub/a/98/06/rdf.html>

[12] Shirl Kennedy.  Computers in Libraries.  Westport: Feb 2004.  Vol. 24, Iss. 2; pg. 27, 1 pgs

[13] <http://www.w3.org/RDF/>

[14] <http://web.resource.org/rss/1.0/spec>

[15] Shirl Kennedy.  Computers in Libraries.  Westport: Feb 2004.  Vol. 24, Iss. 2; pg. 27, 1 pgs

[16] <http://www.w3.org/TR/owl-features/>

[17] <http://www.w3.org/2001/sw/WebOnt/>

[18] <http://www.itworld.com/nl/ebiz_ent/03182003/>