XFML Core - eXchangeable Faceted Metadata Language

Table of contents
1. Status of this document
2. Introduction to XFML
3. Conceptual model
4. XFML format specification
- 4.1 Namespace
- 4.2 <xfml>
5. Processing instructions
6. License

1. Status of this document

This is the final and frozen version of this specification.

Please feel free to base your implementations on this spec - it is frozen and will not change. Possible future versions of XFML (see http://xfml.org/spec/) will remain backwards compatible.

The permanent URL for XFML Core is http://purl.oclc.org/NET/xfml/Core/, which redirects to this page at http://www.xfml.org/spec/1.0.html

1.1 Authors

XFML Core was developed by Peter Van Dijck (http://petervandijck.net), with crucial support and feedback from Matt Mower, Travis Wilson, Eric Scheid, Danny Ayers, David Gammel, Leonard Will, Louis Argerich and many more that I have forgotten to mention here.

1.2 Translations

The English version of this document is the only normative one. However, translations are encouraged and will be linked here.

1.3 Revisions

13 December, 2002: Some inconsistencies in the spec were pointed out by Mark Pilgrim. The spec hasn't changed - your implementations need no changes.

Changed the example of the facet element because id's cannot start with a number (this follows from the DTD and wasn't explained explicitly in this spec).
Added "with the exception that the year may be expressed with two characters or four characters (four preferred)" to the date format for the update elements explanation. The example was always correct, I was sloppy with the description.

14 February, 2003. Some additions and clarifications. Again, the spec hasn't been changed, just clarified - your implementations should remain compatible.

Added numbering for easy reference when HTML is not available, suggestion of Jeremy Shantz.
Added namespace, suggestion of Robin Berjon.
Clarified how to resolve relative URI's at 5.7 Resolving relative URI's (this and the following thanks to suggestions of Lars Marius Garshol)
Changed the recommendation for the MIME type and extension to be used.
Added a note to the facetid in the topic element.
Added a note (5.2.1) on different behaviour for processors than expected if you are used to XTM.
Added a note on the semantics of the facet-topic relationship in the explanation of the topic concept.

2. Introduction to XFML

This specification defines XFML version 1.0 aka "XFML Core".

XFML lets you exchange hierarchical faceted metadata. It also lets you indicate that topics in different published XFML documents are equal, thus allowing you to reuse indexing efforts.

XFML borrows many ideas from Topicmaps, a format you should check out if you like the ideas behind XFML but are frustrated with its limitations (see http://topicmaps.org).

The XFML specification (this document), which explains what XFML is, consists of three things:

a set of concepts, i.e. a conceptual model.
an XML format for expressing these concepts.
a set of processing instructions that explain how applications should work with XFML data.

3. Conceptual model

3.1 Overview

XFML is a model to express topics, organised in hierarchies or trees within mutually exclusive containers called facets. It also expresses indexing efforts: metadata you have assigned to pages. It lets you publish this information in an open, XML based format. Finally, XFML lets you build connections between different XFML maps in two ways. You can indicate that a topic in one map is equal to a topic in another map: we call this connecting topics. Or you can indicate that a topic is described on a certain resource (usually a webpage): we call this published subject indicators.

3.2 Why is XFML different?

The core value proposition of XFML Core is that it lets you share and reuse indexing efforts (see indexing concept) by publishing your metadata, without needing any kind of central or distributed definition of your topics. Quality indexing is hard - reusing indexing efforts will make metadata more useful. This is important: with XFML there is no centrally defined set of metadata. Each author defines their own facets and topics. XFML can get away with not having a central store of metadata because it lets authors connect individual topics between two specific maps - see connecting topics concept.

XFML Core also provides a simple format to share and reuse faceted metadata hierarchies (the topics and facets). If you export XFML, you can import it into a variety of faceted metadata browsing applications that support XFML, such as Facetmap.

XFML is a specialised format, as opposed to XTM or RDF, which are generic metadata formats. XFML will not solve all your metadata needs.

XFML, by design, is simple to write code for.

3.3 XFML map concept

An XFML map consists of data that represents XFML concepts as described in the conceptual model you are reading now. The data is often kept in a database and published as an XFML document.

3.4 Topic concept

Anything you can possibly imagine, whether it exists or not, can be a topic.

The creator of the map defines topics as he sees fit. For example: "love" can be a topic. Or "5 o'clock in the afternoon". Or "Shakespeare". The topic concept is the same as it is in topicmaps.

Note the difference between a topic and a term, as used in thesauri for example. A topic is an abstract entity, a term is a specific (set of) words. So the topic "accessibility" could be described by the term "accessibility", but also by the terms "universal access" or "ease of access". XFML doesn't deal with terms, it deals with topics.

Topics are organised in hierarchies (trees), from general to specific. A topic can have at most one other topic as parent (no multiple parents allowed). Each topic belongs to one facet. There can't be two representations of the same topic in a map. Two topics are equal if at least one of their psi's is equal, or if at least one of their connections is equal.

A topic always belongs to one and only one facet. The semantics of the facet-topic relationships are as follows: topics are not instances of facets, they are members of a facet. For example, the topic USA is not an instance of the facet Geography, it is a member.
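The equality rule above (a shared psi or a shared connection) can be sketched in a few lines of Python. This is only an illustration: the Topic class, its field names and the example.org URLs are assumptions made for the example and are not part of XFML.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory topic; field names are illustrative only."""
    id: str
    facetid: str
    psis: set = field(default_factory=set)         # values of <psi> elements
    connections: set = field(default_factory=set)  # values of <connect> elements


def topics_equal(a: Topic, b: Topic) -> bool:
    """Equal if the topics share at least one psi or at least one connection."""
    return bool(a.psis & b.psis) or bool(a.connections & b.connections)


# Two topics from different maps that point at the same (made-up) psi.
red_wine = Topic("red_wine", "wine_colours", psis={"http://example.org/psi/red-wine"})
vin_rouge = Topic("vin_rouge", "couleurs", psis={"http://example.org/psi/red-wine"})
beer = Topic("beer", "drinks")

print(topics_equal(red_wine, vin_rouge))
print(topics_equal(red_wine, beer))
```

Note that names play no role in this check: two topics with different names can still be equal, and two topics with the same name are not equal on that basis alone.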

3.5 Facet concept

Facets are mutually exclusive containers that contain hierarchies of topics.

Mutually exclusive means that a certain topic can only possibly belong to one facet. "Things to do" and "Places to go" are good facets, because a topic can never be both a thing to do and a place to go to. "People" and "Colours" are two other good facets. "Cities" and "Places to visit" are bad facets if used in the same map because "Brussels" (a potential topic) could belong to both. Note that the "mutually exclusive" requirement is not something that can be enforced by software.

If you want to see what navigation using facets can look like, visit http://facetmap.com

To avoid confusion, note that the term "facets" is used with somewhat different meanings by different people. So is the term "taxonomy", which is why we are avoiding it in this spec. The term "facet" in XFML is used very much like the library-science definition of the term. An excellent overview of the term "faceted classification" can be found at http://www.kmconnection.com/DOC100100.htm

In programming terms: only the top level of a tree is called a "facet" in XFML - all the children are called "topics" (because they don't have to be mutually exclusive). A page cannot be indexed with the "facet", only with the topics. A map can contain multiple trees of topics (with a different facet as the root for each).

3.6 Indexing concept

Indexing means assigning topics to pages. Or pages to topics - same thing. When you index a webpage you say: "This webpage is about the following topics in my map: [list the topics here]". Since no write-access is needed to a webpage to index it in your map, you can index anything on the web.

3.7 Occurrence concept

An occurrence of a topic on a page is what happens when you index that page with that topic. The topic "occurs" on that page.

3.8 Occurrencestrength concept

Occurrencestrength indicates how strongly you believe in your indexing of that particular topic to that particular page.

Occurrencestrength is a whole number from 1 to infinity. An occurrencestrength of 1 means you have complete confidence in your indexing: you have indexed it yourself, manually. Higher numbers mean less confidence. If you reuse indexing work of others, you will generally add one (+1) to the occurrencestrength: your confidence in the indexing decreases (since you didn't do it yourself). In this way, occurrencestrength also serves as an indicator of degrees of separation to the source of indexing.
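As a small worked example of the "+1" convention, the following Python sketch computes the strength of an occurrence each time it is reused by another map. The function name and the penalty parameter are assumptions made for illustration; the spec only says that you will generally add one.

```python
def imported_strength(source_strength: int, penalty: int = 1) -> int:
    """Strength to record locally for an occurrence reused from another map.

    source_strength is the value found in the other map; penalty is how much
    confidence is lost by not having indexed the page yourself (1 by default).
    """
    if source_strength < 1:
        raise ValueError("occurrencestrength is a whole number starting at 1")
    if penalty < 1:
        raise ValueError("reused indexing should not gain confidence")
    return source_strength + penalty


manual = 1                               # indexed by hand in the original map
first_reuse = imported_strength(manual)  # reused once: 2
second_reuse = imported_strength(first_reuse)  # reused from the reuser: 3
print(manual, first_reuse, second_reuse)
```

Read this way, the strength minus one is the number of maps the occurrence has passed through since it was indexed manually.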

3.9 Connecting topics concept

If two topics are connected in a map, it means the map's author has indicated he believes these topics are the same. They may have different names, but they are about exactly the same concept. You can only connect topics in separate maps, not within the same map.

Once you have connected topics, you can reuse indexing efforts of other authors, since you now know what their topics mean relative to your topics. There is no connection scope: two topics are either connected (are equal in all ways) or not. Connections are one-way: two topics in two different maps can be connected in one map (the author of the map indicated they were equal), but unconnected in another map (the author of that map didn't connect them).

Two topics in the same map cannot be connected to the same topic in another map, since that would mean they were equal.

Connecting topics is a very important concept in XFML - see why is XFML different.
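The rule that two topics in the same map cannot be connected to the same topic in another map can be checked mechanically. The Python sketch below assumes the connections of one map have already been collected into a dictionary from local topic id to connection targets; that data layout and the function name are assumptions, not something the spec prescribes.

```python
from collections import defaultdict


def find_conflicting_connections(connections):
    """Return {target: [local topic ids]} for targets used by more than one topic.

    connections maps a local topic id to an iterable of <connect> values
    (map URL + '#' + topic id in that map).
    """
    by_target = defaultdict(list)
    for topic_id, targets in connections.items():
        for target in set(targets):  # ignore repeats within a single topic
            by_target[target].append(topic_id)
    return {t: sorted(ids) for t, ids in by_target.items() if len(ids) > 1}


connections = {
    "red_wine": ["http://othersite.com/othermap.xml#rouge"],
    "claret": ["http://othersite.com/othermap.xml#rouge"],
    "white_wine": ["http://othersite.com/othermap.xml#blanc"],
}
print(find_conflicting_connections(connections))
```

An empty result means the map respects the rule; any entry lists local topics that would have to be equal to each other.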

3.10 Map network concept

A map's "network" is the collection of maps that have topics that are connected to topics in this map. So if you connect topicA in mapA to topicX in mapX, then mapX is in the network of mapA. Note that mapA isn't in the network of mapX until authorX connects topicX with topicA in mapX.
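To make the one-way nature of the network concrete, here is a minimal Python sketch that derives a map's network from the connection values it contains. The example URLs are invented, and the function name is an assumption.

```python
from urllib.parse import urldefrag


def map_network(connect_values):
    """Set of map URLs referenced by the connections found in one map."""
    return {urldefrag(value).url for value in connect_values}


# mapA connects one of its topics to topicX in mapX; mapX connects nothing.
connects_in_map_a = ["http://x.example/mapX.xml#topicX"]
connects_in_map_x = []

print(map_network(connects_in_map_a))  # mapX is in the network of mapA
print(map_network(connects_in_map_x))  # mapA is not in the network of mapX
```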

3.11 Published subject indicator concept

The psi or published subject indicator is a link to a (preferably human readable) resource that defines this topic. It is pretty much the same as a psi in topicmaps. Within a map, only one topic can point to a given psi (otherwise the two topics would be identical). A topic can have multiple psi's. The idea is that, if two topics in two different maps have the same psi, then an application will know for sure they are identical.

3.12 XFML document concept

An XFML document is a file in XFML format containing XFML data. An XFML document is a published map.

3.13 Published map concept

A published map is a document (usually in the XML format specified in this spec, in which case it is an XFML document, but it could be published in another format) that has been made available to the public, usually on the web. An XFML map can be published, but doesn't have to be; you could also just use it internally and never publish it. So in short: an XFML map (i.e. the data) can be published as an XFML document.

4. XFML format specification

An XFML document is a valid, well formed XML document, and conforms to the XFML DTD found here: http://xfml.org/spec/xfml.dtd and to this spec.

Here is an example XFML file: http://xfml.org/spec/example.xml

It is recommended to add a comment indicating the location of this spec to your XFML file, as follows:

4.1 Namespace

The XML namespace for XFML Core is http://purl.oclc.org/NET/xfml/core/ (which points to this spec)

4.2 <xfml>

Parent element of an XFML document.

Example:

<xfml version="1.0" url="http://domain.com/xfml/map1.xml" language="en-us">

Child elements: optional <mapInfo>, and multiple optional <facet>, <topic> and <page> elements.

Attributes:

4.2.1 version

Required attribute. For XFML Core, use version="1.0".

4.2.2 url

Required attribute. Contains the url where this map is kept. The reason it is required is that we then have an unambiguous pointer to topics in this map.

4.2.3 language

Optional attribute indicating the language of the entire map. Takes a two or four letter language code. Two-letter primary codes are reserved for ISO639 language abbreviations. Two-letter codes include fr (French), de (German), it (Italian), nl (Dutch), el (Greek), es (Spanish), pt (Portuguese), ar (Arabic), he (Hebrew), ru (Russian), zh (Chinese), ja (Japanese), hi (Hindi), ur (Urdu), and sa (Sanskrit). Four-letter codes consist of a two-letter primary code followed by a two-letter subcode (like this: "en-us" or "en-uk"), where the subcode is understood to be an ISO3166 country code.

Child elements:

4.2.4 <mapInfo>

Describes (all optional) information about this map.

Example:

<mapInfo lastUpdate="" nextUpdate="">

Attributes:

4.2.4.1 lastUpdate

Date of the last update of this map. A new lastUpdate date should only be added when a structural change is made: a change in topics, facets or occurrences. Changes in the mapInfo element or changes in comments in the map should not result in a new lastUpdate date. This and the nextUpdate attribute follow the RFC822 convention, with the exception that the year may be expressed with two characters or four characters (four preferred), like this: "Sat, 07 Sep 2002 0:00:01 GMT".

4.2.4.2 nextUpdate

Date of the next expected update of this map. Applications downloading the map should reload the map on or after this date to check for updates.
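One possible way to read these dates in Python is the standard library's email.utils.parsedate_to_datetime, which accepts RFC822-style dates including two-digit years. The helper names below are assumptions, and treating an empty attribute as "no date given" is also an assumption, not something the spec states.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_update(value: str) -> Optional[datetime]:
    """Parse a lastUpdate/nextUpdate value; empty means no date was given."""
    if not value.strip():
        return None
    return parsedate_to_datetime(value)


def should_reload(next_update: datetime, now: datetime) -> bool:
    """Reload the map on or after the announced nextUpdate date."""
    return now >= next_update


due = parse_update("Sat, 07 Sep 2002 0:00:01 GMT")
print(due.isoformat())
print(should_reload(due, datetime(2002, 9, 6, tzinfo=timezone.utc)))
print(should_reload(due, datetime(2002, 9, 8, tzinfo=timezone.utc)))
```

What an application should do when nextUpdate is missing is left open here; a fixed polling interval would be one option.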

Child elements:

4.2.4.3 <managingEditor>

Zero or one in the mapInfo element. The managing editor is the person responsible for the content of the map. Contains a name element, an optional email element and an optional url element.

4.2.4.4 <editor>

Zero or more in the mapInfo element. Editors are editors of the map. There should be a managingEditor if you have any editors. Each contains a name element, an optional email element and an optional url element.

4.2.4.5 <publisher>

Zero or one in the mapInfo element. The publisher is the owner of the map (so not necessarily the website where it is published). Contains a name element, an optional email element and an optional url element.

4.2.4.6 <webMaster>

Zero or one in the mapInfo element. The webmaster is the person to contact in case of technical problems or questions. Contains a name element, an optional email element and an optional url element.

4.2.4.7 <license>

Zero or one in the mapInfo element. The license describes how people can use the map. Contains a text element, describing the license, an optional name element, containing a title for the license, an optional email element and an optional url element linking to the complete license.

4.2.4.7.1 <text>

Contains text explaining the license. No HTML.

4.2.4.8 <generator>

Zero or one in the mapInfo element. The generator is the software that generated the map. Contains a name element, an optional email element and an optional url element.

4.2.4.8.1 <name>

Contains a string that is the name - for use with the managingEditor, editor, publisher, webMaster, license and generator elements.

4.2.4.8.2 <email>

Contains a string that is the email - for use with the managingEditor, editor, publisher, webMaster, license and generator elements.

4.2.4.8.3 <url>

Contains a string that is the url - for use with the managingEditor, editor, publisher, webMaster, license and generator elements.

4.2.5 <facet>

A facet is a container that contains topic trees. The value is the name of the facet.

Example: (note id's are not allowed to start with a number)

<facet id="a2">Things to do</facet>

Attributes:

4.2.5.1 id

Required attribute. id is unique within topics and facets.

4.2.6 <topic>

See the topic concept.

Example:

<topic id="red_wine" facetid="wine_colours" parentTopicid="wine"><name>Red wine</name> [...]

Attributes:

4.2.6.1 id

Required attribute. id is unique within topics and facets.

4.2.6.2 facetid

Required attribute. The id of a facet in this map.

Note: the facetid attribute on <topic> is redundant when a parentTopicid is specified. It was included for exporting in environments with limited scripting capabilities (like a templating system). The parent topic must belong to the same facet as the topic.

4.2.6.3 parentTopicid

Optional. The id of a topic defined somewhere in this map. Only topics that have the same facetid can be parents of this topic. A topic can have only one parent.
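The two constraints on parentTopicid (the parent must be defined in this map, and it must have the same facetid) lend themselves to a simple check. The Python sketch below assumes topics have been loaded into a dictionary of id to (facetid, parent id); that layout and the function name are illustrative only. It does not look for cycles in the hierarchy.

```python
def check_parents(topics):
    """Return a list of problems found in parentTopicid references.

    topics maps a topic id to a (facetid, parent_id_or_None) tuple.
    """
    problems = []
    for topic_id, (facetid, parent_id) in topics.items():
        if parent_id is None:
            continue  # top-level topic directly under its facet
        if parent_id not in topics:
            problems.append(f"{topic_id}: parent {parent_id} is not defined in this map")
        elif topics[parent_id][0] != facetid:
            problems.append(f"{topic_id}: parent {parent_id} belongs to a different facet")
    return problems


topics = {
    "wine": ("wine_colours", None),
    "red_wine": ("wine_colours", "wine"),
    "brussels": ("places", "wine"),       # wrong facet
    "merlot": ("wine_colours", "grapes"),  # undefined parent
}
for problem in check_parents(topics):
    print(problem)
```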

Child elements:

4.2.6.4 <name>

Required, value is the name of this topic.

4.2.6.5 <connect>

Zero or more. Lets you indicate a topic is equal to another topic in another map. Value is a URL to another XFML map, plus the character # plus the id of a topic in that map. See connecting topics concept. For example: <connect>http://othersite.com/othermap.xml#othertopic</connect>
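A connection value can be split into its two parts with the Python standard library, as in the sketch below. The function name is an assumption, and rejecting values without a map URL or topic id is a design choice of this example rather than a requirement stated in the spec.

```python
from urllib.parse import urldefrag


def parse_connect(value: str):
    """Split a <connect> value into (map URL, topic id in that map)."""
    map_url, topic_id = urldefrag(value.strip())
    if not map_url or not topic_id:
        raise ValueError(f"expected 'map-url#topic-id', got {value!r}")
    return map_url, topic_id


print(parse_connect("http://othersite.com/othermap.xml#othertopic"))
```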

4.2.6.6 <psi>

Zero or more. Published subject indicators let you point this topic (using a URI) to a published subject like a webpage or an anchor link on a webpage that clearly describes the topic. See published subject indicator concept.

4.2.6.7 <description>

Optional. Value is text describing the topic.

4.2.7 <page>

Refers to a webpage that you have indexed with topics in this map.

Example:

<page url="http://domain.com/somepage.html"><title>Title of page</title><occurrence topicid="some_topicid_inthismap" strength="1" /><occurrence topicid="some_other_topicid_inthismap" strength="1" /></page>

Attributes:

4.2.7.1 url

Required. The url of the page. Is unique in map: only one page element can have the same url in a map.

Child elements:

4.2.7.2 <title>

Optional: value is a title for the page. Can be the value of the HTML <title> element, or something else.

4.2.7.3 <description>

Optional. A description of the page. In case of a weblog, this could hold the post.

4.2.7.4 <occurrence>

Zero or more. Occurrences indicate a topic occurs on this page.

Example:

<occurrence topicid="some_other_topicid_inthismap" strength="1" />

Attributes:

4.2.7.4.1 topicid

The id of a topic in this map. Indicates that this topic occurs on this page.

4.2.7.4.2 strength

See occurrence strength concept. A whole number from 1 to infinity. The lower the number, the higher the confidence in the indexing.
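To show how the elements described in this section fit together, here is a minimal Python sketch that reads a small XFML document with the standard library. The sample document is assembled from the examples in this spec but is otherwise made up. It assumes the document does not declare the XML namespace on its elements; if it does, the element names in the lookups would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Illustrative document built from the examples in this section.
XFML = """<xfml version="1.0" url="http://domain.com/xfml/map1.xml" language="en-us">
  <facet id="wine_colours">Wine colours</facet>
  <topic id="wine" facetid="wine_colours"><name>Wine</name></topic>
  <topic id="red_wine" facetid="wine_colours" parentTopicid="wine">
    <name>Red wine</name>
  </topic>
  <page url="http://domain.com/somepage.html">
    <title>Title of page</title>
    <occurrence topicid="red_wine" strength="1" />
  </page>
</xfml>"""

root = ET.fromstring(XFML)

# Facets: id -> name (the element's text value).
facets = {f.get("id"): (f.text or "").strip() for f in root.findall("facet")}

# Topics: id -> name, facet and optional parent.
topics = {
    t.get("id"): {
        "name": t.findtext("name"),
        "facetid": t.get("facetid"),
        "parent": t.get("parentTopicid"),  # None when the attribute is absent
    }
    for t in root.findall("topic")
}

# Occurrences: (page url, topic id, strength) triples.
occurrences = [
    (p.get("url"), o.get("topicid"), int(o.get("strength")))
    for p in root.findall("page")
    for o in p.findall("occurrence")
]

print(facets)
print(topics["red_wine"])
print(occurrences)
```

This sketch does not validate against the DTD; it only pulls out the data a processor would typically work with.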

5. Processing instructions

5.1 Implementation checklist

A checklist for developers implementing XFML compatible applications is available at http://xfml.net/index.php?page=ImplementationChecklist

5.2 General processing suggestions

When a connection between two topics in two different maps has been indicated, lots of automated processing becomes possible. The following are some examples and recommendations. Software designers can of course ignore them, or come up with their own rules.

Consider mapA and topicAa, and mapB and topicBb. In mapA, topicAa has been connected to topicBb.

5.2.1 Importing occurrences. An application can import occurrences of topicBb and make them occurrences of topicAa. This can be done on a regular basis. The choice could be given to the map author to import only occurrences of pages already in his map, or import new pages as well. Occurrencestrength for these imported occurrences should be (at least) one higher in mapA than they were in mapB, since you will probably have less confidence in this indexing. Difference with XTM: note that processors are NOT expected to go out and download XFML maps referred to with <connect> - a different behaviour from the similar <topicRef> element in XTM.
5.2.2 Downloading maps. Since maps can get quite large, avoid hitting them every day. Use the nextUpdate attribute to see when you should next download the map.
5.2.3 Importing psi's. PSI's of topicBb can be added to topicAa, since the topics have been defined to be equal.
5.2.4 Debugging maps. Within a map, two topics cannot be the same, so they also cannot have a psi that is the same. They also shouldn't have the same name. To debug maps, use rules in the connecting topics concept and the published subject indicator concept.
5.2.5 Traversing a map's network. See map network concept. If you connect topicAa to topicXx, then mapX is in the network of mapA (but not necessarily the reverse). The application could follow connected topics of topicBb, which might lead to topicCx (in mapC), which in turn could lead to topicDa (in mapD) and so on. Network errors can occur at this stage, in which case the decision of using a certain connection should be left to the user. A network error can occur when in mapA, topicAa = topicBb (the equal sign means "is connected"), in mapB topicBb = topicCx, but in mapA topicAx = topicCx. This would mean that in mapA, topicAa = topicAx. This is a network error that can't be resolved by machines, so it should be presented to the map administrator.
5.2.6 Automatic discovery of equal topics in another map. If given a map, an application can try to find equal topics: are any of the topics connected to a topic in our map? To topics in other maps that our topics are also connected to? Do any topics have the same name? Human review of the discovery results is important.
5.2.7 Automatic discovery of maps. Use the link tag (see indicating XFML documents) to find XFML documents. If that doesn't work, try to find a link to a file called xfml.xml.
5.2.8 Occurrence strength. Automatic indexing engines should probably be assigned a higher occurrencestrength, since they will be less reliable than manual indexing. Interfaces may let users assign different increases in occurrencestrength for occurrences imported from other maps: a user may be able to indicate: increase the occurrencestrength of occurrences that are automatically imported from that map by 3 because I have less confidence in them.
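The network error described in 5.2.5 can be detected by following connections and checking whether two topics of the same map end up at the same remote topic. The Python sketch below is one possible approach, not a prescribed algorithm. It assumes the connections of all relevant maps have already been loaded into one dictionary keyed by "map#topic" strings; the function names and that layout are assumptions.

```python
def reachable(start, connections):
    """All topics reachable from start by following connections transitively."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for nxt in connections.get(current, ()):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def network_errors(local_topics, connections):
    """Return (first topic, second topic, shared remote topic) conflicts."""
    owner = {}
    errors = []
    for topic in local_topics:
        for remote in sorted(reachable(topic, connections)):
            first = owner.setdefault(remote, topic)
            if first != topic:
                errors.append((first, topic, remote))
    return errors


# The situation from 5.2.5: Aa = Bb (in mapA), Bb = Cx (in mapB), Ax = Cx (in mapA).
connections = {
    "mapA#topicAa": {"mapB#topicBb"},
    "mapB#topicBb": {"mapC#topicCx"},
    "mapA#topicAx": {"mapC#topicCx"},
}
print(network_errors(["mapA#topicAa", "mapA#topicAx"], connections))
```

Each reported conflict would be shown to the map administrator rather than resolved automatically, in line with the recommendation above.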

5.3 Publishing XFML

In general, the mimetype for a published XFML document (when accessed over HTTP) should be text/xml. This allows Web browsers to use XML formatting conventions to display an XFML document. Alternatively you can use application/x-xfml+xml, a non registered mimetype. The extension we recommend is .xfml. Alternatively you can use .xml.

5.4 XFML Core compatible

An application can call itself "XFML Core compatible" if it follows this spec. There is a logo available to indicate XFML Core compatible applications; feel free to copy it and link it to this spec (image generously provided by Bryan Bell).

5.5 Indicating XFML documents

The availability of one or more XFML documents on a website can be indicated by this button, linking to the XFML document. In case of multiple XFML documents, you can have multiple buttons, although a page explaining the differences would be nice: (Thanks to Jonathan for the button)

For purposes of autodiscovery, available XFML documents should also be indicated in the HTML head as follows, preferably on every page of the website: (thanks to Mark Pilgrim for the idea)

<link rel="alternate" type="application/xfml+xml" title="XFML" href="url/to/xfml/file">

or for XHTML:

<link rel="alternate" type="application/xfml+xml" title="XFML" href="url/to/xfml/file" />

Note that the title is for display purposes only. Multiple link tags can be used in case of multiple XFML documents.
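On the consuming side, autodiscovery could look roughly like the Python sketch below, which scans an HTML page for link tags of the form shown above. The class name is an assumption, and matching rel and type case-insensitively is a choice made for this example.

```python
from html.parser import HTMLParser


class XfmlLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate" type="application/xfml+xml">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also called for XHTML-style <link ... /> via handle_startendtag.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if rel == "alternate" and mime == "application/xfml+xml" and href:
            self.hrefs.append(href)


page = """<html><head>
<link rel="stylesheet" type="text/css" href="style.css">
<link rel="alternate" type="application/xfml+xml" title="XFML" href="url/to/xfml/file" />
</head><body></body></html>"""

finder = XfmlLinkFinder()
finder.feed(page)
print(finder.hrefs)
```

Because multiple link tags are allowed, the finder returns a list rather than a single URL.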

5.6 Compatible formats

XFML can be expressed in other formats like RDF or XTM. How exactly you express this depends on how you want to use the data. A list of formats and recommendations for converting between XFML and other formats are available at http://xfml.org/software.html#compatibleformats

5.7 Resolving relative URI's

We recommend that all URI's in an XFML map be absolute. If you encounter relative URI's, all URI's in <psi>, <connect> and <page url="..."> are to be resolved relative to <xfml url="...">.
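In Python, this resolution can be done with urllib.parse.urljoin, using the url attribute of the xfml element as the base. The sketch below is illustrative; the function name and the relative references are made up for the example.

```python
from urllib.parse import urljoin


def resolve(map_url: str, reference: str) -> str:
    """Resolve a URI from <psi>, <connect> or <page url> against <xfml url>."""
    return urljoin(map_url, reference)


MAP_URL = "http://domain.com/xfml/map1.xml"  # value of <xfml url="...">

print(resolve(MAP_URL, "othermap.xml#othertopic"))
print(resolve(MAP_URL, "/somepage.html"))
print(resolve(MAP_URL, "http://othersite.com/othermap.xml#othertopic"))
```

Absolute URI's pass through unchanged, so the same function can be applied to every URI in the map.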

6. License

XFML Core was created by Peter Van Dijck. It is permanently licensed to the public: it can be freely used and distributed by anyone, and this right will never be revoked. Anyone may create further versions (with version numbers higher than 1.0) of XFML, as long as they include a reference to this page in their spec.

This document and the information contained herein is provided on an "AS IS" basis and XFML.org and Peter Van Dijck DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.