Beyond My Mind

March 22, 2007

Deciphering Microsoft Office 2007 Bibliography Format

Filed under: Research — mahbub @ 12:56 am

I am about to write a module for JabRef, an open source bibliography manager, to export bibliographic information in the Microsoft Office 2007 format.

Some references that might be helpful:

  1. How to use Office 2007 bibliographic tool
  2. OpenXML Developer
  3. Blog of Brian Jones, the person behind the Office 2007 open XML
  4. ECMA Open XML Standard Elaborated Schemas (all documents)
  5. MSDN article showing how to work with Bibliography (updated March 23, 2007)

But after searching for a day, I could not find a single web page describing the exact or near exact format for bibliographic information in Microsoft Office 2007. So I started digging in myself.

I started by adding some sources in the Microsoft Office bibliography editor. The very first thing I noticed is that if you add some references and don’t use them in the document, they are not going to be saved. If you use one or more of them in your document, all of them will be saved in “C:\Documents and Settings\<USER>\Application Data\Microsoft\Bibliography\Sources.xml”. I opened the XML file and here’s what I got (figure 1).

Figure 1: Microsoft Office 2007 Bibliographic Database Format

Content of Microsoft Office 2007 Bibliographic Source XML
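The location of the master file does not have to be hard-coded. Here is a minimal sketch in Python, assuming the APPDATA environment variable points at the “Application Data” folder of the current user (it does on Windows XP). The function name and the sample user folder are made up for illustration.

```python
import os
from pathlib import PureWindowsPath


def master_sources_path(appdata=None):
    # `appdata` defaults to %APPDATA%, which on Windows XP expands to
    # C:\Documents and Settings\<USER>\Application Data.
    base = appdata if appdata is not None else os.environ["APPDATA"]
    # PureWindowsPath keeps Windows separators even when run elsewhere.
    return PureWindowsPath(base) / "Microsoft" / "Bibliography" / "Sources.xml"


# "someuser" is a hypothetical account name.
print(master_sources_path(r"C:\Documents and Settings\someuser\Application Data"))
```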

Obviously I had only one bibliographic source in the “Sources.xml”. I was almost certain that Office would import a copy of this file without any problem. Indeed, a copy of this file with the information altered and the GUID and LCID deleted just worked as an imported bibliography. But wait, where are my previous bibliographic sources?

So I tried to discover what happened and found that Office does NOT really import a bibliography into “Sources.xml”; it only lets you work on the currently opened XML file. All the bibliographic sources in the currently opened bibliographic XML file are displayed in the ‘master list’. You have to “copy” them into your ‘current list’ to work with them. If you want to merge information from an external XML file into your “C:\Documents and Settings\<USER>\Application Data\Microsoft\Bibliography\Sources.xml”, you have to open the external XML file, copy the information into your ‘current list’, open “Sources.xml” again and then copy the sources back into the ‘master list’, which now points to “Sources.xml”.
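In principle the same merge could be done outside Word by appending the entries of one file to the other. The following is only a sketch in Python and I have not verified that Word accepts the result. It assumes every source is a direct child of the root element, and the function name merge_sources is my own.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
# Serialize the bibliography namespace as the default one (no prefix).
ET.register_namespace("", NS)


def merge_sources(master_xml, external_xml):
    """Return master_xml with every top-level entry of external_xml appended.

    Both arguments are XML strings. Assumption: each source is a direct
    child of the root element. Duplicates are NOT detected.
    """
    master = ET.fromstring(master_xml)
    external = ET.fromstring(external_xml)
    for entry in list(external):
        master.append(entry)
    return ET.tostring(master, encoding="unicode")
```

Keep a backup of “Sources.xml” before writing anything like this back to disk.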

I wanted to find out the least possible information required for the XML file to be recognized as a valid bibliographic source by Office 2007. The bare minimum is:

<sources xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"/>

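If you want to add information to this bare minimum XML, don’t use the “b:” prefix on the tags. The namespace is declared here as the default one, so the elements go in unprefixed.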
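A small Python sketch of generating this bare minimum file without any “b:” prefix, using the standard xml.etree.ElementTree module. The root element is spelled exactly as above. The child element names “Source” and “Title” are illustrative assumptions (they are not listed in this post), and the helper names are my own.

```python
import xml.etree.ElementTree as ET

NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"

# Registering the namespace with an empty prefix makes ElementTree write
# it as a default namespace (xmlns="..."), so no "b:" prefix appears.
ET.register_namespace("", NS)


def empty_sources():
    """Root element of the bare minimum file, spelled as in the post."""
    return ET.Element("{%s}sources" % NS)


def add_child(parent, name, text=None):
    """Append a namespaced child; `name` is caller-supplied, not validated."""
    child = ET.SubElement(parent, "{%s}%s" % (NS, name))
    child.text = text
    return child


root = empty_sources()
# "Source" and "Title" are assumed element names, for illustration only.
source = add_child(root, "Source")
add_child(source, "Title", "An Example Title")
print(ET.tostring(root, encoding="unicode"))
```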

From MSDN (update):

The Guid and LCID elements are optional, but you can provide values for them if you want. The Guid element value should be a valid GUID, which you can generate programmatically outside the Word object model. (See the Microsoft Visual Studio documentation or the Microsoft Windows documentation on MSDN for information about programmatically generating ID.) Word generates GUIDs when users add or edit a source. If you do not add a GUID to the XML and a user then edits a source, Word generates a GUID. This enables Word to determine which source is most recent, based on the value of the GUID, and to prompt whether the user wants Word to update the outdated source to maintain continuity between the master list and the current list.

The LCID specifies the language for the source. (See MSDN for valid language identification values.) Word uses the LCID to know how to display a cited source in a document’s bibliography. For example, one source may be written in French, one in English, and one in Japanese. From the LCID, Word determines how to display names (for example, Last, First for English), what punctuation to use (for example, using comma in one language and a semicolon in another), and what strings to use (for example, whether to use “et al” or another localized form).
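For an export module both values can be produced with very little code. A hedged sketch in Python: the braced, upper-case GUID form is an assumption about what Word expects, and so is writing the LCID as a decimal number. The four LCID values are standard Windows locale identifiers; the helper names are mine.

```python
import uuid

# A few Windows locale identifiers (LCIDs), as decimal numbers.
LCIDS = {
    "en-US": 1033,
    "de-DE": 1031,
    "fr-FR": 1036,
    "ja-JP": 1041,
}


def new_source_guid():
    """Random GUID in braces and upper case (assumed form for Word)."""
    return "{" + str(uuid.uuid4()).upper() + "}"


def lcid_for(language_tag):
    """Look up the LCID for a tag such as 'fr-FR'; raises KeyError if unknown."""
    return LCIDS[language_tag]


print(new_source_guid(), lcid_for("en-US"))
```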

Now that I have deciphered how bibliographic information can be presented in XML so that Office 2007 recognizes it as a bibliographic source, I can list all the bits and pieces that can go inside it. Please follow my next post on it.

7 Comments »

  1. I posted this already under another post of your blog but it may be useful here as well: Bibutils has added initial support for conversion of bibliographic formats to Word 2007 bibliographic XML format, which may be useful in figuring out your own export module (or might be even used instead of writing your own converter).

    http://www.scripps.edu/~cdputnam/software/bibutils/

    Comment by Matthias — March 26, 2007 @ 5:27 am

  2. [...] GUID: Global ID. This enables Word to determine which source is most recent, based on the value of the GUID, and to prompt whether the user wants Word to update the outdated source to maintain continuity between the master list and the current list. Example: {F3BEFB3B-FC0D-47AB-970A-F4003FF99F9F} (more) [...]

    Pingback by Details of Microsoft Office 2007 Bibliographic Format Compared to BibTex « Beyond My Mind — March 30, 2007 @ 2:09 am

  3. Thanks Matt. I will take a look at your source.

    Comment by mahbub — March 30, 2007 @ 2:17 am

  4. Is there a possibility to include the chapter number in the bibliographic numbering scheme?
    E.g. [1.1] Author, “Title of work”, etc… resp. the reference [1.1] in the text.

    Another question… probably wrong to place it here, but anyhow: I have the German version of Office 2007. How can I make sure that Word uses the translated word “Bibliography” in the title of the bibliography instead of the German “Literaturverzeichnis”?

    Thanks!

    Andrej

    Comment by Andrej — August 18, 2008 @ 1:26 pm

  5. Is this the plugin that’s now shipped with JabRef 2.4? If so, could you post a usage HOWTO?

    Thanks,

    Scott

    Comment by scotto — September 18, 2008 @ 5:13 pm

  6. Is there any way to export Microsoft word sources.xml file as a bibtex file?!

    Comment by masoud moshref — January 6, 2009 @ 5:45 am


