I am about to write a module for JabRef, an open source bibliography management tool, that exports bibliographic information for Microsoft Office 2007.
Some references that might be helpful:
- How to use Office 2007 bibliographic tool
- OpenXML Developer
- Blog of Brian Jones, the person behind the Office 2007 open XML
- ECMA Open XML Standard Elaborated Schemas (all documents)
- MSDN article showing how to work with Bibliography (updated March 23, 2007)
But after searching for a day, I could not find a single web page describing the exact or near exact format for bibliographic information in Microsoft Office 2007. So I started digging in myself.
I started by adding some sources in the Microsoft Office bibliography editor. The very first thing I noticed is that if you add some references and don’t use them in the document, they are not going to be saved. If you use one or more of them in your document, all of them will be saved in “C:\Documents and Settings\<USER>\Application Data\Microsoft\Bibliography\Sources.xml”. I opened the XML file and here’s what I got (figure 1).
Figure 1: Microsoft Office 2007 Bibliographic Database Format
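To peek inside the file without opening it by hand, here is a minimal Python sketch that lists the sources it contains. It assumes the Office 2007 bibliography namespace and the `b:Source`, `b:Tag`, `b:SourceType` and `b:Title` element names; `SAMPLE` is a made-up document for illustration, not a copy of my actual file, and `default_sources_path` and `list_sources` are hypothetical helper names.

```python
import ntpath
import xml.etree.ElementTree as ET

# Namespace of the Office 2007 bibliography schema (assumed here).
NS = {"b": "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"}

# Hypothetical sample document, NOT the real file from the post.
SAMPLE = """<b:Sources SelectedStyle=""
    xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">
  <b:Source>
    <b:Tag>Doe07</b:Tag>
    <b:SourceType>Book</b:SourceType>
    <b:Title>An Example Book</b:Title>
  </b:Source>
</b:Sources>"""


def default_sources_path(user):
    """Build the Sources.xml path quoted above for a given Windows user name."""
    return ntpath.join("C:\\Documents and Settings", user, "Application Data",
                       "Microsoft", "Bibliography", "Sources.xml")


def list_sources(xml_text):
    """Return a (Tag, SourceType, Title) tuple for every b:Source element."""
    root = ET.fromstring(xml_text)
    rows = []
    for src in root.findall("b:Source", NS):
        rows.append((
            src.findtext("b:Tag", default="", namespaces=NS),
            src.findtext("b:SourceType", default="", namespaces=NS),
            src.findtext("b:Title", default="", namespaces=NS),
        ))
    return rows


if __name__ == "__main__":
    for row in list_sources(SAMPLE):
        print(row)
```

On a real machine you would read the text from `default_sources_path("<USER>")` instead of using `SAMPLE`.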
Obviously I had only one bibliographic source in “Sources.xml”. I was almost certain that Office would import a copy of this file without any problem. A copy of this file, with the information altered and the GUID and LCID deleted, worked as an imported bibliography. But wait, where are my previous bibliographic sources?
So I tried to discover what had happened and found that Office does NOT really import bibliographic sources into “Sources.xml”; it only lets you work on the currently opened XML file. All the bibliographic sources in the currently opened bibliographic XML file are displayed in the ‘master list’. You have to “copy” them into your ‘current list’ to work with them. If you want to merge information from an external XML file into your “C:\Documents and Settings\<USER>\Application Data\Microsoft\Bibliography\Sources.xml”, you have to open the external XML file, copy the information into your ‘current list’, open “Sources.xml” again, and then copy the sources back into the ‘master list’, which now points to “Sources.xml”.
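The “copy with GUID and LCID deleted” experiment can be scripted. The following is only a sketch: it assumes the elements are named `b:Guid` and `b:LCID` in the Office 2007 bibliography namespace, and `strip_guid_and_lcid` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of the Office 2007 bibliography schema (assumed here).
B_URI = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"

# Keep the familiar "b:" prefix when the tree is written back out.
ET.register_namespace("b", B_URI)


def strip_guid_and_lcid(xml_text):
    """Return the document with every b:Guid and b:LCID element removed."""
    root = ET.fromstring(xml_text)
    # findall returns a list, so removing children while looping is safe.
    for src in root.findall("{%s}Source" % B_URI):
        for name in ("Guid", "LCID"):
            for child in src.findall("{%s}%s" % (B_URI, name)):
                src.remove(child)
    return ET.tostring(root, encoding="unicode")
```

All other elements of each source are left untouched, so the result is the same kind of altered copy I produced by hand.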
I wanted to find out the least possible information required for the XML file to be recognized as a valid bibliographic source by Office 2007. The bare minimum is:
If you want to add information to this bare minimum XML, don’t use the “b:” prefix on the tags.
From MSDN (update):
The Guid and LCID elements are optional, but you can provide values for them if you want. The Guid element value should be a valid GUID, which you can generate programmatically outside the Word object model. (See the Microsoft Visual Studio documentation or the Microsoft Windows documentation on MSDN for information about programmatically generating ID.) Word generates GUIDs when users add or edit a source. If you do not add a GUID to the XML and a user then edits a source, Word generates a GUID. This enables Word to determine which source is most recent, based on the value of the GUID, and to prompt whether the user wants Word to update the outdated source to maintain continuity between the master list and the current list.
The LCID specifies the language for the source. (See MSDN for valid language identification values.) Word uses the LCID to know how to display a cited source in a document’s bibliography. For example, one source may be written in French, one in English, and one in Japanese. From the LCID, Word determines how to display names (for example, Last, First for English), what punctuation to use (for example, using comma in one language and a semicolon in another), and what strings to use (for example, whether to use “et al” or another localized form).
Now that I have deciphered how bibliographic information can be represented in XML so that Office 2007 recognizes it as a bibliographic source, I can list all the bits and pieces that can go inside it. Please follow my next post on it.