Introduction to OpenDocument Format


Introduction

This document serves as a simple introduction to OpenDocument Format (ODF) and its origins. The material in this paper is compiled from freely available information on the Web, drawn from various blogs, news feeds, and mailing lists. Many thanks are due to Pamela Jones and Groklaw 1 for collecting and archiving ODF articles, news, and much of the content contained in this document. For a more in-depth study of ODF, additional resources are available at the end of this document.

What is ODF?

OpenDocument Format is an open, standard file format for saving and exchanging editable office documents such as text documents (e.g., memos, reports, and books), spreadsheets, charts, and presentations. This standard was developed and is maintained by the Organization for the Advancement of Structured Information Standards (OASIS) 2 consortium, using the XML-based file format originally created by OpenOffice.org as its foundation.

The standard was developed in an open, collaborative process by a variety of organizations, is available publicly, and can be implemented by anyone. ODF is intended to provide an alternative to proprietary document formats, including the popular DOC, XLS, and PPT formats used by Microsoft Office, as well as the Microsoft Office XML format. Organizations and individuals that store their data in an open format, one that any vendor can support, avoid being locked in to a single software vendor. This leaves them free to switch software if their current vendor goes out of business or changes its software or licensing terms to something less favorable.

The chronology of the creation to standardization of ODF is as follows:

  1. OpenOffice.org creates an open file format called "OpenOffice.org 1.0 format."
  2. The European Union commissions Valoris to report on open file formats.
  3. The OpenOffice.org 1.0 format is submitted for OASIS standardization. KDE and Corel join the OASIS Technical Committee and expand the format to cover a wider range of applications.
  4. The new OASIS format is called "Open Office XML." OpenOffice.org and KOffice both commit to making the format their primary/native format. (May 1, 2005)
  5. The Valoris report is published. Microsoft and Sun respond to it. The European Union Telematics between Administrations Committee (TAC) makes recommendations.
  6. The format is submitted for ISO standardization and changes its name to OpenDocument Format (ODF).
  7. ODF v1.0 is approved as ISO/IEC 26300 on May 3, 2006.

Who is supporting ODF?

The open source community has long embraced ODF as the office document standard. Various applications, both open source and proprietary, support ODF as their native format. OpenOffice.org 2.0 (multiplatform) and KOffice 1.5 (Linux) are two such examples. IBM Workplace also uses ODF as its native document format, meaning that it can read and write ODF directly, without any filters. Other examples include Sun's StarOffice and Google's Writely, a browser-based office document application that supports ODF through the use of the OpenOffice.org engine.

The list of applications and vendors that either already support or will support ODF is growing in number and importance (ODF Application List). 3 Some hardware vendors (cell phone and hand-held devices) are moving quickly to support this increasingly popular document format. Various open source projects have already started and tools are being developed to bridge the gap between legacy formats and ODF (OpenDocument Fellowship). 4 The ODF Alliance 5 is a non-profit organization dedicated to educating policymakers, IT specialists, and the public on ODF, its benefits, and its future. The Alliance was launched on March 3, 2006 with more than 35 initial members from a wide range of countries around the world. IBM and Sun Microsystems were founding members of the Alliance, which has grown in a few short months to include more than 300 members, including Google and others. Membership in the Alliance is open to all organizations that are committed to its mission.

Who is using ODF?

The origin of the ODF revolution can be attributed to the state of Massachusetts, which announced on September 21, 2005 that it intended to adopt an open standard document format for the government. The standardization of the specification as ISO/IEC 26300 has accelerated the acceptance of ODF globally.

A growing number of public-sector bodies, governments, and other organizations are adopting policies to migrate to or implement ODF and to integrate it into their daily activities. The list includes Belgium, Denmark, the European Union, the United Kingdom, the World Trade Organization, the City of Munich (Germany), the City of Bristol (UK), and the French Police. The OpenDocument Fellowship maintains the list of early adopters. 6

What is under the hood of ODF?

ODF is XML, written with the goal of maximizing ease of understanding. The specification is an easy-to-read RELAX NG (Regular Language for XML Next Generation) schema, and it builds on existing standards that you may already be familiar with, such as XLink, MathML, SVG (Scalable Vector Graphics), XForms, and SMIL (Synchronized Multimedia Integration Language). It also uses syntax inspired by XHTML. The complete specification can be obtained from OASIS (ODF Spec). 7

The most common file extensions used for ODF documents are: .ODT for text documents, .ODS for spreadsheets, .ODP for presentations, and .ODG for graphics. The OpenOffice.org database format (.ODB) is not part of the specification.

An OpenDocument file can be either a simple XML file that uses <office:document> as the root element, or a JAR (ZIP) compressed archive containing a number of files and directories. The JAR-based format is used almost exclusively, since it can embed binary content and tends to be significantly smaller. The exact files and directories in the archive depend on the content of the document (e.g., images, macros, etc.). A typical document, when unzipped, will have the following contents:

./Configurations2/		directory
./META-INF/			directory
	manifest.xml
./Pictures/			directory
./Thumbnails/			directory
	thumbnail.png
content.xml
meta.xml
mimetype
settings.xml
styles.xml
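
Because the package is an ordinary ZIP archive, its entries can be inspected with any ZIP library. The following is a minimal sketch in Python, using only the standard library. The sample package is built in memory so that the example is self-contained; the entry bodies are placeholders and the helper names (build_sample_package, list_entries) are hypothetical, not part of any ODF toolkit.

```python
import io
import zipfile

# Hypothetical placeholder entries; a real document carries full XML here.
SAMPLE_ENTRIES = {
    "content.xml": "<placeholder/>",
    "meta.xml": "<placeholder/>",
    "styles.xml": "<placeholder/>",
    "META-INF/manifest.xml": "<placeholder/>",
}
MIMETYPE = "application/vnd.oasis.opendocument.text"


def build_sample_package() -> bytes:
    """Build a tiny ODF-like ZIP package in memory (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # Writing mimetype first and uncompressed follows the packaging
        # convention described in the ODF specification.
        package.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name, body in SAMPLE_ENTRIES.items():
            package.writestr(name, body)
    return buffer.getvalue()


def list_entries(data: bytes) -> list[str]:
    """Return the entry names of a ZIP package in archive order."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return package.namelist()


if __name__ == "__main__":
    for entry in list_entries(build_sample_package()):
        print(entry)
```

For a document on disk, passing its path to zipfile.ZipFile directly and calling namelist() should produce a listing comparable to the one above.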

The PDF version of the ODF specification can be obtained from OASIS. The most common files are briefly explained here:

  • content.xml - This is the most important file. It carries the actual content of the document, except for binary data such as images. The base format is inspired by HTML and, though more complex, is reasonably easy to understand.
  • manifest.xml - The manifest file contains a list of all the files in the ZIP archive. The presence of a manifest means that OpenDocument files are also JAR archives.
  • meta.xml - This file contains the file metadata. For example, author, "last modified by", date of last modification, etc.
  • mimetype - This is a one-line file containing the MIME type of the file. For example, for a text document it would be: "application/vnd.oasis.opendocument.text".
  • Pictures/ - This is a directory that contains images in common image formats such as JPEG and PNG. They are referenced from content.xml in a way similar to the <img> tag in HTML.
  • settings.xml - This includes settings such as the zoom factor or the cursor position. These are properties that are neither content nor layout.
  • styles.xml - This file contains style information. Styles include things like font size, color, page width, and any other kind of formatting. OpenDocument provides a strong separation between content (in content.xml) and formatting (in styles.xml). In OpenDocument all formatting is done through styles. Even "manual" formatting is implemented through styles; the application dynamically creates new styles as needed. Style types include:
    1. Paragraph styles.
    2. Page Styles.
    3. Character Styles.
    4. Frame Styles.
    5. List styles.
  • thumbnail.png - An image of the first page of the document to be used as a "thumbnail" view.
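
As an illustration of how the mimetype and content.xml entries described above can be read, here is a minimal Python sketch using only the standard library. It assumes a text document whose paragraphs are text:p elements in the OASIS text namespace. The sample content and the function names (build_sample_package, read_package) are hypothetical and exist only to keep the example self-contained.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

MIMETYPE = "application/vnd.oasis.opendocument.text"

# Hypothetical, heavily simplified content.xml used only for this sketch.
SAMPLE_CONTENT = f"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <office:body>
    <office:text>
      <text:p>Hello, ODF.</text:p>
      <text:p>Second <text:span>styled</text:span> paragraph.</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""


def build_sample_package() -> bytes:
    """Build a tiny ODF-like package in memory (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        package.writestr("content.xml", SAMPLE_CONTENT)
    return buffer.getvalue()


def read_package(data: bytes) -> tuple[str, list[str]]:
    """Return the declared MIME type and the plain text of each paragraph."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        mimetype = package.read("mimetype").decode("ascii").strip()
        root = ET.fromstring(package.read("content.xml"))
    # itertext() flattens nested inline elements such as text:span.
    paragraphs = ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
    return mimetype, paragraphs


if __name__ == "__main__":
    mimetype, paragraphs = read_package(build_sample_package())
    print(mimetype)
    for paragraph in paragraphs:
        print(paragraph)
```

Real documents also contain headings, lists, and tables, so this sketch should be treated as a starting point rather than a complete text extractor.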

Why is ODF important?

The most important aspect of ODF is its openness. The specification of the format, including the syntax and semantics of its XML tags, is open to inspection by anyone. Any interested individual or vendor can develop tools to create, view, and edit ODF documents. ODF is developed in an open, collaborative process that cannot be controlled by a single vendor. Interested parties are encouraged to participate in improving the format for future releases.

ODF's standards and openness also guarantee the longevity of the documents created using the format. This feature of ODF is extremely attractive to the public sector, where there are legal requirements for document retention. Having the documents in ODF does not imply that one has to use a specific vendor product to be able to access the document.

The standardization of ODF is opening a new era in office document innovation and is shifting the value back to the content, rather than the application used to create or view the content. In addition, multiple vendor support and existing applications from different sources can be used as checks and balances to ensure that the full spectrum of the specification is utilized and complied with.

The following list highlights some of the popular and important aspects of ODF. ODF is:

  1. An international standard, ISO/IEC 26300.
  2. An open specification based on XML.
  3. Non-binary and cross-platform.
  4. Widely adopted, with support from multiple vendors.
  5. Supported natively by various applications, both open source and commercial.

In addition, ODF presents the following outstanding technical merits:

  1. Preserves format fidelity: This refers to both presentation and structure in the sense that neither changes depending on a particular application. For many applications, format fidelity is an absolute imperative.
  2. Modifiable: Document exchange is a very common practice. However, lack of a common standard for office documents hindered this effort. One solution was the Portable Document Format (PDF) from Adobe, which allowed the documents to be printed in that electronic format for exchange. PDF is basically a non-editable document format, barring the new form components. On the other hand, the use of an open standard like ODF allows exchange of modifiable documents, increasing electronic collaboration.
  3. Supports current word processor features: An open format is of no use if it can't represent your data. These features include Unicode support, bidirectional text (Hebrew and others), and scripting, among others.
  4. Supports emerging requirements: Digital signatures, access rights, version control, etc.

What about ODF and accessibility?

In reality, ODF accessibility is an emerging feature, a topic of much discussion in the community, and an area in need of further investigation and contribution. The need for a viable office document format was so great that ODF became the standard despite its shortcomings in the area of accessibility. However, it should be noted that most of the underlying XML components of ODF have already gone through the W3C's Web Accessibility Initiative processes.

Considering the access needs of people who are visually impaired, the accessibility of any content depends on a three-layer architecture and well understood and publicized interfaces between the layers: the content format specification, the application that is handling the content, and the assistive technology (AT) which is presenting the application and its content accessibility features to a user. Each layer plays an important role in making any document accessible:

  1. One way that ODF incorporates accessibility is its versatility. The format specification allows a full range of the required and/or preferred XML elements and attributes to be included in a document that, in essence, enables a document for accessibility on any platform.
  2. The applications that are used to read or write ODF documents should honor and comply with the full specification, and hence expose the accessibility features of the underlying format to the operating system (OS) services through a standard application programming interface (API), such as Microsoft Active Accessibility or Linux GTK/ATK/AT-SPI. In addition, as authoring tools, they should allow authors to specify accessibility information along with the document content.
  3. AT applications and devices should then use the API and OS services to present the full range of document accessibility features to the end users.

The OASIS ODF task force formed a subcommittee to look into ODF accessibility. The initial task was to review the ODF specification and identify accessibility issues. The subcommittee has identified nine accessibility issues in the ODF 1.0 specification. These issues are explained in detail in a blog 8 by Peter Korn, the accessibility architect for Sun. The subcommittee has released ODF v1.1, which includes the new accessibility enhancements to resolve the issues, for balloting.

As for the authoring tools, OpenOffice.org, KOffice by KDE, and IBM Workplace, among others, are continuing the work on application accessibility and exposing all the available accessibility features of the underlying format to the OS. In addition, there are parallel efforts by various AT vendors to increase their compatibility with applications that use ODF natively.

ODF resources

1. Groklaw - http://www.groklaw.net/
2. OASIS - http://www.oasis-open.org/home/index.php
3. ODF Application List - http://en.wikipedia.org/wiki/List_of_applications_supporting_OpenDocument
4. OpenDocument Fellowship - http://opendocumentfellowship.org/
5. ODF Alliance - http://www.odfalliance.org/
6. List of early adopters - http://opendocumentfellowship.org/government/precedent
7. ODF specification - http://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1.0-os.pdf
8. Peter Korn's blog - http://blogs.sun.com/roller/page/korn/20060526