Office Open XML
From Wikipedia, the free encyclopedia
Not to be confused with OpenOffice.org XML (a deprecated format used by earlier versions of OpenOffice.org), Microsoft Office XML formats (a deprecated format used by earlier versions of Microsoft Office), or OpenDocument (the XML file format used by the OpenOffice.org office suite).
Office Open XML (often referred to as OOXML or as OpenXML) is an XML-based file format specification for electronic documents such as spreadsheets, charts, presentations and word processing documents. Microsoft originally developed the specification as a successor to its binary Microsoft Office file formats. The specification was later handed over to Ecma International to be developed as the Ecma 376 standard, under the stewardship of Ecma International Technical Committee TC45. Ecma 376 was published in December 2006[1] and can be freely downloaded from Ecma International.[2]
Background
Prior to the 2007 edition of Microsoft Office, its component applications (such as the word processor Word and the spreadsheet Excel) used binary file formats for storing data by default. Historically, these formats have been difficult for developers to work with natively, due to a lack of publicly available information on, and royalty-free access to, the format specifications. (Microsoft does offer a subset of these binary format specifications under a royalty-free covenant not to sue.[3]) While a level of support for the binary formats had been achieved by various applications, full interoperability remained elusive.[citation needed]
In 2000, Microsoft released an initial version of an XML-based format for Excel, which was incorporated in Office XP. In 2002, a new file format for Microsoft Word followed.[4] The Excel and Word formats, known as the Office 2003 XML formats, were later incorporated into the 2003 release of Microsoft Office. In 2004, governments and the European Union recommended to Microsoft that it publish and standardize its XML Office formats through a standardization organization.[5] Microsoft announced[6] in November 2005 that it would standardize the new version of its XML-based formats through Ecma, as "Ecma Office Open XML."
File format and structure
In their earlier form, prior to Ecma standardization, the Microsoft Office 2003 XML formats used a single monolithic file, with embedded items such as pictures stored as binary-encoded blocks within the XML. Office Open XML no longer supports that layout; instead it uses a file package conforming to the Open Packaging Conventions. This format uses the ZIP file format and contains the individual files that form the basis of the document. In addition to Office markup, the package can also include embedded (binary) files in formats such as PNG, BMP, AVI or PDF.
Document markup languages
An Office Open XML file may contain several documents encoded in specialized markup languages corresponding to applications within the Microsoft Office product line. Office Open XML defines multiple vocabularies (using 27 namespaces and 89 schema modules). The primary markup languages are:
Shared markup language materials include:
In addition to the above markup languages, custom XML schemas can be used to extend Office Open XML.
The XML Schema of OOXML emphasizes reducing load time and improving parsing speed. In a test with applications current in April 2007, XML-based office documents were slower to load than binary formats.[7] To enhance performance, OOXML uses very short element names for common elements, and spreadsheets save dates as index numbers (starting from 1899 or from 1904). In order to be systematic and generic, OOXML typically uses separate child elements for data and metadata (element names ending in Pr for properties) rather than multiple attributes, which allows structured properties. OOXML does not use mixed content; instead it uses elements to put a series of text runs (element name r) into paragraphs (element name p). The result is terse and highly nested, in contrast to HTML, for example, which is fairly flat, is designed for humans to write in text editors, and is more congenial for humans to read.
OMML
Office Math Markup Language is a mathematical markup language which can be embedded in WordprocessingML, with intrinsic support for including word processing markup such as revision markings,[8] footnotes, comments, images and elaborate formatting and styles.[9] The OMML format differs from the World Wide Web Consortium (W3C) MathML recommendation, which does not support those office features, but is partially compatible with it[10] through relatively simple XSL Transformations.
DrawingML
Figure: Example of DrawingML text effects
DrawingML is the graphics markup language used in OOXML documents. Its major features are the graphical rendering of text elements, vector-based shape elements, graphical tables and charts. The DrawingML table is the third table model in Office Open XML (next to the table models in WordprocessingML and SpreadsheetML). It is optimized for graphical effects, and its main use is in presentations created with PresentationML markup. DrawingML contains graphics effects (such as shadows and reflection) that can be applied to its different graphical elements. DrawingML also supports 3D effects, for instance showing the graphical elements through a flexible camera viewpoint. It is possible to create separate DrawingML theme parts in an Office Open XML package. These themes can then be applied to graphical elements throughout the package.[11]
DrawingML is unrelated to other vector graphics formats such as SVG. Graphics in those formats can be converted to DrawingML to be included natively in an Office Open XML document. This is a different approach from that of the OpenDocument format, which uses a subset of SVG and includes vector graphics as separate files.
Container structure
Office Open XML packages have characteristically different directory structures and names depending on the type of document. An application will use the relationships files to locate individual sections (files), with each having accompanying metadata, in particular MIME metadata. The Office Open XML format uses a ZIP package for storing XML and other data files.[12] A basic package contains an XML file called [Content_Types].xml at the root, along with three directories: _rels, docProps, and a directory specific to the document type (for example, in a .docx word processing package, there would be a word directory). The word directory contains the document.xml file, which is the core content of the document.
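The package layout described above can be inspected with any ZIP library. The following is a minimal sketch in Python, not a conforming implementation: it builds a tiny stand-in package in memory and lists its top-level entries. The part names _rels/.rels and docProps/core.xml and all of the placeholder XML bodies are assumptions made for illustration; a real package contains far more detailed parts.

```python
import io
import zipfile


def build_minimal_docx() -> bytes:
    """Build a tiny, illustrative .docx-like ZIP package in memory.

    The XML bodies are placeholders, not valid Office Open XML markup.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("[Content_Types].xml", "<Types/>")        # content types at the root
        pkg.writestr("_rels/.rels", "<Relationships/>")        # assumed part name
        pkg.writestr("docProps/core.xml", "<coreProperties/>") # assumed part name
        pkg.writestr("word/document.xml", "<document/>")       # core document content
    return buf.getvalue()


def top_level_entries(package_bytes: bytes) -> list[str]:
    """Return the sorted names of the root-level files and directories."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        return sorted({name.split("/", 1)[0] for name in pkg.namelist()})


package = build_minimal_docx()
entries = top_level_entries(package)
print(entries)
```

Pointing top_level_entries at the bytes of a real .docx file should show the same kind of layout: the [Content_Types].xml file plus the _rels, docProps and word directories.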
Relationships
Relationship files in Office Open XML
An example relationship file (from word/_rels/document.xml.rels):
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Relationships
    xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
  <Relationship Id="rId1"
      Type="http://schemas.microsoft.com/office/2006/relationships/image"
      Target="http://en.wikipedia.org/images/wiki-en.png"
      TargetMode="External" />
  <Relationship Id="rId2"
      Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink"
      Target="http://www.wikipedia.org"
      TargetMode="External" />
</Relationships>
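To illustrate how an application might resolve a relationship ID to its target, here is a minimal sketch using Python's standard library. It parses a copy of the example file above; the namespace URI is taken from that example, and other editions of the specification may use different URIs, so treat it as an assumption. The function name load_relationships is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace as it appears in the example file; an assumption for this sketch.
NS = "http://schemas.microsoft.com/package/2005/06/relationships"

RELS_XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
  <Relationship Id="rId1"
      Type="http://schemas.microsoft.com/office/2006/relationships/image"
      Target="http://en.wikipedia.org/images/wiki-en.png"
      TargetMode="External" />
  <Relationship Id="rId2"
      Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink"
      Target="http://www.wikipedia.org"
      TargetMode="External" />
</Relationships>"""


def load_relationships(xml_bytes: bytes) -> dict[str, dict[str, str]]:
    """Map each relationship Id to its type, target and target mode."""
    root = ET.fromstring(xml_bytes)
    rels = {}
    for rel in root.findall(f"{{{NS}}}Relationship"):
        rels[rel.get("Id")] = {
            "type": rel.get("Type"),
            "target": rel.get("Target"),
            # A missing TargetMode is treated as a package-internal target.
            "mode": rel.get("TargetMode", "Internal"),
        }
    return rels


rels = load_relationships(RELS_XML)

# Resolve the ID used by inline markup to the actual URL.
hyperlink_target = rels["rId2"]["target"]

# Collect every target whose relationship type ends in "/image".
image_targets = [r["target"] for r in rels.values() if r["type"].endswith("/image")]

print(hyperlink_target)
print(image_targets)
```

Keeping targets in a separate lookup table like this means the document markup only carries short IDs, and a target can be changed without touching the markup that refers to it.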
Images referenced in the document can be found in the relationship file by looking for all relationships that are of type http://schemas.microsoft.com/office/2006/relationships/image.
Hyperlink relations
The following code shows an example of inline markup for a hyperlink: <w:hyperlink w:rel="rId2" w:history="1"> In this example, the URL is represented by "rId2". The actual URL is in the accompanying relationships file, located by the corresponding "rId2" item. Linked images, templates, and other items are referenced in the same way.
Embedded or linked media file relations
Pictures can be embedded or linked using a tag: <v:imagedata w:rel="rId1" o:title="example" /> This is the reference to the image file. All references are managed via relationships. For example, a document.xml has a relationship to the image. There is a _rels directory in the same directory as document.xml; inside _rels is a file called document.xml.rels. This file contains a relationship definition consisting of a type, an ID and a location. The ID is the one referenced in the XML document. The type is a reference to a schema definition for the media type, and the location is either an internal location within the ZIP package or an external location defined with a URL.
Licensing
Ecma International provides specifications that "can be freely copied by all interested parties without restrictions",[13] and does so under the Ecma code of conduct in patent matters, which requires participating and approving member organisations to make available their patent rights on a reasonable and non-discriminatory basis (see Reasonable and Non Discriminatory Licensing). Such "reasonable and non-discriminatory" terms are a common minimum patent condition for a standard. International standardization, however, adheres to a clear preference for royalty-free patent licensing. That is why Microsoft, which is a main contributor to the standard, provided a Covenant Not to Sue[14] for its patent licensing.
The covenant received a mixed reception, with some (like Groklaw) identifying problems[15] and others (such as Lawrence Rosen) endorsing it.[16] Microsoft also added the format to its Open Specification Promise,[17] in which Microsoft irrevocably promises “not to assert any Microsoft Necessary [Patent] Claims against you for making, using, selling, offering for sale, importing or distributing any implementation to the extent it conforms to a Covered Specification ("Covered Implementation")”. The format can therefore be used under the Covenant Not to Sue or the Open Specification Promise. In support of the licensing arrangements, Microsoft commissioned an analysis from the London legal firm Baker & McKenzie.[18] The Open Specification Promise was included in documents submitted to ISO in support of the Ecma 376 fast track submission.[19] In response to criticism of the licensing, Ecma provided the following statements:[20]
But the Software Freedom Law Center has warned of problems with the Open Specification Promise for open source software projects. In a published analysis of the promise it states the promise should not be relied upon because:


