Validators and compliance testing

The ODF standard was designed to define a universal syntax (a shared ‘formula’) for producing valid documents. When an application is fully predictable in what it stores for itself or shares with others, other applications can subsequently handle that content in the way the user expects. A shared syntax is intended to allow any application to unambiguously understand what it may expect inside any valid document it encounters. It is never good for interoperability when a vendor allows its products to stray too far from the path of the standard, because the onus then falls on all other applications to deal correctly with an ever-growing number of violations of the standard. This is expensive and impractical to do correctly today, and may be impossible in the future. The fact that things currently seem to work in some circumstances is not a good excuse for not doing what was agreed upon.

Office applications are complex, and like most software you work with they may not be perfect. Luckily, most applications are not too fragile when it comes to dealing with small imperfections. Not every deviation from the written standard will crash your application or make you lose part of your content. This resilience resembles our own human capability for understanding text: in daily speech we regularly run into incomplete sentences and information we do not understand, and yet in most cases we correctly parse their meaning or skip the unknown parts without a mental breakdown. When a small piece of information that is technically required at a certain point in a document is missing or misplaced according to the rules, that information may for instance be redundantly available elsewhere or be obvious from the larger context of the document. The same goes for the ability of applications to skip information elements that technically shouldn’t be in the document but nonetheless are – applications may be able to discard invalidly formulated information they can’t understand. Of course there is a predictable friction between these two strategies – filling in what is missing and discarding what is invalid. There are limits to what an application can or should ‘guess’ when it runs into invalid documents.

It is not really possible to run an external check against productivity software itself to see whether its programmers adhered to a specific standard. We therefore need to look at the output of applications with a so-called *validator*. Validators are somewhat rigid pieces of software that check whether a given document formally complies with a specification (in this case: the OpenDocument Format standard as specified by the standards committee). There are a number of ways in which validators can perform such checks. The most common approach is relatively straightforward, as it is built into the standards process by the OASIS ODF TC: the ODF standard is not just produced as a text for humans to read, but is also partly delivered in a machine-processable form by means of another standard called RelaxNG. By matching a document against the constraints set by this schema, many checks can be automated.
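To illustrate how schema-based validation works in principle, the sketch below validates a toy XML document against a made-up miniature RelaxNG schema, using the third-party Python library lxml. The schema here is purely illustrative – a real check would load the RelaxNG files published with the ODF specification and the content extracted from the document package:

```python
from lxml import etree

# A made-up miniature RelaxNG schema: a <doc> element containing
# one or more <p> elements. The real ODF schema is far larger and
# ships alongside the specification text.
RNG = b"""
<element name="doc" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="p"><text/></element>
  </oneOrMore>
</element>
"""

schema = etree.RelaxNG(etree.fromstring(RNG))

good = etree.fromstring(b"<doc><p>hello</p></doc>")
bad = etree.fromstring(b"<doc><table/></doc>")  # <table> is not allowed

print(schema.validate(good))  # True
print(schema.validate(bad))   # False
```

The same pattern – load schema, parse document, validate – underlies validators built directly on the published ODF schemata.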

There are requirements that cannot be described in this rule syntax. These may still be captured in an automated way, but require dedicated programming to do so. There is no single validator that covers all conformance criteria of the OpenDocument Format specification. A number of validators are available, and given the complementary nature of their checks it is recommended to use them in parallel. OpenDoc Society has sponsored an online validator based on the ODF Toolkit, which can be used (and downloaded) for free at https://odfvalidator.org. There are also other validators that you can download to use in your own environment, such as Cyclone3 and Office-o-Tron. In addition you can use the RelaxNG schemata directly, with one of the many RelaxNG validators.
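One example of a conformance rule that a RelaxNG schema cannot express concerns the package itself: an ODF file is a ZIP archive whose first entry must be a file named mimetype, stored without compression. A dedicated check for such a rule is straightforward to program; the sketch below (plain Python standard library, not taken from any of the validators mentioned above) builds a minimal in-memory package and tests it:

```python
import io
import zipfile

def first_entry_is_mimetype(data: bytes) -> bool:
    """Check one package-level ODF rule: the first ZIP entry must be
    a file named 'mimetype', stored uncompressed (ZIP_STORED)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        return (bool(infos)
                and infos[0].filename == "mimetype"
                and infos[0].compress_type == zipfile.ZIP_STORED)

# Build a tiny in-memory package that follows the rule.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                compress_type=zipfile.ZIP_STORED)
    zf.writestr("content.xml", "<office:document-content/>",
                compress_type=zipfile.ZIP_DEFLATED)

print(first_entry_is_mimetype(buf.getvalue()))  # True
```

A schema validator that only looks at the XML streams inside the package would never notice a violation of this rule, which is why the complementary, purpose-built checks mentioned above matter.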

A validator’s error report indicates a reasonable likelihood that a given document – and by extension the application (or combination of applications) that produced it – does not conform to the specification. However, it must not be concluded from reported errors alone, without further investigation of the error report, that the document does not conform to the specification; nor must it be concluded from the absence of error reports that a document conforms to the OpenDocument Format specification. No software is perfect – including validators and the software libraries they are built with. That said, running a set of representative documents through one or more validators and reviewing the output is certainly an important part of gaining an adequate understanding of the application landscape.

Officeshots

Another tool that may be helpful, alongside standalone validators, is Officeshots. It is an open source tool you can also use internally within your own organization. Officeshots.org originated as a joint project of OpenDoc Society and the Netherlands government. When you are in an acquisition phase, officeshots.org can help you perform a reality check: does that fancy new open source suite, or that productivity package offered in a tender at bargain prices, actually do what it says?

Officeshots allows you to compare the output and other behavior of a wide variety of applications. Do the templates produced by your communications department or design agency – which will be the technical model for possibly thousands of documents or more – actually look consistent across the whole range of applications? How would they look if you viewed or printed them in EuroOffice 2014, in different versions of Microsoft Office going back to Microsoft Office 2007, or in the engine used by Google Docs? Both the documents you submit to Officeshots and the results it produces are automatically checked by a number of validators.

After submitting a document to officeshots.org, the site will deliver the print, screen and code output as produced by a variety of productivity applications – in different versions and across operating system platforms. This visual inspection is actually quite important because, just as an empty book doesn’t have any spelling errors, a document that has lost its content will probably pass through a validator just fine. Officeshots is not just for interactive use: it can also automatically run sets of test documents (so-called test suites) and analyze their output.

Officeshots.org is a community project, so anyone is welcome to add rendering servers and/or test suites to compare the solutions they are evaluating and share the results with the community. If you do not have the technical skills to do so yourself, consider asking a colleague who does (or sponsoring an SME to do it for you). It really helps vendors prioritize fixing interoperability issues if their flaws are publicly visible.

For more information visit:

https://nlnet.nl/project/officeshots/

To download the source code and documentation:

https://gitlab.com/odfplugfest/officeshots