Integration with other enterprise tools

Documents are often part of a larger information lifecycle within an organisation or even across organisations. In many cases the use of productivity applications involves some kind of pre-processing (assembly, conversion) or post-processing (e-signing, annotation) with other enterprise tools. Such integration has to be designed carefully in order to benefit most from open standards like ODF.

Before you start redesigning your integration, make sure that you actually still need an office application for the task at hand. Office applications are complex tools mainly intended for desktop usage, and they add a lot of heavyweight packaging to your content in order to satisfy requirements that might have no relevance in a specific context. Office applications also allow your users to make many unnecessary mistakes and let information get lost or misplaced more easily.

If your use case for involving an office application in the process revolves around the need for rich text editing, charting or adding corporate branding, it is quite likely easier and more convenient to provide that through other means. There are dedicated software components that allow you to integrate office viewing and editing capabilities into your own software with high fidelity. Libraries such as the open source Calligra Engine and Aspose work natively with ODF and allow easy integration into desktop, server and server-side web applications. The open source WebODF framework is built around ODF viewing and editing within client-side web applications.

Historically, many organisations have procured very expensive systems in order to assemble documents which are trivial to produce in OpenDocument Format. A migration to ODF for your organisation offers new possibilities to reduce the dependency on such expensive systems.

Don’t forget that you can also split up procedural steps in order to be able to automate them. There are many software libraries capable of easily processing documents from different sources, enabling you to assemble and manipulate the content in many different ways. Most offer high-level convenience functionality requiring little knowledge of ODF. In case you have special needs, the simple structure of ODF files and the open specification allow very fine-grained tuning of any part of a document. If your users are currently performing repetitive tasks inside an office application or using macros, you are likely able to automate that work outside of the office application in a safer, cheaper and more reliable way. And if you are manually compiling reports or presentations from business data, you will likely be able to automate that as well.
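
Because an ODF file is a ZIP package whose body text lives in content.xml, a simple assembly step can run without any office application. The sketch below is a minimal illustration rather than a definitive implementation: the function name fill_placeholders and the {{key}} marker syntax are assumptions made for this example, and it assumes each marker sits unbroken in a single text run of the template. It uses only the Python standard library and keeps the mimetype entry uncompressed and in its original position.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_placeholders(odf_bytes: bytes, values: dict[str, str]) -> bytes:
    """Return a copy of an ODF package with {{key}} markers in content.xml filled in.

    Hypothetical helper: the {{key}} syntax is an illustration, not part of ODF.
    """
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as source, \
            zipfile.ZipFile(output, "w") as target:
        for name in source.namelist():
            data = source.read(name)
            if name == "content.xml":
                # Assumption: content.xml is UTF-8 and markers are not split across spans.
                text = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so inserted values cannot break the XML.
                    text = text.replace("{{" + key + "}}", escape(value))
                data = text.encode("utf-8")
            # The mimetype entry stays uncompressed; namelist() keeps the original order.
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            target.writestr(name, data, compress_type=method)
    return output.getvalue()
```

In a real pipeline the input bytes would come from a verified template file and the values from the authoritative data source. Validate the result before passing it on.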

General design considerations for integration:

  • Don’t move your information into an office file format too early. If possible, keep structured information accessible in an automated way, preferably as close as possible to the original source of the data. Ideally there is just one authoritative source where all information that belongs together is maintained. If pretty much the entire contents of a certain class of documents starts its life in an enterprise system as text in a database, it often makes sense to bring information back to that system (or as close as possible to it in the information pipeline) rather than scattering that information across separate standalone documents stored in a file system. If you modify, delete or add information manually inside an office application, it is much harder to keep an overview at a systemic level.

  • Use the tooling that ODF offers. Add metadata about the origins of content components as RDF. If a user modifies a document generated by an application, use version control (change tracking) to be able to see later on what users did. That will allow you to scan for errors as well as other integrity issues. Add a digital signature to a document if the integrity of the information contained in it is important. Place scripting or placeholder instructions in a logical place, preferably inside user variables or script tags.

  • Avoid non-trivial integration at the application level of an office application as much as possible. It adds cost and complexity, and limits your ability to deploy new tools, switch to another best-of-breed application or support mobile devices and tablets later on. Move the information into the OpenDocument Format as late as possible, probably at one of the final stages where you require actual desktop office functionality. At that stage, ODF provides a flexible interface layer that should cover most if not all cases.

  • Try to start with clean ODF template documents that are technically sound and have been verified to work in the most common applications (see the chapter about validators and Officeshots), rather than constantly reconverting legacy templates with unknown results. Validate the output of every step in your process, so you can keep the entire processing pipeline clean. Is the manifest complete? Are all referenced styles defined?

  • If some legacy application cannot handle ODF, it is probably time to consider replacing it. Most of the tooling needed to work with ODF is very cheap (or free) compared to traditional solutions and office applications. Use an intermediate step such as XML, JSON or structured text output for the time being, and create your documents from that using one of the software libraries mentioned above. Starting from clean information with specialised modern tooling may actually allow you to create far better documents, likely with no information loss at all.

  • If no alternative outputs are available from your vendor, you need to work with what you get and create them yourself. Many office applications offer headless or command line modes that can be used for conversions. Conversion from legacy formats may not be efficient, but should not be a problem unless your output is very unpredictable. Again, validate your output and clean up the results, so that they do not clutter any processing pipelines later on.

  • In your templates and document generation software, always use styles and headings rather than direct formatting. Never tweak the font type, font color, background or letter spacing by hand: if you convey meaning through visual styling alone, that meaning will be lost for users of braille equipment. Add the appropriate accessibility information. This will significantly ease reuse, not just for your own sake, but also for people who need to work with parts of your document later on, including people who need alternative reading mechanisms. By adding a style, your design becomes semantically meaningful.
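
The question "Is the manifest complete?" from the validation advice above can be checked mechanically. The following is a minimal sketch using only the Python standard library, not a replacement for a full validator: the function name manifest_problems is hypothetical, and the choice to ignore the mimetype entry and everything under META-INF/ is an assumption about which files need a manifest entry.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def manifest_problems(odf_bytes: bytes) -> dict[str, set[str]]:
    """Compare the files in an ODF package with the entries in its manifest."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as package:
        names = package.namelist()
        manifest = ET.fromstring(package.read("META-INF/manifest.xml"))

    # Collect every manifest:full-path value, skipping directory entries such as "/".
    listed = {
        entry.get(f"{{{MANIFEST_NS}}}full-path")
        for entry in manifest.iter(f"{{{MANIFEST_NS}}}file-entry")
    }
    listed_files = {path for path in listed if path and not path.endswith("/")}

    # Assumption: mimetype and META-INF/ files are not expected in the manifest.
    actual = {
        name for name in names
        if not name.endswith("/")
        and name != "mimetype"
        and not name.startswith("META-INF/")
    }
    return {
        "unlisted": actual - listed_files,  # in the package, not in the manifest
        "missing": listed_files - actual,   # in the manifest, not in the package
    }
```

Two empty sets mean the package and its manifest agree. A similar pass over style names could address the second question, but that needs more knowledge of the ODF schema than this sketch assumes.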

Variable data printing

Some organisations do very large print runs which are basically a large mail merge running into the hundreds of thousands or millions of documents. This is known professionally as Variable Data Printing. Using full office documents or a PDF derivative is probably not the most efficient use of resources. Although printing systems are really fast these days, big numbers change everything. The way digital content is turned into actual microscopic dots of different colors of ink on paper, in the so-called Raster Image Processor inside a printer, can sometimes be made more than an order of magnitude more efficient in its use of energy, time, network capacity and storage. You will not notice the difference between waiting a hundredth of a second and a tenth of a second when you make a single print, but it makes a huge difference whether a large run prints continuously for a couple of days or takes months.
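
To make the effect of volume concrete, the small calculation below scales the two per-document times mentioned above to a run of one million documents. The run size is an assumption chosen for illustration, and the helper name run_duration_hours is hypothetical.

```python
def run_duration_hours(documents: int, seconds_per_document: float) -> float:
    """Total processing time in hours for a run of identical-cost documents."""
    return documents * seconds_per_document / 3600


# Assumed run size: one million documents.
fast = run_duration_hours(1_000_000, 0.01)  # a hundredth of a second each
slow = run_duration_hours(1_000_000, 0.1)   # a tenth of a second each
print(f"{fast:.1f} h versus {slow:.1f} h")  # 2.8 h versus 27.8 h
```

The factor of ten that is invisible for a single print becomes the difference between an afternoon and more than a full day of processing.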

The actual printing process has two particularly expensive steps: color space conversion and rasterisation. Color space conversion translates the colors as we use them on screen into ink mixtures, while rasterisation calculates each individual dot that makes up the larger picture. Instead of computing the same data over and over again, one can use dedicated standards such as PPML by PODi that were created especially for high performance printing of large volumes of almost identical documents. Like OpenDocument Format, PPML (which stands for Personalised Print Markup Language) is an XML language, so documents created in ODF, or parts of them, can be converted to PPML-ready assets automatically using standard XML technologies such as XSLT.

This can make document pipelines very efficient, as it reduces redundancy of information and allows everything to be brought together at the latest possible stage of processing. Rather than expensively reprocessing the same objects over and over, assets may be stored on a central server and pulled in on demand. This even includes transparent variable layers.
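
The principle of processing a reusable object once and pulling it in on demand can be illustrated with a toy cache. This is a sketch of the general idea only, not of PPML itself: prepare_asset is a hypothetical stand-in for color conversion and rasterisation, and the asset names are invented for the example.

```python
from functools import lru_cache

expensive_calls = 0  # counts how often the costly step actually runs


@lru_cache(maxsize=None)
def prepare_asset(asset_id: str) -> str:
    """Stand-in for color conversion and rasterisation of one reusable object."""
    global expensive_calls
    expensive_calls += 1
    return f"raster({asset_id})"


def compose_page(recipient: str) -> list[str]:
    """Combine shared, cached assets with the one part that varies per recipient."""
    return [prepare_asset("letterhead"), prepare_asset("footer"), f"text({recipient})"]


pages = [compose_page(name) for name in ("Ada", "Grace", "Linus")]
print(expensive_calls)  # 2: each shared asset was prepared once for three pages
```

Only the recipient-specific part is produced per page, while the shared parts are computed once no matter how long the run is.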