Spreadsheets and formulae

For millions of users spreadsheets are just a modern version of the desktop calculator, which makes it easy to perform repetitive calculation tasks on blocks of numbers for immediate consumption on a screen. Many others use spreadsheet software primarily to hold and sort addresses, more as a sort of database. Spreadsheets can, of course, do a lot more.

Spreadsheet were originally created in the early days of the computer as a simple data entry and lightweight programming environment to allow regular users that lack programming knowledge to handle simple data manipulation tasks themselves. Over half a century later spreadsheets are used to process all kinds of (also non-numerical) tabular data as well as to create visualisations of that data. In some organisations this also include mission critical tasks such as finance or forecasting.

Spreadsheet software serves a number of different functions:

  • Convenient unstructured data entry for tabular data
  • A scratch pad where the user can choose which data to use, and some data can temporarily be put aside
  • Provide an implementation of a domain specific language (‘formulae’) describing a non-exhaustive set of operations to process that data
  • Bundled storage (combining data and processing instructions into a single portable file, unlike for instance a comma separated values file that is only able to hold data
  • A user interface to manually copy and paste a rectangular connected range of cells with rows and columns (a matrix) of original and computed data into documents

OpenFormula

Spreadsheets abstract away their underlying operations to the user by providing formulas. Each formula is a small software routine that can be called with a combination of:

  • fixed parameters, text strings and logic operators
  • data (ranges) from elsewhere within the spreadsheet or its metadata
  • (selection of data from) another source such as a database
  • (the result of the computation of other formulas)

OpenDocument Format 1.3 defines a standardised set of formulas that should be present in every compliant application, called OpenFormula. The fact that formulas are defined at the level of the document format has the benefit that they can also be used inside for instance a presentation or a text document. So when you cut and paste part of a spreadsheet into a document, it can retain the actual logic behind the cell values rather than just copy their context as text or image.

Before the publication of OpenFormula, every application was free to appropriate any formula name and assign any behaviour to it. Since spreadsheets are by design aimed at less technical users, typically short, descriptive formula names were used that unavoidably overlapped among products. Often these would be ambiguous enough to not have a consistent result between applications. Some differences are easy to remap programmatically, while in other cases it is not possible to fix without human intervention or a combination of special importers and compatibility modes. For every product, the translation would have to be made anew. For instance, Microsoft provided a “Wk4 file converter” together with a “Transition Formula Evaluation” mode to account for differences between the formula syntax of Lotus 1-2-3 and Excel.

Because the operations that the formulas describe are now well-defined in a proper standard, OpenFormula has finally made it possible to reliably share spreadsheet documents across any compliant product. Once software reads a formula, it is of course free to represent the formula internally in an arbitrary way (such as a bytecode sequence, compiled machine code, or a tree of nodes). Formulas are executed whenever they are needed to compute a specific result.

Note that while previous versions of ODF allowed to include formulas in office documents, their actual meaning was not previously defined. Depending on the vendor, differences may still occur when you use a version older than ODF 1.2. Notable issues arise with Microsoft Excel, which prior to ODF 1.2 used another internal definition of formulas. Exchanging spreadsheet documents with pre-ODF 1.2 implementations by non-experts is not recommended for daily usage apart from trivial use cases, as it could result in unclear and potentially wrong outcomes. With new spreadsheets created in ODF 1.2 or ODF 1.3 such historical issues are not relevant.

Levels of support

There are four “levels” within OpenFormula:

  1. Minimum implementation. This level provides the minimal capabilities (including functions, types, and their meanings) that are very widely implemented with spreadsheet applications, even in resource-contrained environments. It includes around 100 functions.
  2. Minimum desktop implementation. This level provides the minimal capabilities expected from typical desktop spreadsheets.
  3. Basic implementation. This level provides all the capabilities necessary for typical desktop spreadsheet use. This adds a distinguished logical type and support for complex numbers.
  4. Full implementation. This level provides some additional, less-commonly used features intended for advanced users.

Note that the lower two levels already provide all the functionality the vast majority of use cases and the vast majority of users require, and are “widely implemented” by desktop spreadsheet applications. At Level 3 an application supports all the functions supported by Microsoft Excel. Apart from the extended range of formulae, Level 3 and 4 have some provisions for backwards compliance with older application specific issues. Since not every application is likely or expected to support all levels, it might be better to properly convert the formula’s in your spreadsheets to not be depended on such backward compliance features.

Known pitfalls

Some of the differences between historical application defined formats are small but can still be a major hindrance.

A few are highlighted in the annotated version of the OpenFormula specification, which provides a lot of useful background information as well:

  • The precedence of mathematical operators in applications traditionally widely varied. In OpenFormula prefix “-” was given a higher precedence than “^”, to accommodate that “-2^2” remains “4” in Microsoft Excel and products that aimed to be compatible with it such as OpenOffice, LibreOffice and Gnumeric. Because prefix “-” had a lower precedence in Lotus 1-2-3, Quattro Pro, and Excel’s own Visual Basic, these products need to insert and remove parentheses when reading/writing expressions in OpenFormula where this matters.
  • In a formula, to distinguish between subsequent parameters, OpenFormula uses the semicolon (“;”). Many locales other than US English use the comma (“,”) as the decimal separator. Using the semicolon as the parameter separator eliminates confusion and the risk of incorrect implementation. Excel traditionally uses the “,” as the function parameter separator, which may cause incorrect usage.
  • In normal circumstances, OpenFormula does not allow empty parameters in formulas. By setting the application to behave according to level 3, that application can emulate the behaviour of Microsoft Excel to accept empty parameters in any position. Typical implementations will have many built-in functions, and most implementations also support one or more ways to create user-defined functions.
  • Microsoft Excel’s AND and OR functions historically did not work correctly in array formulas. This is a known bug. OpenFormula provides a correct definition of these functions. The traditional workarounds as recommended by some literature (using * and + inside array formulas as a work around) should however still work in ODF.
  • In Microsoft Excel’s display format, the space is used as the intersection operator. This can easily lead to mistakes or be confusing at the least. The OpenFormula exchange format uses “!” instead for intersection.

In many cases issues you may run into, are the result of inadequate mapping of historical product features to the OpenDocument Format standard. ODF 1.3 has been scrutinized by many different vendors and a large body of experts with different backgrounds, and is therefore unlikely to be the limiting factor here. Probably the software that you are using has stripped out some functionality during the conversion. For instance Excel spreadsheets converted into OpenDocument Format 1.1 using Microsoft Office may not display drop-down lists.

Some vendors provide an overview of their design choices when implementing ODF, these notes (such as the implementation of ODF in Microsoft Excel) can provide additional clues.

If you run into an issue:

  1. test a more recent software version to see if the issue has been addressed
  2. check if there are plugins or add-ins that might solve the problem
  3. ask your vendor why it doesn’t work. Inquire when the functionality can be implemented
  4. if it is not acceptable to wait, consider acquiring support to improve the product or build a plug-in or add-on
  5. alternatively, select a different desktop publishing tool that does meet your needs

Security and version management

There are good reasons to use spreadsheets, such as:

  • the low barrier of entry for users and
  • the fact that your data is very portable because it is packaged into a single document.

There are also good reasons to start phasing out the use of spreadsheets in quite a few cases, especially when robustness and scale come into play. Spreadsheets can be used as a hybrid of a data entry tool for a simple database and a custom functional back end, but being a mere ‘document’ on the desktop means that version management and access control (and thus security and reliability) typically are much weaker than is possible with dedicated solutions. While there are (often expensive) external solutions available to better manage these issues, and plugins or add-ins may help solve the security issues posed by macros, all of these add undesirable complexity and cost to your users. Your users might be better served using an online service or a simple integrated server based solution.

With the advent of collaborative browser based spreadsheets, another paradigm for access management is steadily taking hold. Of course the constraints with regards to security of user data on the internet as described in the chapter on collaboration hold for spreadsheets as well. Especially financially sensitive internal data such as sales predictions or revenue figures that could lead to manipulation of stock prices or have other undesired effects should be kept within a safe environment.

Accessibility

Spreadsheets predate the graphical user interface, and in many cases still have a strong text oriented workflow. This allows a great deal of flexibility for the user. OpenDocument Format leaves it up to the application and the operating system to determine how the user can interact with them.

There are some best practices that organisations can recommend to their users to make spreadsheets more accessible, for instance:

  • given that there are historical differences in the order of parsing mathematical operators, always use brackets to make clear what a formula looks like
  • Do net use empty parameters in formulas, even though applications can be set to parse legacy behaviour to allow it. This is confusing to blind users.
  • use ODF styles instead of assigning colours and fonts by hand, so a user with specific needs can switch to another style definition

There are many other recommendations, such as not using empty rows or columns to visually separate regions, as these are difficult to discover for blind users. The same effect can be used by changing the width or height. Consider deploying a tool that checks these automatically.

Please consult the chapter on accessibility for more detailed advice.