Reliable Insight – Moving beyond Excel for Operational Analysis

by Zachary Zeus
July 14, 2014

Approximately 20-40% of all spreadsheets contain errors(1), and there’s no easy way to estimate how significant those errors are, and how much it’s affecting your ability to make decisions.ETL Architecture

Microsoft Excel is an indispensable tool for the financial analyst ever since it was first introduced in the late 80s.  However, the amazing flexibility of the spreadsheet introduces problems of quality control; they provide little help to prevent errors, and still less help in tracking errors down.

Spreadsheets are not always the best tool for the job. One of the reasons they’re used so widely may be related to the Law of the Instrument: “when you’re holding a hammer, every problem looks like a nail”.  Another key reason it’s so heavily used is that it is widely available.

Analysing Sales Data

Consider a scenario when an analyst prepares a sales analysis report for the management team on a monthly basis, using raw sales data from the ERP system.  In this scenario, the analyst can’t get data  from the ERP system directly, so she has to ask a member of the IT team run an SQL query for her each month.   

The raw transactional data from the ERP systems often need processing with vlookups, filters, etc before they can be used in the required analysis, and each of these processing steps can introduce errors.  

What Errors?

The most common errors in spreadsheets fall into the following categories.(2)

•   Logic Error – a formula is used incorrectly, leading to an incorrect result.

•   Reference Error – a formula contains one or more incorrect references to other cells.

•   Placing Numbers in a Formula – one or more numbers appear in formulas.

•   Copy/Paste Error -a formula is wrong due to inaccurate use of copy/paste.

•   Data Input Error- an incorrect data input is used.

•   Omission Error – a formula is wrong because one or more of its input cells is blank.

Other common errors arise because:

•   Spreadsheets are easy to change, even accidentally, with little indication what has changed over time.

•   When changes are made in spreadsheets, theres no easy way to know who changed the spreadsheet, and the nature of the change.

•   Errors in data entry flow through to outputs, often without detection

•   Complex formula often combine multiple transformations in one process, making it difficult to detect problems with a single transformation or formula.

•   Importing data from external systems typically involves manual tasks, running the query or report and importing it into the spreadsheet.  Each manual step introduces the possibility of human error.

•   Systems are not documented, so not easily transferred between staff when

•   Run on desktops

Data Integration Systems – ETL

Data Integration Systems (like Pentaho Data Integrator – PDI) are used to extract data from one place conduct processing on that data, and putting the result somewhere.   Collectively, this category is known as ETL (Extract, Transform and Load).

ETL systems provide easy to use graphical tools designed to be used by non-technical people, and provide a more robust system for processing data than the typical spreadsheet.

ETL systems have the following advantages over spreadsheets.

•   Repeatable – ETL processes can be scheduled to run on a predetermined scheduled, with no manual intervention.  Because they’re run on a server, they’re not reliant on an person’s desktop system to work.

•   Traceable – ETL systems typically do the processing of data in a series of defined steps (see the diagram above).  When problems are suspected or questions arise in the analysis, every step of the data process can be traced and examined.  This level of investigation is often impossible to do in spreadsheets when all the calculations are conducted in one go.

•   Automated – When an ETL process is developed, it will operate with the manual involvement of the designer.  The organisational process is no longer is reliant on the individual expertise and availability of an individual person, but instead becomes part of the organisational systems and knowledge.

•   Maintainable – An ETL system is largely self-documenting, with the inputs, transforms and outputs all included in the definitions. This reduces the risk that the a process is reliant on undocumented processes completed by the developer of the spreadsheet.

•   Reliable – The ETL approach reduces the opportunity for errors and inadvertent chances over time, and so the results will can be accepted with more confidence than their spreadsheets equivalents.

(1) http://panko.shidler.hawaii.edu/SSR/Mypapers/whatknow.htm

(2) http://mba.tuck.dartmouth.edu/spreadsheet/product_pubs_files/Errors.pdf

(3) http://robertkugel.ventanaresearch.com/2013/01/29/the-spreadsheet-and-the-whale/

We have a free data integration kit that demonstrates how easy it is to automatically extract data from Salesforce.com on a regular basis and use it for analysis and reporting.

Download our Saleforce Integration kit.

Download the Salesforce Integration Kit

More blog posts

Building a New Habit is HARD!

 “Building business logic is a low priority, from my perspective.” This was what my CEO said to me as I shared with him my priorities for the week. The funny part? I work for a data engineering firm...

read more