Easy ETL documentation

by Zachary Zeus
January 19, 2010

Documenting development work is one of the most important tasks in IT. We pay special attention to creating solutions that are robust and can be easily enhanced and adjusted by other developers, which is why Bizcubed focuses on providing comprehensive documentation for all our solutions. To increase the efficiency of our development processes, we have created a framework that lets us automate parts of the documentation process.

The starting point was the fact that Pentaho is based on open standards, so all Pentaho ETL jobs and transformations, as well as OLAP cubes, are defined in XML. This article describes an approach to documenting ETL jobs and transformations. Exactly the same approach has also been used to document OLAP cubes.

Using XSLT, we can extract information from an XML file based on its tags. If we open a Pentaho ETL job in a text editor, for example, we can see its structure, which looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<job>
<name>temp</name>
<description/>
<extended_description/>
<job_version/>
<directory>&#47;</directory>
<created_user>-</created_user>
<created_date>2010&#47;01&#47;19 10:00:14.581</created_date>
<modified_user>-</modified_user>
<modified_date>2010&#47;01&#47;19 10:00:14.581</modified_date>
<slaveservers>
</slaveservers>
<logconnection/>
<logtable/>
<size_limit_lines/>
<use_logfield>Y</use_logfield>
<shared_objects_file/>
<entries>
</entries>
<hops>
</hops>
<notepads>
</notepads>

The steps of the job are defined as “entries” and the connections between the steps are the “hops”.

The next step is to create an XSLT file that extracts the information from this job's XML and presents it in a user-friendly way, for example as a web page.
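As a rough illustration of what such a file might look like, here is a minimal sketch of an XSLT 1.0 stylesheet that turns a job file into a simple HTML page. It is not one of the files from our framework. The top-level tags (name, description, created_date and so on) come from the sample above, but the sample's entries and hops sections are empty, so the child elements used here (entry, hop, name, type, from, to) are assumptions about how a populated job file is laid out.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes"/>

  <!-- Match the root <job> element and build one HTML page for it -->
  <xsl:template match="/job">
    <html>
      <head>
        <title>Job: <xsl:value-of select="name"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="name"/></h1>

        <!-- Header fields taken directly from the sample job above -->
        <p>Description: <xsl:value-of select="description"/></p>
        <p>Created: <xsl:value-of select="created_date"/>
           by <xsl:value-of select="created_user"/></p>
        <p>Modified: <xsl:value-of select="modified_date"/>
           by <xsl:value-of select="modified_user"/></p>

        <!-- ASSUMPTION: each step is an <entry> with <name> and <type> -->
        <h2>Entries</h2>
        <ul>
          <xsl:for-each select="entries/entry">
            <li>
              <xsl:value-of select="name"/>
              (<xsl:value-of select="type"/>)
            </li>
          </xsl:for-each>
        </ul>

        <!-- ASSUMPTION: each connection is a <hop> with <from> and <to> -->
        <h2>Hops</h2>
        <ul>
          <xsl:for-each select="hops/hop">
            <li>
              <xsl:value-of select="from"/> to <xsl:value-of select="to"/>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Any XSLT 1.0 processor should be able to apply a stylesheet like this. With xsltproc, for example, the command would be along the lines of `xsltproc job-doc.xsl temp.kjb > temp.html`, where both file names are hypothetical. For the empty sample job above, the two lists would simply come out empty.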

Some sample XSLT files and implementation instructions can be found on our wiki.
