Easy ETL documentation

by Zachary Zeus
January 19, 2010

Documenting development work is one of the most important tasks in IT. We pay special attention to creating solutions that are robust and can be easily enhanced and adjusted by other developers, which is why BizCubed focuses on providing comprehensive documentation for all our solutions. To make our development processes more efficient, we have created a framework that automates part of the documentation work.

The starting point was the fact that Pentaho is based on open standards, so all Pentaho ETL jobs and transformations, as well as OLAP cubes, are defined in XML. This article describes an approach to documenting ETL jobs and transformations. Exactly the same approach has also been used to document OLAP cubes.

Using the XSLT language, we can extract information from an XML file based on its tags. If we open a Pentaho ETL job in a text editor, for example, we can see its structure, which looks like this:


<?xml version="1.0" encoding="UTF-8"?>
<job>
  <name>temp</name>
  <description/>
  <extended_description/>
  <job_version/>
  <directory>&#47;</directory>
  <created_user>-</created_user>
  <created_date>2010&#47;01&#47;19 10:00:14.581</created_date>
  <modified_user>-</modified_user>
  <modified_date>2010&#47;01&#47;19 10:00:14.581</modified_date>
  <parameters>
  </parameters>
  <slaveservers>
  </slaveservers>
  <logconnection/>
  <logtable/>
  <size_limit_lines/>
  <use_batchid>Y</use_batchid>
  <pass_batchid>N</pass_batchid>
  <use_logfield>Y</use_logfield>
  <shared_objects_file/>
  <entries>
  </entries>
  <hops>
  </hops>
  <notepads>
  </notepads>
</job>


The steps of the job are defined as “entries” and the connections between the steps are the “hops”. In this example both sections are empty, because the job does not contain any steps yet.

The next step is to create an XSLT file that extracts the information from the job's XML and presents it in a user-friendly way, for example as a web page.
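To make this step more concrete, here is a minimal sketch of what such a stylesheet might look like. It is not the stylesheet from our wiki, only an illustration. The elements `name`, `description`, `created_date`, `created_user`, `entries` and `hops` come from the job file shown above. The child elements `entry/name`, `entry/type`, `hop/from` and `hop/to` are assumptions, since the sample job has no entries or hops; check them against one of your own job files before relying on them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Produce an HTML page rather than XML -->
  <xsl:output method="html" indent="yes" encoding="UTF-8"/>

  <!-- Match the <job> root element of a Pentaho job file -->
  <xsl:template match="/job">
    <html>
      <head>
        <title>Job: <xsl:value-of select="name"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="name"/></h1>
        <p><xsl:value-of select="description"/></p>
        <p>
          <xsl:text>Created </xsl:text>
          <xsl:value-of select="created_date"/>
          <xsl:text> by </xsl:text>
          <xsl:value-of select="created_user"/>
        </p>

        <h2>Entries</h2>
        <ul>
          <!-- ASSUMPTION: each <entry> has <name> and <type> children -->
          <xsl:for-each select="entries/entry">
            <li>
              <xsl:value-of select="name"/>
              <xsl:text> (</xsl:text>
              <xsl:value-of select="type"/>
              <xsl:text>)</xsl:text>
            </li>
          </xsl:for-each>
        </ul>

        <h2>Hops</h2>
        <ul>
          <!-- ASSUMPTION: each <hop> has <from> and <to> children
               holding the names of the connected entries -->
          <xsl:for-each select="hops/hop">
            <li>
              <xsl:value-of select="from"/>
              <xsl:text> to </xsl:text>
              <xsl:value-of select="to"/>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

With an XSLT 1.0 processor such as xsltproc, a stylesheet like this could be applied with `xsltproc job-doc.xsl temp.kjb > temp.html`, where both file names are hypothetical. For the empty sample job above, the resulting page would show only the job name and creation details, with empty entry and hop lists.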

Some sample XSLT files and implementation instructions can be found on our wiki.


Zachary Zeus

Zachary Zeus is the Co-CEO & Founder of BizCubed. He provides the business with more than 20 years' engineering experience and a solid background in providing large financial services with data capability. He maintains a passion for providing engineering solutions to real world problems, lending his considerable experience to enabling people to make better data driven decisions.
