Easy ETL documentation

by Zachary Zeus
January 19, 2010

Documenting development work is one of the most important tasks in IT. We pay special attention to creating solutions that are robust and can be easily enhanced and adjusted by other developers, which is why BizCubed focuses on providing comprehensive documentation for all our solutions. To make our development processes more efficient, we have created a framework that automates part of the documentation work.

The starting point was the fact that Pentaho is based on open standards, so all Pentaho ETL jobs and transformations, as well as OLAP cubes, are defined in XML format. This article describes an approach to documenting ETL jobs and transformations. Exactly the same approach has also been used to document OLAP cubes.

Using the XSLT language, we can extract information from an XML file based on its XML tags. If we open a Pentaho ETL job in a text editor, for example, we can see its structure, which looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<job>
  <name>temp</name>
  <description/>
  <extended_description/>
  <job_version/>
  <directory>&#47;</directory>
  <created_user>-</created_user>
  <created_date>2010&#47;01&#47;19 10:00:14.581</created_date>
  <modified_user>-</modified_user>
  <modified_date>2010&#47;01&#47;19 10:00:14.581</modified_date>
  <slaveservers>
  </slaveservers>
  <logconnection/>
  <logtable/>
  <size_limit_lines/>
  <use_logfield>Y</use_logfield>
  <shared_objects_file/>
  <entries>
  </entries>
  <hops>
  </hops>
  <notepads>
  </notepads>
</job>

The steps of the job are defined as “entries” and the connections between the steps are the “hops”.

The next step is to create an XSLT file that extracts the information from this job's XML and presents it in a user-friendly way, for example as a web page.
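As a rough illustration of that idea, a minimal XSLT 1.0 stylesheet might look like the sketch below. It is not one of our published samples. The top-level tags (name, description, modified_date, modified_user, entries, hops) come from the job file shown above, but the child elements used inside each entry (name, type) and each hop (from, to) are assumptions, because the sample job is empty; check them against a real job file before relying on them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only. The children of <entry> (name, type) and
     <hop> (from, to) are assumed, not taken from the sample job above. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Build one HTML page from the root <job> element -->
  <xsl:template match="/job">
    <html>
      <head>
        <title>Job: <xsl:value-of select="name"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="name"/></h1>
        <p>Description: <xsl:value-of select="description"/></p>
        <p>
          Last modified: <xsl:value-of select="modified_date"/>
          by <xsl:value-of select="modified_user"/>
        </p>

        <!-- One list item per job step -->
        <h2>Entries</h2>
        <ul>
          <xsl:for-each select="entries/entry">
            <li>
              <xsl:value-of select="name"/>
              (<xsl:value-of select="type"/>)
            </li>
          </xsl:for-each>
        </ul>

        <!-- One list item per connection between steps -->
        <h2>Hops</h2>
        <ul>
          <xsl:for-each select="hops/hop">
            <li>
              <xsl:value-of select="from"/> to <xsl:value-of select="to"/>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Any XSLT 1.0 processor should be able to apply a stylesheet like this. With xsltproc, for instance, and hypothetical file names, the command would be `xsltproc job-doc.xsl temp.kjb > temp.html`. For the empty sample job above, the resulting page would contain only the job name, the modification details, and two empty lists.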

Some sample XSLT files and instructions for implementing them can be found on our wiki.
