Independence from a single point of failure

by Zachary Zeus
July 28, 2014

Today I had to pick up work from a developer who left my team three months ago.

This can be a challenge for any group or organisation, but it’s particularly challenging when the development uses new or experimental technology.

In my case, I wanted to work on a Hadoop and MongoDB example my team had developed. I was relatively familiar with where the information lived, so I logged in, found the elements and was able to work through the entire example in less than an hour.

This was partly because my team did a good job of documenting what they had done, but also because the development had been done in Pentaho Data Integration, so I could logically follow the steps that had been implemented. The process that had been implemented is shown below.




A description of what this process does is written directly into the ETL process:

“To take our web logs and determine the referring traffic to our sites.

Web logs are copied from our web proxy, nGinX, to a staging area on the DI Server and then transferred to our Hadoop Cluster.  Once on the cluster we use Pentaho Visual MapReduce to parse the log file and return the total hits and traffic for each country and referring site per day.”

This lets me modify the process, debug issues or extend it for other uses within an hour.
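As a rough illustration of the logic described in that note, the aggregation could be sketched in plain Python as follows. This is not the Pentaho Visual MapReduce job itself. It assumes the logs use the standard nginx "combined" format, and the `lookup_country` helper and its lookup table are hypothetical stand-ins for whatever IP-to-country lookup the real process uses.

```python
import re
from collections import defaultdict
from datetime import datetime
from urllib.parse import urlparse

# Assumption: logs are in the nginx "combined" format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def lookup_country(ip, table):
    """Hypothetical stand-in for a GeoIP lookup: a plain dict of ip -> country."""
    return table.get(ip, "unknown")


def map_line(line, country_table):
    """Map step: emit ((day, country, referring site), (hits, bytes)) for one log line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # skip lines that do not parse
    day = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z").date().isoformat()
    # A referrer of "-" has no host part, so it is counted as direct traffic.
    referrer = urlparse(match["referrer"]).netloc or "direct"
    size = 0 if match["bytes"] == "-" else int(match["bytes"])
    country = lookup_country(match["ip"], country_table)
    return (day, country, referrer), (1, size)


def reduce_pairs(pairs):
    """Reduce step: sum hits and bytes for each (day, country, referring site) key."""
    totals = defaultdict(lambda: [0, 0])
    for key, (hits, size) in pairs:
        totals[key][0] += hits
        totals[key][1] += size
    return {key: tuple(value) for key, value in totals.items()}


def summarise(lines, country_table):
    """Run the map and reduce steps over an iterable of log lines."""
    mapped = (map_line(line, country_table) for line in lines)
    return reduce_pairs(pair for pair in mapped if pair is not None)
```

Calling `summarise(open("access.log"), {"203.0.113.7": "AU"})` (the file name and table contents are made up for illustration) would return a dictionary keyed by day, country and referring site, with total hits and bytes transferred as the values.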

This relatively “simple” process spanned three different servers and services: a Pentaho DI server, our firewall and our Hadoop cluster. I was able to log into one place and see exactly how our data flowed.

Having a single place where all of my data management processes are maintained means that my team and I can focus on doing analysis and making improvements rather than turning the handle of data processing or, worse, reinventing the wheel.

Zachary Zeus

Zachary Zeus is the Co-CEO & Founder of BizCubed. He provides the business with more than 20 years' engineering experience and a solid background in providing large financial services with data capability. He maintains a passion for providing engineering solutions to real-world problems, lending his considerable experience to enabling people to make better data-driven decisions.
