Pentaho Enterprise Edition: Extract the BA schedule into a database

by Andrew Cave
September 3, 2018
Image of Pentaho logo on blue sparkling background

Pentaho Enterprise and Community Editions offer the Scheduling tool to enable jobs to be set to run at certain times on a repeatable basis. At the moment, while the schedule is viewable, there is no built-in method for extracting the schedules for reporting or documentation purposes. A couple of years back I figured out the methods to achieve this and I’d like to share the ETL with the wider Pentaho community.

This ETL uses the Pentaho REST API to get the XML containing the scheduler information, including the emails, the output types, the messages and which reports/jobs/transformations are being used.

KTR Screen Shot

screen shot of the KTR

In section 1, we set the URL, password and HTTP Accept type header (application/xml) and return the xml in a single string variable.

In section 2, the result XML is sent to XML readers which will split it to gain different contexts of the data

a. ‘Get job header’ fetches the timing information
b. ‘get DayOfWeek Recurrences’ picks out the scheduling recurrences for days of the week (e.g. when set to run Mon, Wed, Fri)
c. ‘job data detail’ extracts the information stored in ‘key/value’ pairs

Section 3 is where ‘key/value’ pairs which are in an indefinite number of rows, are ‘pivoted’ so that the key is now the field name and the ‘value’ is the…value.

After those sections, the rows from each line are joined using the jobname, UNIX timestamps get converted to database timestamps and the data is prepared before finally being inserted to the table pentaho_schedule

Here’s a zip of the whole ktr. It has explanatory notes and the (postgres) DDL for creating the pentaho_schedule table. Converting that to another database should be easy.
Don’t forget to change the database connection to suit your setup.

read_schedule_ba

Portrait of Maxx Silver
Andrew Cave

Andrew Cave is a senior data engineer with BizCubed. He has worked in network data, billing, telco credit and debt after a career in the welfare sector. He loves databases. Follow him on LinkedIn

More blog posts

ESG Reporting Nirvana Now Within Reach

BizCubed launches ESG123 – An advanced data engineering services suite designed to streamline reporting compliance and enable sustainability teams to turn ESG into a strategic advantage. BizCubed, a...

read more