Pentaho Enterprise Edition: Extract the BA schedule into a database

by Andrew Cave

September 3, 2018

How To, Main Blog, News, Pentaho, Resources, Training, Uncategorized

Pentaho Enterprise and Community Editions offer the Scheduling tool to enable jobs to be set to run at certain times on a repeatable basis. At the moment, while the schedule is viewable, there is no built-in method for extracting the schedules for reporting or documentation purposes. A couple of years back I figured out the methods to achieve this and I’d like to share the ETL with the wider Pentaho community.

This ETL uses the Pentaho REST API to get the XML containing the scheduler information, including the emails, the output types, the messages and which reports/jobs/transformations are being used.

screen shot of the KTR

In section 1, we set the URL, password and HTTP Accept type header (application/xml) and return the xml in a single string variable.

In section 2, the result XML is sent to XML readers which will split it to gain different contexts of the data

a. ‘Get job header’ fetches the timing information
b. ‘get DayOfWeek Recurrences’ picks out the scheduling recurrences for days of the week (e.g. when set to run Mon, Wed, Fri)
c. ‘job data detail’ extracts the information stored in ‘key/value’ pairs

Section 3 is where ‘key/value’ pairs which are in an indefinite number of rows, are ‘pivoted’ so that the key is now the field name and the ‘value’ is the…value.

After those sections, the rows from each line are joined using the jobname, UNIX timestamps get converted to database timestamps and the data is prepared before finally being inserted to the table pentaho_schedule

Here’s a zip of the whole ktr. It has explanatory notes and the (postgres) DDL for creating the pentaho_schedule table. Converting that to another database should be easy.
Don’t forget to change the database connection to suit your setup.

read_schedule_ba

Andrew Cave

Andrew Cave is a senior data engineer with BizCubed. He has worked in network data, billing, telco credit and debt after a career in the welfare sector. He loves databases. Follow him on LinkedIn

Pentaho Enterprise Edition: Extract the BA schedule into a database

Andrew Cave

More blog posts

If Your Data Can’t Keep Up, Neither Can Your AI

Mandatory climate reporting is no longer on the horizon, it’s here

BizCubed confirms commitment to security with updated ISO27001 certification

Follow us