Pentaho 8.1 Enterprise Edition was released a few months ago. It delivers a wide range of features and improvements, including improved streaming steps in PDI, increased Spark capabilities in PDI, enhancements to Google Cloud data support, and increased AWS security. There is also significant forward momentum in the Big Data steps, along with improvements in Data Integration and Business Analytics.
I have done an analysis of the 8.1 release and here is my take.
This release continues Hitachi’s commitment to interoperability in complex ecosystems. In my analysis of the release I have identified three key themes:
- Doubling down on cloud
- Improving the core data integration engine, and
- Expanding on some key analytics use cases
Doubling Down on Cloud:
Google Cloud
Pentaho has significantly expanded its support for the Google Cloud Platform (GCP). Pentaho 8.1 lets you connect seamlessly to Google Cloud Storage, and import and export data to and from Google Drive, using a VFS browser. I remember well that, a few years ago, one of BizCubed’s customers implemented a plugin for Google BigQuery. Now, with the addition of the Google BigQuery Loader job entry, BigQuery is a data source within the Pentaho User Console and the PDI client. You can set up JDBC connections and create ETL pipelines to access and store data with Google Cloud big data services. It is great to see Pentaho integrating more holistically with GCP: the Google Drive integration and the ability to run analytics on BigQuery are both solid steps forward.
Amazon Web Services
Pentaho has also improved its play in the AWS sphere. Pentaho Data Integration can now assume IAM role permissions to provide secure read/write access to the AWS S3 web service. The need to provide hardcoded credentials at every step is gone. The reduced credential management burden brings added flexibility to accommodate different AWS security scenarios, giving a better user experience while reducing security risk. The revised S3 CSV Input and Output transformation steps enable PDI to extract data from AWS with the necessary security enhancements. These steps can pick up IAM security keys from environment variables, from your machine’s home directory, or from an EC2 instance profile. Pentaho has also added Adaptive Execution Layer (AEL) support for Amazon EMR.
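To make the three credential sources concrete, here is a minimal Python sketch of a lookup that tries them in the order listed above. It is an illustration only, not PDI's implementation: the function name `resolve_aws_credentials` and its return shape are hypothetical, the variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` and the `~/.aws/credentials` file are the standard AWS conventions, and the instance profile is only reported as a fallback rather than actually queried.

```python
import configparser
import os
from pathlib import Path


def resolve_aws_credentials(env=None, home=None, profile="default"):
    """Return (source, access_key_id), trying env vars, then the home directory file.

    Hypothetical helper for illustration; it never contacts AWS.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # 1. Environment variables (standard AWS variable names).
    key = env.get("AWS_ACCESS_KEY_ID")
    if key and env.get("AWS_SECRET_ACCESS_KEY"):
        return ("environment", key)

    # 2. Shared credentials file in the user's home directory.
    cred_file = home / ".aws" / "credentials"
    if cred_file.is_file():
        parser = configparser.ConfigParser()
        parser.read(cred_file)
        if parser.has_option(profile, "aws_access_key_id"):
            return ("credentials-file", parser.get(profile, "aws_access_key_id"))

    # 3. Nothing found locally: on EC2 the instance profile would supply
    #    temporary keys. This sketch only reports that fallback.
    return ("instance-profile", None)
```

The point of the ordering is that nothing is hardcoded in the transformation itself: the same job can run on a laptop with a credentials file and on an EC2 node with an instance profile.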
Improving the Core Pentaho Data Integration (PDI) Engine:
Pentaho has improved its streaming capabilities. Pentaho already had a streaming engine, but the tooling had been built around a batch processing mindset. With this release, streaming becomes a core part of the PDI engine. PDI now has two new streaming data sources, MQTT and JMS, each with input and output steps. There is even a Safe Stop for streaming processes: you can now stop streaming transformations safely, without loss of records. The same safe stop is available for batch transformations within Spoon, Carte, and the Abort step. There is also better workflow handling for streaming data sets and streaming data services, and the Transformation Executor step can be used to run a sub-transformation with Spark on AEL.
More Big Data formats are now supported natively. Optimized Row Columnar (ORC) Input and Output transformation steps have been added, letting PDI read and write this indexed, columnar serialization format and easing the development of pipelines that handle it. Native handling of ORC files through the input and output steps is available from any standard storage system and is also accessible through Virtual File System (VFS) drivers. To improve performance, the steps can execute natively in the Pentaho engine or in Spark using AEL.
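The idea behind stopping a streaming process without losing records can be sketched in a few lines of Python. This is a generic illustration under my own assumptions, not how PDI implements Safe Stop: the function `drain_on_stop` and the in-memory buffer are hypothetical stand-ins for a transformation and the records it has already read from its source.

```python
import queue
import threading


def drain_on_stop(buffer, stop_requested, handle):
    """Handle records until a stop is requested, then flush what is already buffered."""
    # Normal operation: keep pulling records while no stop has been requested.
    while not stop_requested.is_set():
        try:
            record = buffer.get(timeout=0.05)
        except queue.Empty:
            continue
        handle(record)

    # Safe stop: no new input is accepted, but records that were already
    # read are still handled instead of being discarded.
    while True:
        try:
            record = buffer.get_nowait()
        except queue.Empty:
            return
        handle(record)


# Five records are already buffered when the stop request arrives.
buffer = queue.Queue()
for i in range(5):
    buffer.put(i)

stop = threading.Event()
stop.set()

out = []
drain_on_stop(buffer, stop, out.append)
```

A hard stop would return as soon as the stop flag is set and drop the five buffered records. Here all of them end up in `out` before the function returns.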
You now have access to enhanced worker nodes via the Hitachi Vantara Foundry project. Features that have been added include:
- Improvements to monitoring, with accurate propagation of work item status
- Performance improvements from optimized startup times for executing work items
- Customizations externalized from the Docker build process
- Job clean-up functionality, among others
Expansion of Some Key Analytics Use Cases:
With v8.1, Pentaho offers improvements to its Business Analytics capability. Visualizations now support a continuous axis for time dimensions: Line, Area, and Chart visualizations use a continuous display of data for the Time Dimension. Data points are now spaced in proportion to the time elapsed between them, giving a more visually accurate representation of data trends. Previously, the time axis placed discrete data points at equal intervals. The real-time streaming support for dashboards and the improved time series visualization are much-needed features. Furthermore, there is improved data exploration in the PDI tool itself. I know for a fact that many of BizCubed’s clients will be excited about the repository browser improvements.
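To illustrate the difference between the old equally spaced time axis and the continuous one described above, here is a small Python sketch that computes where each data point would sit along an axis running from 0 to 1. The function names and the sample dates are my own and purely illustrative, and the input is assumed to be sorted in time order. This is not Pentaho's charting code.

```python
from datetime import datetime


def discrete_positions(timestamps):
    """Equal spacing: every point is the same distance from its neighbours."""
    n = len(timestamps)
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    return [i / (n - 1) for i in range(n)]


def continuous_positions(timestamps):
    """Spacing proportional to the time elapsed since the first point."""
    if not timestamps:
        return []
    start, end = timestamps[0], timestamps[-1]
    span = (end - start).total_seconds()
    if span == 0:
        return [0.0 for _ in timestamps]
    return [(t - start).total_seconds() / span for t in timestamps]


# Three readings: two on consecutive days, then a gap of eight days.
readings = [datetime(2018, 1, 1), datetime(2018, 1, 2), datetime(2018, 1, 10)]

print(discrete_positions(readings))    # middle point halfway along the axis
print(continuous_positions(readings))  # middle point close to the start
```

With discrete spacing the second reading lands in the middle of the axis, which hides the eight-day gap. With continuous spacing it sits one ninth of the way along, so the slope of the line reflects how quickly the value actually changed.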
So, there you have it, folks. That is my take on the Pentaho 8.1 release. Should you wish to discuss Pentaho further, or how BizCubed can help you make better decisions using Pentaho (and other) tools, reach out to us.
Zachary Zeus
Zachary Zeus is the Co-CEO & Founder of BizCubed. He provides the business with more than 20 years' engineering experience and a solid background in providing large financial services with data capability. He maintains a passion for providing engineering solutions to real-world problems, lending his considerable experience to enabling people to make better data-driven decisions.