Build a Streamlined Data Refinery

by Zachary Zeus
October 29, 2014
With the surge in data volumes over the last few years, getting real value from Big Data has become a complex task for many businesses.

(Figure: Streamlined Data Refinery architecture diagram)

Simple batch reporting isn’t up to scratch anymore – consumers want easy-to-understand visual analytics, delivered on demand and in real time in their favourite format, and working alongside their existing software.

This puts strain on the IT department while slowing business users down, which has led business users to adopt slick visualisation tools to help themselves, although these only partially meet their demands. While they can view a subset of the data, trusting that data and getting approval from IT have remained challenges – until now.

In Pentaho’s latest release, 5.2, the innovative Streamlined Data Refinery (SDR) offers a flexible, economical way to process and automate the delivery of information to large numbers of users for many analytic purposes. It sets a new standard for data delivery by streamlining the process and empowering business users. The design pattern supports an on-demand process that starts with a user-initiated data request, blends and refines the data, automatically generates an analysis schema, and publishes the resulting analytic data set in any format.

Innovative features

  • JDBC drivers are simpler to install
  • New data source locations have improved visibility
  • The PDI repository’s performance has improved
  • Simplified R script integration in the Data Science Pack
  • Enhanced documentation and samples for embedded analytics

Pentaho Data Integration

Pentaho’s highly scalable data integration engine, managed through its intuitive end-user interface, provides the glue between the various data sources and data stores in this architecture. Using PDI, the following steps can be run on demand:

Blending & Orchestration: PDI ingests data from any data source, then processes, cleanses and blends it to drive insight.
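To make "processes, cleanses and blends" concrete, here is a minimal sketch of the idea in plain Python. It is not PDI and does not use any Pentaho API; the CRM CSV, the `orders` table, and the functions `load_crm`, `load_orders` and `blend` are all hypothetical. It reads customers from one source and orders from another, applies a couple of cleansing rules, joins the two on `customer_id`, and totals revenue per region.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical source 1: a CSV export from a CRM system.
CRM_CSV = """customer_id,name,region
1, Alice ,north
2,Bob,SOUTH
3,,north
"""

def load_crm(text):
    """Read and cleanse CRM rows: trim whitespace, normalise region case,
    and drop rows that are missing a customer name."""
    rows = {}
    for rec in csv.DictReader(io.StringIO(text)):
        name = rec["name"].strip()
        if not name:
            continue  # cleansing rule: incomplete record, skip it
        rows[int(rec["customer_id"])] = {
            "name": name,
            "region": rec["region"].strip().lower(),
        }
    return rows

def load_orders(conn):
    """Hypothetical source 2: order rows from a relational database
    (here, an in-memory SQLite table)."""
    return conn.execute("SELECT customer_id, amount FROM orders").fetchall()

def blend(crm, orders):
    """Join orders to cleansed customers and total revenue per region."""
    totals = defaultdict(float)
    for customer_id, amount in orders:
        customer = crm.get(customer_id)
        if customer is None:
            continue  # order for an unknown or rejected customer
        totals[customer["region"]] += amount
    return dict(totals)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 100.0), (2, 50.0), (1, 25.0), (3, 40.0)],
    )
    print(blend(load_crm(CRM_CSV), load_orders(conn)))
```

Customer 3 is rejected during cleansing, so its order is left out of the blend and the result is `{'north': 125.0, 'south': 50.0}`. In PDI the same steps would be built visually as a transformation rather than written by hand.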

Automatic Modelling & Publishing: PDI, as part of the data orchestration process, creates an OLAP schema and publishes it to the Pentaho Business Analytics server for end-user visualisation.
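As a rough illustration of automatic modelling, the sketch below derives a Mondrian-style OLAP schema from a flat, blended table. The heuristic is an assumption for illustration only and is not how PDI builds its models: numeric columns become summed measures and every other column becomes a one-level dimension. The function `build_schema`, the default schema name `"SDR"`, and the example table are hypothetical. The publishing step is deliberately left out, because it depends on the server's own interfaces.

```python
import xml.etree.ElementTree as ET

def build_schema(table, columns, schema_name="SDR"):
    """Build a Mondrian-style OLAP schema for a single flat table.

    columns maps column name -> Python type. Simple (hypothetical)
    heuristic: int/float columns become summed measures, and every
    other column becomes a one-level dimension.
    """
    schema = ET.Element("Schema", name=schema_name)
    cube = ET.SubElement(schema, "Cube", name=table)
    ET.SubElement(cube, "Table", name=table)  # fact table backing the cube
    measures = []
    for col, col_type in columns.items():
        if col_type in (int, float):
            measures.append(col)  # defer: measures are emitted last
            continue
        dim = ET.SubElement(cube, "Dimension", name=col)
        hier = ET.SubElement(dim, "Hierarchy", hasAll="true")
        ET.SubElement(hier, "Level", name=col, column=col, uniqueMembers="false")
    # Emit measures after all dimensions, matching the usual Cube layout.
    for col in measures:
        ET.SubElement(cube, "Measure", name=col, column=col, aggregator="sum")
    return ET.tostring(schema, encoding="unicode")

if __name__ == "__main__":
    print(build_schema("sales_by_region", {"region": str, "revenue": float}))
```

For the example table, this produces a cube with a `region` dimension and a summed `revenue` measure. A real modeller would also need to handle dates, hierarchies and data types from the database metadata.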

Governance: IT can promptly validate the data sources being blended, at the source, allowing for the right degree of control. Governed Data Delivery is the delivery of blended, trusted and timely data to power analytics, regardless of location.

How does the SDR solution compare to our competitors?

Pentaho’s SDR solution is unlike anything else on the market because it is a complete solution. By combining data integration, orchestration, Big Data connectivity and governed data delivery in an open, web-enabled platform, the SDR stands apart. For example:

  • Informatica and Tableau: Informatica delivers data integration and Tableau delivers visualisations, but together they can’t deliver on-demand data blended across many sources upon user request in a web UI.
  • Alteryx: Delivers lightweight data integration in a drag-and-drop UI for analysts, but cannot provide robust Big Data transformation and orchestration or high-performance, embeddable visual analytics.
