PCM 16

by Zachary Zeus
November 14, 2016

The 9th annual Pentaho Community Meeting (PCM) in Europe.

Bart from know.bi opened proceedings by talking about the history of PCM. There were 50+ people at the hackathon last night and 200+ registrations.

Pedro Alves on Pentaho 7.0

  • Pentaho 7.0 will be officially released on Tuesday, but it is already available on the portal and SourceForge.
  • The purpose of the release is to bring data engineering, data prep and analytics together.
  • Brings all of the steps together to speed up data to outcomes.

Alternative Big Data DevOps – Tom Barber

  • Making life easier when deploying software and services for customers.
  • Juju, by Canonical, has charms (deployed software) and relations, but it is not container tech.
  • Controller node to command different things.

Data modeling in Hadoop – Nelson – Ubiquis

  • Showing how to implement two different Kimball patterns in Hadoop:
    • Slowly Changing Dimensions Type 2
    • Accumulating Fact
  • An event is added twice: once when it moves into a status and once when it moves out.
  • The approach is highly sensitive to the order of events.
  • Only calculate the age of an event on the “out” state (the “in” state naturally has no age).
  • Change the models to support write-only operations.
  • Use partitions.
  • More details on status change here:  Blogs on details
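
To make the "added twice" idea concrete, here is a minimal Python sketch of an append-only status fact. It is not the speaker's implementation: the function names, the column names and the +1/-1 `count` column are assumptions made for illustration. Each status change appends an "out" row for the old status (with its age) and an "in" row for the new one (with no age), and nothing is ever updated in place.

```python
from collections import defaultdict
from datetime import datetime


def status_change_rows(event_id, old_status, new_status, entered_old_at, changed_at):
    """Return the rows to append for one status change (hypothetical layout)."""
    rows = []
    if old_status is not None:
        # "out" row: the event leaves its previous status, so its age there is known.
        rows.append({
            "event_id": event_id,
            "status": old_status,
            "direction": "out",
            "count": -1,  # assumed convention: -1 when leaving a status
            "age_seconds": (changed_at - entered_old_at).total_seconds(),
            "changed_at": changed_at,
        })
    # "in" row: the event has only just entered the new status, so it has no age.
    rows.append({
        "event_id": event_id,
        "status": new_status,
        "direction": "in",
        "count": 1,  # assumed convention: +1 when entering a status
        "age_seconds": None,
        "changed_at": changed_at,
    })
    return rows


def events_per_status(rows):
    """Sum the +1/-1 counts to get how many events currently sit in each status."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["status"]] += row["count"]
    return dict(totals)


# One event is opened at 09:00 and closed at 09:30.
opened_at = datetime(2016, 11, 1, 9, 0)
closed_at = datetime(2016, 11, 1, 9, 30)
fact_rows = []
fact_rows += status_change_rows(1, None, "open", None, opened_at)
fact_rows += status_change_rows(1, "open", "closed", opened_at, closed_at)
```

Because the "out" row needs the time the event entered the old status, rows processed out of order would produce wrong ages, which is one way to read the note about order sensitivity.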

Visualisation API 3.0 – Duarte Leão – VizAPI 3.0

  • Needs to be usable in every part of the platform.
  • Needs to be printable
  • It is in preview release for 7.0 and expected to be released in 7.1 or 8.0.
  • Used in PDI and CDF

João Gameiro – CBF2 – Managing multi-project / multi-environment scenarios with Docker

  • Allows simple management of different versions.  For more details see here: http://community.pentaho.com/ctools/cbf2/
  • Requires:
    • Core Image
    • Core Containers
    • Projects
    • Project Image

Jens Bleuel – What’s new in PDI 7.0

  • 47 additional steps now support metadata injection (MDI).
  • The Filter rows step now supports MDI, and it is done by XML injection (copy a Filter rows step to the clipboard and you get the XML you need to pass in).
  • Metadata Injection use cases:
    • MDI Standard
    • MDI Data Flow
    • MDI 2 Phase processing
  • MDI Examples to be published to the wiki
  • Data Services Hardening
  • Lots more!!!

Matt Casters – PDI Unit Testing

  • Unit Testing in PDI
  • Project on GitHub https://github.com/mattcasters/pentaho-pdi-dataset
  • Built on PDI datasets (set up to simplify building ETLs where the actual data, e.g. in HDFS, may not be available)
  • Going towards Test Driven development
    • Start with a blank canvas
    • Add Unit test before you build ETL
    • Allows you to build a golden data set to validate your ETL against.
  • Easy to deploy from https://github.com/mattcasters/pentaho-pdi-dataset/releases/tag/0.3
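
The golden data set idea can be sketched outside PDI in a few lines of Python. This is an illustration of the test-first workflow only, not the API of the pentaho-pdi-dataset plugin: `clean_customers`, `INPUT_ROWS` and `GOLDEN_ROWS` are hypothetical names invented for the example.

```python
import unittest


def clean_customers(rows):
    """Hypothetical transformation under test: tidy names, drop rows without an id."""
    return [
        {"id": row["id"], "name": row["name"].strip().title()}
        for row in rows
        if row.get("id") is not None
    ]


# Fixed input data set, standing in for data that may not be available locally.
INPUT_ROWS = [
    {"id": 1, "name": "  alice smith "},
    {"id": None, "name": "no id"},
    {"id": 2, "name": "BOB JONES"},
]

# Golden data set: the output we expect, written before the transformation exists.
GOLDEN_ROWS = [
    {"id": 1, "name": "Alice Smith"},
    {"id": 2, "name": "Bob Jones"},
]


class CleanCustomersTest(unittest.TestCase):
    def test_output_matches_golden_data_set(self):
        self.assertEqual(clean_customers(INPUT_ROWS), GOLDEN_ROWS)
```

Saved as a module, this can be run with `python -m unittest`. In a test-driven flow the golden rows and the test are written first, and the transformation is built until the test passes.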

Hiromu Hota – WebSpoon!

  • Web SPOON!!!  Connect to repository and create and edit spoon transformations and jobs from the browser.  Simplifies management and security!
  • This is an early release, but looks very very cool!  I got it running on my 6.1 edition in about 5 minutes.
  • Here is the project on GitHub: WebSpoon

ReelMetrics

  • Pentaho is embedded with Rails. Unified login and seamless integration.

Wael Elrifai – Data to Information, Information to Knowledge

  • Hitachi Rail – 150 trains – 3000 sensors – output data every 5 seconds.
    • Usage based pricing
    • Minimise downtime using predictive maintenance (PdM).
    • Metadata injection.
  • Good talk on machine learning
  • Supervised Learning + Unsupervised Learning = Deep Discovery
  • Reference Architecture

Extra – know.bi & Neo4j – Loading data to Neo4j using PDI

  • Graph databases refer to graphs in the mathematical sense, not charts.
  • Graph models are a hi-fi representation of real life.
Zachary Zeus

Zachary Zeus is the Co-CEO & Founder of BizCubed. He provides the business with more than 20 years' engineering experience and a solid background in providing large financial services with data capability. He maintains a passion for providing engineering solutions to real world problems, lending his considerable experience to enabling people to make better data driven decisions.