PCM 16

by Zachary Zeus
November 14, 2016

9th annual PCM in Europe.

Bart from Know.bi opens proceedings, talking about the history of PCM.  50+ people attended the hackathon last night, and there are 200+ registrations.

Pedro Alves on Pentaho 7.0

Pentaho 7.0 will be officially released on Tuesday, but it is already available on the portal and SourceForge.

The purpose of the release is to bring data engineering, data prep and analytics together.

It brings all of these steps together to shorten the path from data to outcomes.

Alternative Big Data Devops – Tom Barber

Making life easier when deploying software and services for customers.

Juju – by Canonical, has charms (deployed software), relations… but it is not container tech.

A controller node is used to command the different pieces.

Data modeling in Hadoop – Neson – Ubiquis

Showing how to implement two Kimball patterns in Hadoop:

– Slowly Changing Dimensions Type 2

– Accumulating Fact

*  Event is added twice as it moves in and out of a status.

*  Highly sensitive to the order of events

*  Only calculate the age of an event on the “out” state (the “in” state naturally has no age).
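
As a rough illustration of the accumulating-fact notes above, here is a minimal Python sketch. It is not the speaker's implementation: the field names (entity_id, status, direction, ts) and the helper age_events are hypothetical, and it assumes every status change is written as an "out" row for the old status plus an "in" row for the new one.

```python
from datetime import datetime


def age_events(events):
    """Attach an age in seconds to every "out" row; "in" rows get None.

    Rows are dicts with the hypothetical keys entity_id, status,
    direction ("in" or "out") and ts (a datetime).
    """
    entered = {}  # (entity_id, status) -> timestamp of the matching "in" row
    result = []
    # Order matters: an "out" row must be processed after its "in" row.
    for ev in sorted(events, key=lambda e: e["ts"]):
        key = (ev["entity_id"], ev["status"])
        age = None
        if ev["direction"] == "in":
            entered[key] = ev["ts"]
        else:
            start = entered.pop(key, None)
            if start is not None:
                age = (ev["ts"] - start).total_seconds()
        result.append({**ev, "age_seconds": age})
    return result


events = [
    {"entity_id": 1, "status": "open", "direction": "in",
     "ts": datetime(2016, 11, 1, 9, 0)},
    {"entity_id": 1, "status": "open", "direction": "out",
     "ts": datetime(2016, 11, 1, 11, 0)},
    {"entity_id": 1, "status": "shipped", "direction": "in",
     "ts": datetime(2016, 11, 1, 11, 0)},
]
aged = age_events(events)
```

Here the event appears twice for the "open" status (once going in, once coming out), and only the "out" row carries an age.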

Models are changed to support write-only operations (no in-place updates).

Use of partitions
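
One common way to keep a Type 2 dimension write-only is to append each new version as a row and derive the validity range when reading. The Python sketch below shows that idea only; it is an assumption about the general technique, not necessarily what was presented, and the column names (natural_key, valid_from), the helper names and the load_month partition layout are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def derive_validity(rows):
    """Derive valid_to and is_current for append-only dimension rows.

    Each row is a dict with a natural_key and an ISO-formatted valid_from
    date string; nothing is ever updated in place.
    """
    ordered = sorted(rows, key=itemgetter("natural_key", "valid_from"))
    out = []
    for _, versions in groupby(ordered, key=itemgetter("natural_key")):
        versions = list(versions)
        # Pair each version with the next one; the last version has no successor.
        for current, nxt in zip(versions, versions[1:] + [None]):
            out.append({
                **current,
                "valid_to": nxt["valid_from"] if nxt else None,
                "is_current": nxt is None,
            })
    return out


def partition_for(row):
    """Hypothetical partition name: one partition per load month."""
    return "load_month=" + row["valid_from"][:7]


rows = [
    {"natural_key": "C1", "city": "Lisbon", "valid_from": "2016-01-10"},
    {"natural_key": "C1", "city": "Antwerp", "valid_from": "2016-11-12"},
]
history = derive_validity(rows)
```

New versions land in a new partition, so older partitions never need rewriting.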

More details on status change here:  Blogs on details

Visualisation API 3.0 – Duarte Leão – VizAPI 3.0

Needs to be usable in every part of the platform.

Needs to be printable

It is in preview release for 7.0 and expected to be released in 7.1 or 8.0.

Used in PDI and CDF

João Gameiro – CBF2 – Managing multi-project / multi-environment scenarios with Docker

Allows simple management of different versions.  For more details see here: http://community.pentaho.com/ctools/cbf2/

– Requires

* Core Image

* Core Containers

* Projects

* Project Image

Jens Bleuel – What’s new in PDI 7.0

47 more steps now support Metadata Injection (MDI).

The Filter rows step now supports MDI, and it is done by XML injection (copy a Filter rows step to the clipboard and paste it as text to get the XML you need to pass in).

Metadata Injection use cases:

– MDI Standard

– MDI Data Flow

– MDI 2 Phase processing

MDI Examples to be published to the wiki.

Data Services Hardening

Lots more!!!

Matt Casters – PDI Unit Testing

Unit Testing in PDI

Project on GitHub https://github.com/mattcasters/pentaho-pdi-dataset

Built on PDI datasets (set up to simplify building ETLs when the actual data, for example in HDFS, may not be available).

Moving towards test-driven development:

–  Start with a blank canvas

–  Add Unit test before you build ETL

–  Allows you to build a golden data set to validate your ETL against.
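
The test-first workflow above can be sketched in plain Python. This is not the pentaho-pdi-dataset plugin's API: the transform function is a hypothetical stand-in for the ETL under test, and the input and golden datasets are made-up examples.

```python
import unittest


def transform(rows):
    """Hypothetical stand-in for the ETL under test: total amount per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return [{"customer": c, "total": t} for c, t in sorted(totals.items())]


# Small, fixed input so the test does not depend on the real source system.
INPUT_DATASET = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 2.5},
]

# Golden dataset: the output we expect, written down before building the ETL.
GOLDEN_DATASET = [
    {"customer": "a", "total": 12.5},
    {"customer": "b", "total": 5},
]


class TransformGoldenTest(unittest.TestCase):
    def test_output_matches_golden_dataset(self):
        self.assertEqual(transform(INPUT_DATASET), GOLDEN_DATASET)
```

Saved to a file, this could be run with python -m unittest; the test fails until the transformation produces exactly the golden rows.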

Easy to deploy from https://github.com/mattcasters/pentaho-pdi-dataset/releases/tag/0.3

Hiromu Hota – WebSpoon!

Web SPOON!!!  Connect to repository and create and edit spoon transformations and jobs from the browser.  Simplifies management and security!

This is an early release, but looks very very cool!  I got it running on my 6.1 edition in about 5 minutes.

Here is the project on GitHub:  WebSpoon

ReelMetrics

Pentaho is embedded with Rails.  Unified login gives seamless integration.

Wael Elrifai – Data to Information, Information to Knowledge

Hitachi Rail – 150 trains – 3000 sensors – output data every 5 seconds.
Usage based pricing
Minimise downtime using predictive maintenance (PdM).
Metadata injection.
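
To get a feel for the data volume, here is a back-of-the-envelope calculation in Python. My notes do not say whether the 3000 sensors are per train or across the fleet, so both readings are computed; one reading per sensor every 5 seconds is also an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60
INTERVAL_SECONDS = 5   # one reading per sensor every 5 seconds (assumed)
TRAINS = 150
SENSORS = 3000

readings_per_sensor_per_day = SECONDS_PER_DAY // INTERVAL_SECONDS

# Interpretation 1: 3000 sensors across the whole fleet.
fleet_total = SENSORS * readings_per_sensor_per_day

# Interpretation 2: 3000 sensors on each of the 150 trains.
per_train_total = TRAINS * SENSORS * readings_per_sensor_per_day

print(f"{fleet_total:,} readings/day")      # 51,840,000 readings/day
print(f"{per_train_total:,} readings/day")  # 7,776,000,000 readings/day
```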

Good talk on machine learning.

Supervised Learning + Unsupervised Learning = Deep Discovery
Reference Architecture

Extra – know.bi & Neo4j – Loading data to Neo4j using PDI

Graph databases refer to graphs in the mathematical sense (nodes and edges), not charts.

Graph models are a hi-fi (high-fidelity) representation of real life.
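
To make the "nodes and relationships" idea concrete, here is a tiny in-memory sketch in Python. It is not the Neo4j API or anything shown in the talk; the PropertyGraph class and the example data are hypothetical.

```python
from collections import defaultdict


class PropertyGraph:
    """Minimal property graph: nodes and directed, typed relationships."""

    def __init__(self):
        self.nodes = {}                    # node_id -> {"label": ..., properties}
        self.outgoing = defaultdict(list)  # node_id -> [(rel_type, end_id, props)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_relationship(self, start, rel_type, end, **props):
        self.outgoing[start].append((rel_type, end, props))

    def neighbours(self, node_id, rel_type=None):
        """Return ids reachable in one hop, optionally filtered by type."""
        return [end for rtype, end, _ in self.outgoing[node_id]
                if rel_type is None or rtype == rel_type]


g = PropertyGraph()
g.add_node("alice", "Person", name="Alice")
g.add_node("acme", "Company", name="Acme")
g.add_node("pcm", "Event", name="PCM 16")
g.add_relationship("alice", "WORKS_FOR", "acme", since=2014)
g.add_relationship("alice", "ATTENDED", "pcm")
```

The relationships are first-class data with their own type and properties, which is what lets the model stay close to how things relate in real life.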
