The 9th annual Pentaho Community Meeting (PCM) in Europe.
Bart from know.bi opened proceedings with a talk on the history of PCM. More than 50 people attended the hackathon the night before, and there were over 200 registrations.
Pedro Alves on Pentaho 7.0
Pentaho 7.0 will be officially released on Tuesday, but it is already available on the portal and on SourceForge.
The purpose of the release is to bring data engineering, data preparation and analytics together.
Bringing all of these steps together speeds up the path from data to outcomes.
Alternative Big Data DevOps – Tom Barber
Making life easier when deploying software and services for customers.
Juju, by Canonical, works with charms (deployed software), relations between them and more, but it is not container technology.
A controller node is used to command the different deployed components.
Data modeling in Hadoop – Nelson – Ubiquis
Showed how to implement two different Kimball patterns in Hadoop:
– Slowly Changing Dimensions Type 2
– Accumulating Fact
* Each event is recorded twice: once as it moves into a status and once as it moves out of it.
* Highly sensitive to the order in which events arrive.
* The age of an event is calculated only on the “out” record (the “in” record naturally has no age).
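The in/out accumulating fact pattern can be sketched in memory as follows. This is a minimal illustration, assuming a stream of status changes sorted by timestamp per item; the names `StatusEvent`, `FactRow` and `to_fact_rows` are hypothetical and this is not the speaker's Hadoop implementation. Each status change emits an "out" row for the previous status (carrying the time spent in it) and an "in" row for the new status (with no age).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class StatusEvent:
    item_id: str
    status: str
    timestamp: datetime


@dataclass(frozen=True)
class FactRow:
    item_id: str
    status: str
    direction: str                 # "in" or "out"
    timestamp: datetime
    age_seconds: Optional[float]   # only populated on "out" rows


def to_fact_rows(events: List[StatusEvent]) -> List[FactRow]:
    """Turn status changes into append-only in/out fact rows.

    Assumes events are already sorted by timestamp per item; feeding
    them out of order produces wrong (even negative) ages, which is why
    the pattern is so sensitive to ordering.
    """
    rows: List[FactRow] = []
    current = {}  # item_id -> (status, entered_at)
    for ev in events:
        if ev.item_id in current:
            prev_status, entered_at = current[ev.item_id]
            age = (ev.timestamp - entered_at).total_seconds()
            # Close the previous status: the "out" row carries the age.
            rows.append(FactRow(ev.item_id, prev_status, "out", ev.timestamp, age))
        # Open the new status: an "in" row naturally has no age.
        rows.append(FactRow(ev.item_id, ev.status, "in", ev.timestamp, None))
        current[ev.item_id] = (ev.status, ev.timestamp)
    return rows
```

Because rows are only ever appended, the same logic fits storage that does not support in-place updates.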
Models are changed to support write-only operations.
Use of partitions.
More details on status changes are available in the related blog posts.
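A write-only Slowly Changing Dimension Type 2 can be sketched like this. It is a minimal in-memory illustration, assuming each change is appended as a new version into a partition keyed by load date (loosely analogous to a date-partitioned table in Hadoop); `CustomerVersion`, `AppendOnlyDimension` and `scd2_view` are hypothetical names. Existing rows are never rewritten: validity ranges are derived at read time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class CustomerVersion:
    customer_id: str
    city: str
    load_date: date  # doubles as the (hypothetical) partition key


class AppendOnlyDimension:
    """SCD Type 2 dimension stored as append-only partitions."""

    def __init__(self) -> None:
        self.partitions: Dict[date, List[CustomerVersion]] = defaultdict(list)

    def append(self, row: CustomerVersion) -> None:
        # A change is just a new version in its load-date partition.
        self.partitions[row.load_date].append(row)

    def scd2_view(self):
        """Return (customer_id, city, valid_from, valid_to) tuples;
        valid_to is None for the current version."""
        by_key = defaultdict(list)
        for part in sorted(self.partitions):
            for row in self.partitions[part]:
                by_key[row.customer_id].append(row)
        view = []
        for key, versions in by_key.items():
            versions.sort(key=lambda v: v.load_date)
            # Each version is valid until the next version's load date.
            for cur, nxt in zip(versions, versions[1:] + [None]):
                valid_to = nxt.load_date if nxt else None
                view.append((key, cur.city, cur.load_date, valid_to))
        return view
```

The trade-off is that reads do more work (sorting versions per key), in exchange for writes that never touch existing data.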
Visualisation API 3.0 (VizAPI 3.0) – Duarte Leão
Needs to be usable in every part of the platform.
Needs to be printable
It is a preview release in 7.0 and is expected to be released in 7.1 or 8.0.
Used in PDI and CDF
João Gameiro – CBF2 – Managing multi-project / multi-environment scenarios with Docker
CBF2 allows simple management of different versions. For more details, see http://community.pentaho.com/ctools/cbf2/
* Core Image
* Core Containers
* Project Image
Jens Bleuel – What’s new in PDI 7.0
47 additional steps now support Metadata Injection (MDI).
The Filter rows step now supports MDI, which is done by injecting XML: copy the Filter rows step to the clipboard and paste it into a text editor to get the XML you need to pass in.
Metadata Injection use cases:
– MDI Standard
– MDI Data Flow
– MDI 2 Phase processing
MDI examples will be published to the wiki.
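The idea behind the standard MDI use case (a template step whose configuration is filled in at runtime from metadata) can be sketched as below. This is a conceptual illustration only, assuming metadata rows that each describe one filter condition; `TEMPLATE`, `inject` and `to_xml` are hypothetical, and the XML element names are made up for illustration and are not PDI's real step XML.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical template: a filter step whose conditions are left empty.
TEMPLATE = {"step": "Filter rows", "conditions": []}


def inject(template, metadata_rows):
    """Fill a template step configuration from metadata rows.

    Each metadata row describes one condition (field, operator, value).
    The template itself is never modified; a configured copy is returned.
    """
    configured = copy.deepcopy(template)
    for row in metadata_rows:
        configured["conditions"].append(
            {"field": row["field"], "operator": row["operator"], "value": row["value"]}
        )
    return configured


def to_xml(config):
    """Serialise the configured step to an XML string.

    Element and attribute names are illustrative only.
    """
    step = ET.Element("step", name=config["step"])
    conds = ET.SubElement(step, "conditions")
    for c in config["conditions"]:
        ET.SubElement(conds, "condition",
                      field=c["field"], operator=c["operator"], value=c["value"])
    return ET.tostring(step, encoding="unicode")
```

Keeping the template untouched means the same template can be injected many times with different metadata, which is the point of the pattern.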
Data Services Hardening
Matt Casters – PDI Unit Testing
Unit Testing in PDI
Project on GitHub: https://github.com/mattcasters/pentaho-pdi-dataset
Built on PDI data sets, which simplify building ETL when the actual data (for example in HDFS) may not be available.
Moving towards test-driven development:
– Start with a blank canvas
– Add unit tests before you build the ETL
– Allows you to build a golden data set to validate your ETL against.
Easy to deploy from https://github.com/mattcasters/pentaho-pdi-dataset/releases/tag/0.3
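The golden data set idea translates directly to a plain unit test. The sketch below uses Python's `unittest` rather than the PDI plugin, and `transform`, `GOLDEN_INPUT` and `GOLDEN_OUTPUT` are hypothetical: a small hand-checked input and its expected output are written first, and the transformation is then built until the test passes.

```python
import unittest


def transform(rows):
    """Hypothetical ETL step: keep paid orders and add a line total."""
    return [
        {**r, "line_total": r["quantity"] * r["unit_price"]}
        for r in rows
        if r["status"] == "paid"
    ]


# Golden data set: hand-checked input and the output the ETL must produce.
GOLDEN_INPUT = [
    {"order_id": 1, "status": "paid", "quantity": 3, "unit_price": 2.5},
    {"order_id": 2, "status": "cancelled", "quantity": 1, "unit_price": 9.0},
]
GOLDEN_OUTPUT = [
    {"order_id": 1, "status": "paid", "quantity": 3, "unit_price": 2.5,
     "line_total": 7.5},
]


class TransformGoldenTest(unittest.TestCase):
    def test_matches_golden_output(self):
        # Written before transform() in a test-first workflow.
        self.assertEqual(transform(GOLDEN_INPUT), GOLDEN_OUTPUT)


if __name__ == "__main__":
    unittest.main()
```

Running the file (or `python -m unittest`) validates the transformation against the golden data set after every change.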
Hiromu Hota – WebSpoon!
WebSpoon! Connect to a repository and create and edit Spoon transformations and jobs from the browser, which simplifies management and security!
This is an early release, but it looks very cool! I got it running on my 6.1 edition in about 5 minutes.
The project is on GitHub: WebSpoon
Wael Elrifai – Data to Information, Information to Knowledge
Good talk on machine learning.
Extra – know.bi & Neo4j – Loading data into Neo4j using PDI
The “graph” in graph databases refers to graph theory (nodes and relationships), not to charts.
Graph models are a high-fidelity representation of real life.
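To show what "graph" means in this sense, here is a minimal in-memory sketch that turns relational rows into nodes and relationships and then traverses them. It assumes a hypothetical purchases table; `PURCHASES`, `build_graph` and `also_bought_by` are illustrative names, and this is neither the PDI Neo4j steps nor the Neo4j driver.

```python
from collections import defaultdict

# Hypothetical relational input: one row per (customer, product) purchase.
PURCHASES = [
    ("alice", "laptop"),
    ("alice", "mouse"),
    ("bob", "laptop"),
    ("carol", "mouse"),
]


def build_graph(rows):
    """Turn purchase rows into labelled nodes plus BOUGHT relationships.

    Relationships are stored as an adjacency map in both directions so
    they can be traversed from either end.
    """
    nodes = set()
    edges = defaultdict(set)
    for customer, product in rows:
        c, p = ("Customer", customer), ("Product", product)
        nodes.update([c, p])
        edges[c].add(p)
        edges[p].add(c)
    return nodes, edges


def also_bought_by(edges, customer):
    """Customers two hops away: customer -> product -> other customer."""
    start = ("Customer", customer)
    result = set()
    for product in edges[start]:
        for other in edges[product]:
            if other != start:
                result.add(other[1])
    return result
```

The relationship is a first-class element rather than a join computed at query time, which is what makes the model feel close to how the real-world entities are connected.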