Data: Let’s think in a more refined way about this messy subject

by Zachary Zeus
December 2, 2014

ISO 55000 defines asset management as the “coordinated activity of an organization to realize value from assets” (Wikipedia). In a recent commentary, David Norris of Bloor Research explored the subject of data asset management, its critical nature and the lack of good tooling currently available to help.

One of the most valuable assets in every organisation is data. We see it sweep past us daily: some we gather, ponder or analyse; some we back up; some we just discard.

Mr Norris explores the often short ‘window of enlightenment’ in which data is most valuable, and how proper data refinement (including big data refinement) can determine whether an organisation succeeds or fails.

Business / IT Data Disconnect

Mr Norris explores the disconnect between business and IT: the dramatic tension between IT’s desire to protect and nurture data, ensure security and manage risk and compliance, and the business’s desire to see what is happening as soon as possible, with as many different dimensions on the problem as possible.

“The business wants to exploit the data, to integrate and blend all of the data it can find, to do all of those things without needing to worry about compatibility, data lineage, and things that, to them, should be taken care of and just happen.” (David Norris, Bloor Research)

This tension often ends up being resolved with half-baked solutions in spreadsheets, analytics ‘bling’ tools and other reporting-level solutions that are (in Mr Norris’s words) “disparate sand boxes”.

Essentially, it is IT’s role to make all this work, but it’s little wonder that it’s not happening well to date.

IT are being hit with:

    1. A huge surge in data sources, including Big Data, APIs to cloud providers and mobile.
    2. A huge surge in data structure technologies: columnar databases (Vertica, MonetDB, Netezza), Big Data platforms (Hadoop, Cloudera, MapR, Hortonworks and others) and their ecosystems (Hive, Pig, HBase, Spark, YARN), graph databases like Neo4j, and new NoSQL databases appearing every month or two (MongoDB, LucidDB and many others).
    3. An increasing cadence of software releases, each capturing more data in different structures. It is now the norm to see several releases a year from upstream software providers.
    4. More stringent regulatory requirements around data capture and dissemination.

As IT strive to move through these changes as rapidly as they can, the business is left to its own devices and takes matters into its own hands.

So Data Asset Management is a huge and growing problem in the industry. One company, Pentaho, have a roadmap to build the tools to take back control. Pentaho are calling this the Streamlined Data Refinery.

From the trenches

Some of Pentaho’s experience with the Streamlined Data Refinery (SDR) comes from the field, where organisations are already using the tool to provide interfaces to the sorts of datasets the business requires.

Source: http://www.pentaho.com/Streamlined-Data-Refinery

Pentaho handles the process of providing a business-friendly interface for querying the petabytes of data available.

The queries then resolve across Cloudera or other Big Data sources and return potentially hundreds of millions of rows of information neatly bundled in Analytics Cubes.

These are delivered to business people, ready for rapid slicing and dicing (in one use case, Vertica is the backbone behind the cubes).

A process that used to take several months with Java engineers and project managers now takes less than 24 hours.
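To make the idea of refinement concrete, here is a minimal, self-contained Python sketch of the kind of roll-up that turns raw rows into a small analytics cube keyed by business dimensions. It illustrates the concept only, not Pentaho’s implementation; the field names and figures are invented for the example.

```python
from collections import defaultdict

# Illustrative raw rows, standing in for the millions of records a query
# against Cloudera/Hive or another Big Data source might return.
raw_rows = [
    {"region": "APAC", "product": "Widget", "month": "2014-10", "revenue": 120.0},
    {"region": "APAC", "product": "Widget", "month": "2014-11", "revenue": 95.5},
    {"region": "EMEA", "product": "Gadget", "month": "2014-10", "revenue": 210.0},
    {"region": "EMEA", "product": "Widget", "month": "2014-11", "revenue": 80.0},
]


def build_cube(rows, dimensions, measure):
    """Roll raw rows up into a cube: a mapping from a tuple of dimension
    values to the summed measure, ready for slicing and dicing."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key] += row[measure]
    return dict(cube)


cube = build_cube(raw_rows, dimensions=("region", "product", "month"), measure="revenue")

# Slicing the cube: total APAC revenue across all products and months.
apac_total = sum(value for (region, _, _), value in cube.items() if region == "APAC")
print(apac_total)  # 215.5
```

In a real deployment the aggregates would live in an analytics database such as Vertica rather than an in-memory dictionary, but the shape of the refinement step is the same: many raw rows in, a compact dimensional structure out, ready for the business to interrogate.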

Responsive to change

Since Pentaho are data-source agnostic, they can use their SDR methodology with pretty much any of the Big Data, NoSQL, analytics warehouse and RDBMS technologies out there.

This provides a much-needed abstraction from the ever-growing range of data sources, as well as a more clearly defined path to getting control of your data assets and putting them under proper Data Asset Management.
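As a rough sketch of what “data-source agnostic” can look like in practice (a general design pattern, not Pentaho’s internals), the refinement logic can be written against a small interface that any backend can implement, whether it is Hive, a NoSQL store or a relational database. The class and method names below are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class DataSource(ABC):
    """Hypothetical minimal interface the refinery depends on: any
    backend that can yield rows as dictionaries will do."""

    @abstractmethod
    def read_rows(self) -> Iterable[Dict]:
        ...


class CsvSource(DataSource):
    """Toy backend reading a local CSV file; a Hive-, MongoDB- or
    Vertica-backed class would expose the same read_rows method."""

    def __init__(self, path: str):
        self.path = path

    def read_rows(self) -> Iterable[Dict]:
        import csv
        with open(self.path, newline="") as handle:
            yield from csv.DictReader(handle)


def refine(source: DataSource) -> List[Dict]:
    """The refinement logic sees only the interface, so swapping the
    underlying technology does not change this code."""
    return list(source.read_rows())
```

The point of the abstraction is that adding a new source technology means writing one more small adapter, not reworking the refinery itself.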

Build your Data Refinery

Data, as it is said, is the new Oil. Perhaps it’s time to take care of this valuable asset, refine it correctly and deliver it up for consumption to the people who know how to act on it.

Pentaho already have a number of the tools needed to complete the Streamlined Data Refinery in place. If you are interested in exploring better Data Asset Management, or in the roadmap that’s getting the industry there, please get in touch.
Contact us now

Zachary Zeus

Zachary Zeus is the Co-CEO & Founder of BizCubed. He brings more than 20 years' engineering experience to the business and a solid background in providing large financial services organisations with data capability. He maintains a passion for applying engineering solutions to real-world problems, lending his considerable experience to enabling people to make better data-driven decisions.
