The Pentaho 5.1 release brought a really exciting innovation – analytics directly against a MongoDB cluster. This means you can do advanced data discovery on the Mongo collections themselves, with no intermediate relational store in between.
Better Together
Setting this up can be a bit tricky, as you have to understand how both Pentaho and MongoDB work. Additionally, Pentaho has introduced some new technologies to make this capability robust, scalable and extensible to other data sources.
New Pentaho technologies that make this work:
– Mondrian 4.0 – the next-generation OLAP engine in the Pentaho suite. This is the first time it has made it into a Pentaho production release, and it brings a new schema syntax.
– OSGi – this allows the Mondrian 4.0 engine to run alongside the existing Mondrian implementations, and it opens up more options in the platform's pluggable layer.
– PentahoMongoOlap layer – a mapping from the OLAP engine and its syntax (expressed on the front end as MDX) to Mongo functions and query syntax. Historically this mapping has only been done to SQL; now we have implementations for two syntaxes (SQL and Mongo), which opens the door to OLAP analysis over any back-end syntax (a hypothetical illustration follows after the article link below).
See this article by Will Gorman on the technical details of all of this, and to get the Mondrian 4 package installed if you're using the Pentaho CE version.
– Mondrian 4, OSGi in Pentaho 5.1 CE
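To make the mapping idea concrete, here is a purely hypothetical sketch of the kind of translation the layer performs. The MDX query, collection name and field names are illustrative assumptions of mine, not the queries Pentaho actually generates:

   # Hypothetical illustration only. An MDX request such as:
   #   SELECT [Measures].[Unit Sales] ON COLUMNS,
   #          [Store].[Store State].Members ON ROWS
   #   FROM [Sales]
   # would translate, roughly, into a Mongo aggregation like:
   /> mongo test --eval '
        db.sales.aggregate([
          { $group: { _id: "$store_state", unit_sales: { $sum: "$unit_sales" } } },
          { $sort: { _id: 1 } }
        ]).forEach(printjson)'

The OLAP engine and the MDX stay the same; only the back-end translation changes, which is what makes the approach extensible to other query syntaxes.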
Assumptions
Make sure you have these in place before you get started (a quick sanity check is sketched after the list).
1. A running Pentaho Business Analytics server with Mondrian 4 installed and functioning correctly
2. A running MongoDB database, v2.6 or above
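A quick way to sanity-check both from the command line (host and port are assumptions based on a default local install – adjust to suit):

   /> mongod --version                                  # should report v2.6 or above
   /> mongo --eval 'db.runCommand({ ping: 1 })'         # confirms mongod is reachable
   /> curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/pentaho/Login   # expect 200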
Resources:
1. Will Gorman's blog post (helps with getting Mondrian 4 running)
– Mondrian 4, OSGi in Pentaho 5.1 CE
2. This JIRA task (preliminary documentation on configuring Mongo on Pentaho)
– http://jira.pentaho.com/browse/MONDRIAN-1902
Steps (taken straight from the doc):
– Import sample data
– Upload Mondrian 4.0 Schema file
– Define the data connection
Import Sample Data
– If you have Pentaho 5.1 EE the sample data is located in /pentaho-solutions/system/samples/mondrian-data-foodmart-json-0.3.3.zip
– Unzip “mondrian-data-foodmart-json-0.3.3.zip”
– This gives you a folder with a bunch of .json files; the ones that need to be loaded are:
Filename                                  Mongo collection name
sales_fact_1997_collapsed.json            sales
foodmart_data_sales_transactions.json     sales_transactions
agg_g_ms_pcat_sales_fact_1997.json        agg_g_ms_pcat_sales_fact_1997
agg_c_10_sales_fact_1997.json             agg_c_10_sales_fact_1997
Here is a sample command:
/> mongoimport --db foodmart --collection agg_c_10_sales_fact_1997 --type json --file agg_c_10_sales_fact_1997.json
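If you would rather load all four files in one go, here is a minimal shell sketch. It assumes the .json files are in your current directory and imports into the test db used by the connection string later in this post – substitute whatever db name you plan to put in olap4j.properties:

   # Each entry is file-stem:collection, per the table above.
   for p in sales_fact_1997_collapsed:sales \
            foodmart_data_sales_transactions:sales_transactions \
            agg_g_ms_pcat_sales_fact_1997:agg_g_ms_pcat_sales_fact_1997 \
            agg_c_10_sales_fact_1997:agg_c_10_sales_fact_1997; do
     mongoimport --db test --collection "${p#*:}" --type json --file "${p%%:*}.json"
   done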
Upload Mondrian 4.0 Schema
– If you have Pentaho 5.1 EE the sample schema is located in /pentaho-solutions/system/samples/FoodMart.mongo.xml
– Upload it into the BI Server from the Pentaho User Console (PUC) (instructions stolen from the Pentaho doc):
1. Log in to the User Console using the admin username and password.
2. Open the Browse perspective by selecting this from the upper-left menu.
3. In the Folders panel, select the location where you want to store the schema. Click on Upload… in the Folder Actions panel. The Upload dialog box appears.
4. In the dialog box, click Browse to go to the location of the schema for upload. Double-click on the schema. If needed, set specific permissions on the schema using the Advanced Options settings.
5. Click OK. The schema is uploaded and available to specified users.
Add the connection details
You need to edit the olap4j.properties file, which is located under the pentaho-solutions/system folder. This is what I ended up with:
foodmart.name=MongoFoodmart
foodmart.className=org.pentaho.platform.plugin.services.connections.PentahoSystemDriver
foodmart.connectString=jdbc:mondrian4:Host=127.0.0.1;dbname=test;authenticationDatabase=admin;DataServicesProvider=com.pentaho.analysis.mongo.MongoDataServicesProvider;Catalog=solution:/home/admin/pentaho-mongolap/test-data/FoodMart.mongo.xml;username=bizreporter;password=ENC:Yml6M3ViZWQ=
Breaking this down:
– MongoFoodmart needs to match the <Schema name='MongoFoodmart' …> in the Mondrian schema definition (see the check sketched after this list).
– foodmart – this prefix uniquely identifies the connection; it needs to be the same at the beginning of each property line.
– Host 127.0.0.1 – my mongo instance is on the same machine as the BI server – most implementations would have it on a separate machine.
– dbname – I loaded all of the JSON data into a db called test, so that is the value here.
– Catalog – the location on the server to which I uploaded the FoodMart.mongo.xml file. The key thing to notice is the solution: prefix at the beginning of the path; it tells the platform to look in its own repository rather than on the file system.
– username – the specific user that has access to this data.
– password – I've used an encrypted password, generated with the password encryption utility at http://localhost:8080/pentaho/api/password/encrypt.
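The most common mistake here is a name mismatch between the schema file and the properties file. A quick way to check (this assumes you are in pentaho-solutions/system, have a local copy of the schema file, and the XML uses double quotes):

   /> grep -o 'Schema name="[^"]*"' FoodMart.mongo.xml   # the name declared in the schema
   /> grep '^foodmart.name' olap4j.properties            # must carry the same value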
After these have been added to the olap4j.properties file, restart the server.
And we’re done. We test by selecting “Create New” -> “Analysis Report” and looking for the MongoFoodmart datasource.
Our free Proof of Concept Offer
BizCubed currently has a free proof of concept offer: to help you evaluate Pentaho, we'd like to offer you one of our experts for two days. Our expert will work alongside you, loading your data, making sense of it and building some impressive dashboards. This work will give you first-hand experience with the tools, their power and how quickly you can deliver outcomes.
We will also provide a secure hosted environment – alternatively, we will get things set up internally for you. This is provided risk-free and at no cost.
Click here for more information.
Register your Free Proof of Concept.
The next blog post will focus on building a Mongo schema and connection from scratch – not just using the examples.
Zachary Zeus
Zachary Zeus is the Co-CEO & Founder of BizCubed. He brings the business more than 20 years' engineering experience and a solid background in providing large financial services organisations with data capability. He maintains a passion for providing engineering solutions to real world problems, lending his considerable experience to enabling people to make better data-driven decisions.