Big data is a buzzword that gets thrown around a lot. There are oodles of definitions out there, and the term gets applied to everything from digital sources to more traditional forms of data. At its core, though, big data is any large data set whose practical size depends on the capabilities of the business managing it and the methods being used to process and analyse it. Big data can be found in many different forms and structures, including structured, multi-structured and unstructured data sets.
- Structured data is found in fixed fields within a record or file such as in relational databases and spreadsheets.
- Multi-structured data is that which exists in a variety of formats that can be derived from interactions between people and technology such as weblog data and data from social media.
- Unstructured data is not uniform or organised in any predetermined manner. It is usually text-heavy and is not easily interpreted by traditional databases because it has no pre-defined data model. Examples include metadata, documents, books and even audio and images.
In addition, big data can be used to describe the availability and exponential growth of this structured and unstructured data. This data can then be used as a tool for analysis and informed decision-making.
Is there a better way to look at big data?
The team here at BizCubed recently attended a briefing with Forrester and Pentaho about big data for financial services. Access the webinar here.
During the briefing we heard an interesting alternative definition for big data:
“Big data is the practices and technology that close the gap between the data available and the ability to turn that data into business insight.”
This got us thinking! We concluded this is not only a really usable definition of big data, but we also think it describes activities going on in every business at almost every level today.
Analysing big data using this definition
Big data differs from regular data because of its sheer size and the rate at which it can accumulate. In 2012, about 2.5 exabytes of data were created each day, and this number is likely to have doubled by 2015. This means that some companies (such as Walmart in the US) are working with data sets many petabytes in size. To put these numbers in perspective, an exabyte is around one thousand petabytes and a petabyte is around one million gigabytes, which, if printed, would equate to millions upon millions of filing cabinets full of documents.
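To make those unit conversions concrete, here is a quick back-of-the-envelope calculation (assuming decimal units, i.e. 1 exabyte = 1,000 petabytes and 1 petabyte = 1,000,000 gigabytes):

```python
# Back-of-the-envelope arithmetic for the scale described above.
# Decimal units assumed: 1 EB = 1,000 PB; 1 PB = 1,000,000 GB.
PB_PER_EB = 1_000
GB_PER_PB = 1_000_000

daily_exabytes_2012 = 2.5                          # ~2.5 EB created per day in 2012
daily_petabytes = daily_exabytes_2012 * PB_PER_EB  # convert to petabytes
daily_gigabytes = daily_petabytes * GB_PER_PB      # convert to gigabytes

print(daily_petabytes)   # 2500.0
print(daily_gigabytes)   # 2500000000.0
```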
This may seem quite intimidating, especially if you are just starting out with big data, but up close big data, like any data, is a series of points or dots that on their own have very little value. When you put the dots together and step back, you can begin to see the patterns the data points make. The insights gained through analysis of this data can lead to improved efficiency and day-to-day running of a business.
For most businesses, the tool used to close the gap between available data (what comes natively out of standard systems and processes) and the business insight used to run the business is a spreadsheet.
Spreadsheets first took off in the 1980s, and since the introduction of Microsoft Excel, Excel has been the backbone of data collection, analysis, and reporting. This is largely because a significant portion of the workforce already has spreadsheet skills, because Excel is easy to use, and because business users can construct reports without having to go to IT with requests. Excel is therefore often seen as an easy entry pathway into big data applications and analysis. The process for using Excel or another spreadsheet for big data analysis is typically as follows:
1. A manager or executive has an idea about how to improve the business or fix a problem.
2. They wonder if the data in the company’s database supports their idea.
3. An analyst, who usually has a business background rather than deep technical skills, is asked to determine if the data backs up the concept.
4. The analyst collects the data that is available to them and uses Excel to mine that data to determine if the data backs up the manager’s ideas.
5. If the ideas are implemented – then the mining process that has occurred is usually repeated in the future.
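As a rough illustration of steps 3 and 4, here is what that spreadsheet-style mining might look like if sketched in Python with pandas; the data set, column names, and the manager's hypothesis (that discounts drive larger orders) are entirely hypothetical:

```python
import pandas as pd

# Hypothetical sales extract the analyst has pulled from the company
# database. Columns and values are illustrative, not from a real system.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "discount_applied": [True, False, True, False],
    "order_value": [1200.0, 800.0, 950.0, 700.0],
})

# Step 4: "mine" the extract - compare average order value with and
# without a discount, the kind of pivot an analyst would build in Excel.
summary = sales.groupby("discount_applied")["order_value"].mean()
print(summary)
```

If the averages back up the manager's idea and it is implemented, this same script (like the spreadsheet it stands in for) gets re-run in the future, which is step 5.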
When it works, this method is very flexible, available to a large number of users, and gives the management team answers quickly. However, Excel's row and column limits, among other constraints, mean it is not robust enough for working with big data. As we have discussed in other articles (HERE and HERE), what it doesn't provide is the flexibility and scalability that organisations require to take advantage of big data sets.
Is there an alternative to spreadsheets?
Yes! This is where a Data Integration System such as Pentaho Data Integrator (PDI) can help. Like a spreadsheet, PDI is easy for non-technical people to use, and the end user does not need coding skills. But where Pentaho really jumps ahead of tools like Excel is that it provides a much more robust system for analysing the information contained in big data sets. With Pentaho, the analyst can have complete control over the entire analysis process, from extracting source data all the way to drawing out the final insights.
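PDI transformations are built in its graphical designer rather than written as code, but the extract, transform, load flow it manages can be sketched in plain Python; the source data, fields and table below are hypothetical:

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a CSV export from a
# line-of-business system.
raw = io.StringIO("customer,amount\nacme,100\nglobex,250\n")

# Extract: read the source rows.
rows = list(csv.DictReader(raw))

# Transform: cast amounts to numbers and flag large orders.
for r in rows:
    r["amount"] = float(r["amount"])
    r["large"] = r["amount"] > 200

# Load: write the cleaned rows into a reporting table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer TEXT, amount REAL, large INTEGER)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(r["customer"], r["amount"], int(r["large"])) for r in rows],
)

# The loaded table is now ready for reporting queries.
total = db.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 350.0
```

The point of a dedicated integration tool is that each of these stages is a reusable, repeatable step rather than a one-off manipulation buried in a spreadsheet.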
Data statistics from:
Our free Proof of Concept Offer
BizCubed currently has a free proof of concept offer: to help you evaluate Pentaho, we’d like to offer you one of our experts for two days. Our expert will work alongside you, loading your data, making sense of it, and building some impressive dashboards. This work will give you first-hand experience with the tools, their power, and how quickly you can deliver outcomes.
We will also provide a secure hosted environment – alternatively, we can get things set up internally for you. This will be provided risk-free and at no cost.
Click here for more information.
Register your Free Proof of Concept.