
Why Big Data with BDI?

- By Avin Jain (CEO, BDI Systems)
 
Background:
In the classic analytics space, we have companies that do algorithmic crunching and share their findings as white papers. Those findings lose their relevancy very quickly. We also have top-notch vendors such as SAP, Oracle, Microsoft, and IBM that charge very high licensing fees for databases, platforms, and the 20-25 different products that they have acquired over the last 10 years. These "Big Four" are coming up with good solutions for Big Data. However, those solutions are extremely costly, with very limited talent bases available to implement them.
 
With the advent of Big Data and Hadoop-based infrastructure, today's analytics companies need multiple skills. They should be able to set up a Big Data Engineering Lab with engineers who know the best practices for dealing with large volumes of data. Those engineers should be data scientists as well. Finally, your business needs a strong BI Visualization framework or tool where dashboards can be viewed on all devices. Companies such as BDI that have all of these capabilities, and the ability to work end to end using home-grown utilities and open source software, can do this job at the best cost and deliver real-time analytics.
 
What is Big Data?
The following diagram represents Big Data by pulling together various experts' explanations into a 'data cobweb'.
 
Why is Big Data Analytics required?
It is well known that data volumes are growing exponentially. What is not so clear is how to unlock the value they hold. To improve the health of a person, we monitor all of that person's parameters. In the case of an organization, however, over 50% of its data is either unstructured or partially structured, so we don't use that data to check the organization's health. The health of an organization is always relative: we need to keep a close watch on what is happening in our business dimension vis-à-vis our competition. Organizations that don't do Big Data Analytics will likely perish in the next 10 years. To view this another way, the organizations that implemented Big Data projects over the last 5-10 years are ruling the internet world today. The internet is forcing all business platforms to get on board, and Big Data will decide the financial growth, competitiveness, and target markets of any progressive organization.
 
In this document, there will be repetition around the following 4 key Skills. In my view, these 4 skills are essential to delivering a profitable, real-time Big Data solution to an enterprise:
BDI is able to provide these 4 skills. Customers can be completely confident that BDI, as a single vendor, is able to take care of all 4 legs of Big Data Analytics, and that BDI will continue to provide world-class support.
 
Pre-requisites for Big Data Implementation
Different analysts have been discussing ROI from Big Data implementations. Some state a "25% return", others say "55 cents of return out of a $1 investment". These statements baffle us here at BDI. According to BDI, the minimum ROI should be a $30 return out of every $1 invested. If this is not achieved within 5 years, there is something grossly wrong. One of the key points is to ensure that the following pre-requisites are met before your business starts a Big Data project:
1. You have a full-fledged BI implementation system and you are happy with the ROI it has delivered to your organization.
2. You have done a 'What-if' analysis of your key financial parameters and have been flexible enough to make the necessary changes to your organization.
3. You have been taking regular feedback or surveys from your customers, partners, vendors, etc.
4. You have defined a clear problem that you want a Big Data implementor to solve.
5. You have allocated a specific budget to solve this problem. Part of that budget should go towards allocating time for internal resources to work with a Big Data partner/vendor such as BDI.
 
Note: In case you have not done steps 1 - 3, we suggest that you take BDI's BI consultancy services and do a complete BI implementation. You can then have your internal team implement it.
Exception to Pre-Requisites
In cases where you have identified a separate and specific module where you want to directly leverage a Hadoop based data warehouse implementation in order to save cost/TB, you may go ahead. This would not be a true Big Data implementation project but more of a POC (proof of concept).
1. Subject Matter Expert (SME) at Work
The SME (domain expert) understands the problem statement and defines it in a way that a technical team can use to build a solution.
The SME knows where the data resides, what data is useful, and which data can be used for which types of Analytics.
The SME can design different Scorecards, Algorithmic charts, Benchmarking Analysis reports, & dashboards.
The SME will work with the end-client to define the actual problem, define the scope, and drive the technical team.
 
2. Unlock Big Data - Working on a Data Collection Layer
Understand the existing data sources.
Search and navigate data within existing systems.
Reading Web Data - Crawling or scraping of data. Data can even be scraped from images, PDFs, documents, audio files, and video files (see the scraping sketch after this list).
Reading Social Media Data - BDI has developed a 'connector' through which we can read data from Facebook, Twitter, LinkedIn, etc. More details on this can be found in another blog on the BDI website.
Reading structured and unstructured data from web applications such as Salesforce, Google Analytics, etc.
Providing end-to-end survey services where data can be captured using a normal survey or a text-based survey. BDI has a complete end-to-end Survey Platform and Services [www.BDIsurvey.com].
BDI provides dynamic HTML5 forms, as part of our HTML5 Portal, through which metrics data can be entered from mobile devices and used directly in dashboards. The portal runs on all devices.
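As an illustration of the crawling/scraping step above, here is a minimal Python sketch that fetches a page and pulls out its paragraph text. It assumes the widely used requests and BeautifulSoup libraries; the URL and the choice to extract paragraphs are placeholders, not BDI's actual crawler.

```python
# Minimal scraping sketch (assumption: the `requests` and `beautifulsoup4`
# packages are installed; the URL below is a placeholder, not a BDI system).
import requests
from bs4 import BeautifulSoup

def scrape_paragraphs(url):
    """Fetch a web page and return the text of every <p> tag on it."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p")]

if __name__ == "__main__":
    for text in scrape_paragraphs("https://example.com/news"):
        print(text)
```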
 
3. Data Processing Layer
Data Clean-Up - Unstructured data can be too vast, and most of it might be meaningless on its own. However, its collective message can be meaningful and impactful. Your enterprise needs to filter and clean this data (convert text to lower case, remove punctuation marks, stem words for exact matches, etc.). NLP (Natural Language Processing) techniques are used at this stage; a minimal clean-up sketch follows this list.
Categorization & Classification of Data - Use machine learning tools such as Apache Mahout or Enterprise R. Each tool provides different algorithms for clustering and classification. Automated text conversion is also used here for proper classification of unstructured data.
Finding the relationships between different data sets and pushing them into the Hadoop File System: this process uses various tools and paves the way for modern data warehousing that will change the manner in which we think about conventional databases.
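To make the clean-up step concrete, here is a minimal sketch of the lower-casing, punctuation removal, and stemming mentioned above. It assumes the NLTK library; any NLP toolkit with a stemmer would serve equally well.

```python
# Minimal text clean-up sketch: lower-case, strip punctuation, stem each word.
# Assumption: the `nltk` package is installed (for its Porter stemmer).
import string
from nltk.stem import PorterStemmer

def clean_text(raw):
    """Return a list of lower-cased, punctuation-free, stemmed tokens."""
    stemmer = PorterStemmer()
    lowered = raw.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [stemmer.stem(token) for token in no_punct.split()]

print(clean_text("Customers LOVED the new releases, loving them still!"))
# e.g. ['custom', 'love', 'the', 'new', 'releas', 'love', 'them', 'still']
```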
Hadoop Framework - Benefits of Hadoop-based Data Warehouse Implementations:
With Hadoop, you don't need to know what questions you need to ask before designing your data warehouse - Hadoop is flexible.
Simple algorithms run on very large data sets often outperform complex models built on small samples.
Powerful ability to analyse unstructured data.
Your enterprise can save Millions of Dollars in TCO.
10x Faster, 100x more economical long term solutions.
Maintains the SLAs that your enterprise currently has in place.
Changes can be implemented without impacting users.
 
Data Organization Layer
Relevant data can now be moved into Hive (a data warehouse layer on top of Hadoop that organizes data into tables of rows and columns, so it can be queried much like MySQL or Oracle).
Queries can be written against this data using Hive Query Language (HiveQL); a query sketch follows this list.
The latency is high when using Hive. A Data Mart Layer is being developed to address this. In-memory engines such as Spark/Shark are currently in the R&D stage and will be released soon in order to give an 'in-memory' flavour to open source databases.
Tools such as Cloudera Impala may also be used as an MPP (massively parallel processing) query engine.
Your enterprise may arrange and include data from other structured databases (MySQL, Oracle, etc.) as well.
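As a sketch of how queries can be issued against Hive from an application, the snippet below uses the PyHive library; the host, database, and table names are illustrative assumptions, not a specific BDI deployment.

```python
# Query Hive over its Thrift server using PyHive (host, database, and table
# names below are placeholders).
from pyhive import hive

conn = hive.Connection(host="hadoop-edge-node", port=10000, database="analytics")
cursor = conn.cursor()
cursor.execute(
    "SELECT region, SUM(order_value) AS total_sales "
    "FROM sales_events GROUP BY region"
)
for region, total_sales in cursor.fetchall():
    print(region, total_sales)
cursor.close()
conn.close()
```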
 
Data Warehousing (DWH) Layer
Bring in Hive data using a cron job or a scheduler (a sketch of such a job follows this list).
Bring in relevant structured data such as Finance, Production, Inventory, Sales, and HR from various systems.
Apply traditional DWH practices to read the data in the simplest possible way.
Inspect the existing DWH, and improve it further for more effective Big Data reports.
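As a sketch of that scheduled load, the script below could be invoked nightly by a cron job to copy a Hive summary into a conventional DWH table. It assumes pandas, SQLAlchemy, and PyHive; every host, credential, and table name is a placeholder.

```python
# Nightly job sketch: pull a Hive summary and append it to a DWH table.
# Assumptions: pandas, SQLAlchemy, PyHive, and a MySQL-based DWH; all
# connection details and table names below are placeholders.
import pandas as pd
from pyhive import hive
from sqlalchemy import create_engine

hive_conn = hive.Connection(host="hadoop-edge-node", port=10000, database="analytics")
summary = pd.read_sql(
    "SELECT dt, region, SUM(order_value) AS sales "
    "FROM sales_events GROUP BY dt, region",
    hive_conn,
)

dwh_engine = create_engine("mysql+pymysql://etl_user:secret@dwh-host/dwh")
summary.to_sql("daily_sales_summary", dwh_engine, if_exists="append", index=False)
```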
 
Data Connector Layer of Analytic Engine
Using a 'connector', relevant data can be read by analytics engines such as R Server or by third-party analytic applications such as Tableau, Jasper, SAP BusinessObjects, or BDI's own framework.
The data is 'fed' into R Server, and the results are pushed back into the server layer of the BI Visualization framework.
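The snippet below is an illustrative stand-in for that connector step: it hands a small set of values from Python to an R engine, computes a metric there, and reads the result back. It uses the rpy2 package; BDI's actual connector and the metric shown are assumptions for illustration only.

```python
# Illustrative connector stand-in: hand data to an embedded R engine and read
# the result back (assumes the `rpy2` package and a local R installation).
import rpy2.robjects as robjects

order_values = robjects.FloatVector([120.0, 95.5, 310.0, 42.75])
r_mean = robjects.r["mean"]                # look up R's built-in mean()
average_order_value = r_mean(order_values)[0]
print("Average order value:", average_order_value)
```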
 
- By Avin Jain
 