Edvancer's Knowledge Hub

7 questions to ask before starting a big data project

Manu Jeevan 21/08/2018

The data your business generates every day holds immense potential, and big data analytics can help reveal useful insights from it. But before you decide to start a big data project, ask yourself these questions:

1) Is there an agenda behind the project?

Yes, there should be an agenda, but it should be a positive one. The outcome of the project should not be interpreted in a way that merely confirms a pre-existing assumption. That is confirmation bias. Once other members of your organization suspect the results were shaped to fit an agenda, they may reject them outright. The outcome from your data should drive your decisions going forward, not the other way around.

2) What is the objective behind starting the project?

Any big data project must have a clear objective or business goal from the start. It could be to drive sales in a particular segment or to reduce customer churn, but an objective must be established. A big data project started just for the sake of using big data technologies is likely to lead to a dead end.

3) Do you have the right data sources?

The data sources need to be identified, and the data should be relevant to your business objective. Before using the datasets in your project, also consider their volume, velocity and variety (the "3Vs"), as well as their veracity, that is, how accurate and trustworthy they are. Unlike traditional relational databases, which are designed around structured data, big data technologies can analyze unstructured and semi-structured data. This means that sources such as video feeds and social media can also be used for analysis.

4) What is the cost involved?

You need big data engineers to build data pipelines and data scientists to perform the analysis. These professionals are expensive to hire: the average salary of a data scientist is $123,000, and that of a big data engineer is $95,000. So before you begin, be sure of the resources you need and whether you can afford them.
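
To put those salary figures in perspective, a back-of-the-envelope payroll estimate simply multiplies headcount by average salary. The Python sketch below is a minimal illustration, not a budgeting tool. The team composition is hypothetical, and the function `annual_payroll` is an illustrative name. It counts base salary only, with no benefits, recruiting or infrastructure costs.

```python
# Average salaries quoted above (USD per year).
AVG_SALARY = {
    "data scientist": 123_000,
    "big data engineer": 95_000,
}


def annual_payroll(headcount: dict[str, int]) -> int:
    """Return the yearly salary bill for a team, given headcount per role.

    Only the two roles priced in the article are supported; other roles
    (architects, administrators, designers) would need their own figures.
    """
    total = 0
    for role, count in headcount.items():
        if role not in AVG_SALARY:
            raise KeyError(f"no salary figure for role: {role!r}")
        total += AVG_SALARY[role] * count
    return total


# Hypothetical team: two data scientists and three big data engineers.
team = {"data scientist": 2, "big data engineer": 3}
print(annual_payroll(team))  # 2*123000 + 3*95000 = 531000
```

Raising an error for unpriced roles is deliberate. It keeps the estimate from silently undercounting a team that includes roles the figures above do not cover.
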
The cost of setting up enterprise-scale big data management and analytics infrastructure to store huge volumes of data should also be accounted for. With the advent of Hadoop, many organizations have found traditional data warehouses ill-suited to petabyte-scale environments where data types and memory requirements change dynamically. Hadoop can store and distribute very large datasets across hundreds of inexpensive servers that operate in parallel. A petabyte Hadoop cluster of roughly 125-250 nodes would cost between half a million and a million dollars, while a comparable enterprise data warehouse would cost $10-100 million. [source: Forbes]

5) Is your management on board for a full-scale project, or do you need to build and show a POC (proof of concept)?

Return on investment (ROI) is the proof management looks for before validating a project. If your management isn't convinced about funding a full-scale big data project, a POC can help your team arrive at a ballpark ROI. Even a simple POC requires significant time and effort, along with documentation of its objectives and use cases. Deploying the POC application on the cloud can reduce costs further.

6) Do you have a competent in-house data science team?

Building a data science team is an extensive process. Whether a company outsources data science or performs it in-house, it needs a highly skilled team to execute big data projects. An ideal data science team will have a chief data scientist or chief data officer, a data engineer, a data solutions architect, a data platform administrator and a visual data designer. Do you have these people?

7) What's the implementation strategy?

Carefully consider whether you will opt for a solution deployed on the cloud or on-premises. Factors such as capacity, flexibility, cost, reliability and in-house technical experience will shape this decision. Taking the cloud route removes or reduces the need for:

Capital budget to procure hardware and software
Technically skilled professionals to deploy, integrate and manage various components of the system
Time for a full procurement and deployment process
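
For the on-premises option, the infrastructure estimates quoted under question 4 give a rough sense of the capital budget involved. The Python sketch below turns those ranges into a cost per node and a warehouse-to-Hadoop cost ratio. It makes a simplifying assumption: the low and high ends of each range can be paired freely, so the results show the widest plausible spread. The cited estimate itself does not break costs down per node. The function names are illustrative.

```python
# Figures quoted in question 4 (cited to Forbes), in USD.
HADOOP_COST = (500_000, 1_000_000)    # petabyte Hadoop cluster, low/high
HADOOP_NODES = (125, 250)             # approximate cluster size, low/high
EDW_COST = (10_000_000, 100_000_000)  # comparable enterprise data warehouse


def per_node_cost_range(cost, nodes):
    """Widest plausible cost-per-node range: the cheapest cluster spread over
    the most nodes, up to the most expensive cluster over the fewest nodes."""
    return cost[0] / nodes[1], cost[1] / nodes[0]


def cost_ratio_range(expensive, cheap):
    """How many times more the expensive option costs, at the two extremes."""
    return expensive[0] / cheap[1], expensive[1] / cheap[0]


print(per_node_cost_range(HADOOP_COST, HADOOP_NODES))  # (2000.0, 8000.0)
print(cost_ratio_range(EDW_COST, HADOOP_COST))         # (10.0, 200.0)
```

Even at the least favourable pairing, the warehouse comes out about ten times more expensive. That gap is why the capital budget is the first item cloud deployment helps avoid.
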

Big data analytics projects are not one-size-fits-all solutions. Each firm will have its own set of use cases and should expect returns that differ accordingly, even for a similar investment. It is better to keep an open mind than to expect the project to behave like a magic box that churns out gold. Answering these questions can give you and your team more clarity on what to expect from a big data project.

Manu Jeevan

Manu Jeevan is a self-taught data scientist and loves to explain data science concepts in simple terms. You can connect with him on LinkedIn, or email him at manu@bigdataexaminer.com.