The consulting firm McKinsey and Company estimates that
“there will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”
Why 10 times as many managers and analysts as people with deep analytical skills? Surely data scientists aren’t so difficult to manage that they each need 10 managers! The reason is that organizations have found data-driven decision making to be more profitable, so they need people across the business who can act on the analysis.
The explosive growth of the internet and the proliferation of smart devices, cameras, microphones, sensors, RFID tags and so on have led to tremendous growth in, and affordable, easy access to, large quantities of fast-moving, unstructured data, commonly referred to as big data. Aided by this data and a host of new technologies, managers are able to glean strategic, tactical and operational insights that yield quicker and more effective business decisions.
So there is a great opportunity for MBA professionals to become data-science-savvy managers. Now, let’s see what it takes to become one.
Fundamental concepts of data science
Data science managers have to understand the fundamental principles of data science well enough to envision and appreciate data science opportunities, to supply the appropriate resources to data science teams, and to be willing to invest in data and experimentation. They must be able to steer the data science team carefully so that it stays on track toward a business solution that will eventually be useful. This is very difficult if the managers don’t really understand the principles. Managers need to be able to ask probing questions of a data scientist, who can often get lost in technical details.
R / SAS
Even if you intend to be more on the “business acumen” side than on the “technical expertise” side of the house, it is important to have the programming skills to be taken seriously. You should learn either R or SAS, as they are among the programming languages most widely used by data scientists. Intermediate skills in either language will help you communicate confidently with data scientists.
Database querying
A query is a specific request for a subset of data from a database, written in a query language such as SQL. A data science manager should be savvy enough to run basic queries and retrieve relevant data sets from a database or data warehouse. For example, if the manager suspects that middle-aged men living in San Francisco show some particularly interesting churn behaviour, they could compose a SQL query like this:
SELECT * FROM CUSTOMERS WHERE AGE > 45 AND SEX = 'M' AND STATE = 'SANFRAN'
Managers need a fundamental knowledge of SQL.
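To make the query above concrete, here is a minimal sketch that runs it against a tiny in-memory SQLite database using Python. The table contents and the NAME column are invented for illustration, and the filter values are passed as parameters rather than typed into the SQL text.

```python
import sqlite3

# Hypothetical in-memory table; AGE, SEX and STATE follow the query in the text,
# while NAME and all rows are made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (NAME TEXT, AGE INTEGER, SEX TEXT, STATE TEXT)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        ("Alan", 52, "M", "SANFRAN"),
        ("Beth", 48, "F", "SANFRAN"),
        ("Carl", 39, "M", "SANFRAN"),
        ("Dave", 61, "M", "NEWYORK"),
        ("Evan", 47, "M", "SANFRAN"),
    ],
)

# Same filter as the query in the text, with the values supplied as parameters.
query = (
    "SELECT NAME, AGE FROM CUSTOMERS "
    "WHERE AGE > ? AND SEX = ? AND STATE = ? ORDER BY NAME"
)
matches = conn.execute(query, (45, "M", "SANFRAN")).fetchall()
print(matches)
```

Only the rows that satisfy all three conditions come back, which is the subset a manager would then inspect for churn behaviour.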
Statistics
Statistics allows one to slice and dice through data, extracting the insights needed to draw reasonable conclusions. Understanding inferential statistics allows us to draw general conclusions about an entire population from a smaller sample. To understand data science, one must also know the basics of hypothesis testing and experiment design to comprehend the meaning and context of the data.
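As a small illustration of inferring something about a population from a sample, the sketch below estimates a confidence interval for average customer spend. The spend figures are invented, and the interval uses a normal approximation, which is only a rough guide for a sample this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(sample)
    margin = z * stdev(sample) / sqrt(len(sample))
    return centre - margin, centre + margin

# Hypothetical monthly spend (in dollars) for a random sample of customers.
sample_spend = [42, 55, 38, 61, 47, 52, 49, 58, 44, 50, 53, 46, 57, 41, 48, 54]

low, high = mean_confidence_interval(sample_spend)
print(f"Sample mean: {mean(sample_spend):.2f}, 95% interval: {low:.2f} to {high:.2f}")
```

The interval, not just the sample mean, is what supports a statement about all customers: a wider interval signals that the sample says less about the population.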
Data visualization and communication
Managers need to be able to communicate well with, and be respected by, both “techies” and “suits”; often this means translating data science jargon into business jargon, and vice versa.
On the visualization side, it can be immensely helpful to be familiar with data visualization tools like ggplot2 and D3.js. It is important to be familiar not just with the tools needed to visualize data, but also with the principles behind visually encoding data and communicating information.
Domain knowledge
Clients also demand knowledge of a particular domain. From a long-term point of view, it is a good idea to start building knowledge in a domain that interests you well before you embark on your career. Domain knowledge helps in connecting analysis with business decisions and their consequences, thereby providing value to the client.
Machine learning
Machine learning is a method of data analysis that automates analytical model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look. Online recommendations such as those from Amazon and Netflix are examples of machine learning applications.
A data science manager should know basic machine learning techniques such as regression, clustering, support vector machines (SVM) and decision trees, and should understand how these techniques are applied to real-world big data problems.
Now, let’s see how data-science-savvy managers use these skills to find solutions to business problems.
Who are the most profitable customers?
If “profitable” can be defined clearly based on existing data, this is a straightforward database query. SQL could be used to retrieve a set of customer records from a database. The results could be sorted by cumulative transaction amount, or some other operational indicator of profitability.
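A minimal sketch of such a query, again using Python with an in-memory SQLite database. The TRANSACTIONS table, its columns and its rows are assumptions made up for this example, and total spend stands in for profitability.

```python
import sqlite3

# Hypothetical transaction log: one row per purchase.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TRANSACTIONS (CUSTOMER_ID INTEGER, AMOUNT REAL)")
conn.executemany(
    "INSERT INTO TRANSACTIONS VALUES (?, ?)",
    [(1, 120.0), (2, 40.0), (1, 80.0), (3, 300.0), (2, 35.0), (3, 50.0), (4, 10.0)],
)

# Sum each customer's transactions and keep the three largest totals.
top_customers = conn.execute(
    """
    SELECT CUSTOMER_ID, SUM(AMOUNT) AS TOTAL_SPEND
    FROM TRANSACTIONS
    GROUP BY CUSTOMER_ID
    ORDER BY TOTAL_SPEND DESC
    LIMIT 3
    """
).fetchall()
print(top_customers)
```

Swapping SUM(AMOUNT) for another indicator, such as margin per order, changes the definition of “profitable” without changing the shape of the query.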
Is there really a difference between the profitable customers and the average customer?
This is a question about a conjecture or hypothesis (in this case, “There is a difference in value to the company between the profitable customers and the average customer”), and statistical hypothesis testing would be used to confirm or disconfirm it. Statistical analysis could also derive a probability or confidence bound that the difference was real. Typically, the result would be like: “The value of these profitable customers is significantly different from that of the average customer, with probability < 5% that this is due to random chance.”
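One simple way to run such a test is a permutation test, sketched below. It is only one of several possible methods (a t-test is another common choice), and the customer values are invented for illustration.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical yearly value (in dollars) per customer in each group.
profitable = [980, 1120, 1050, 1210, 990, 1150, 1080, 1030]
average = [610, 720, 580, 690, 640, 700, 660, 630]

p_value = permutation_test(profitable, average)
print(f"Estimated p-value: {p_value:.4f}")
```

A p-value below 0.05 corresponds to the “probability < 5% that this is due to random chance” wording used above.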
But who really are these customers? Can I characterize them?
Data analytics managers would like to do more than just list the profitable customers. They would like to describe the characteristics these customers have in common. The characteristics of individual customers can be extracted from a database using techniques such as database querying, which can also be used to generate summary statistics. A deeper analysis should involve determining which characteristics differentiate profitable customers from unprofitable ones.
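A first pass at this comparison can be as simple as computing the same summary statistics for each group and looking at the gaps. The sketch below does this in Python; the customer records and field names are invented for illustration.

```python
from statistics import mean

# Hypothetical customer records; all fields and values are made up.
customers = [
    {"age": 52, "orders_per_year": 14, "profitable": True},
    {"age": 47, "orders_per_year": 11, "profitable": True},
    {"age": 35, "orders_per_year": 12, "profitable": True},
    {"age": 29, "orders_per_year": 3, "profitable": False},
    {"age": 58, "orders_per_year": 2, "profitable": False},
    {"age": 41, "orders_per_year": 4, "profitable": False},
]

def summarize(records, fields):
    """Return the mean of each field over the given records."""
    return {field: mean(r[field] for r in records) for field in fields}

fields = ["age", "orders_per_year"]
profitable = summarize([c for c in customers if c["profitable"]], fields)
unprofitable = summarize([c for c in customers if not c["profitable"]], fields)

# A large gap suggests the field may help distinguish the two groups.
for field in fields:
    gap = profitable[field] - unprofitable[field]
    print(f"{field}: profitable={profitable[field]:.1f}, "
          f"unprofitable={unprofitable[field]:.1f}, gap={gap:+.1f}")
```

In this toy data, order frequency separates the groups far more clearly than age does. Whether such a gap is real would still need a statistical test.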
Will some particular new customer be profitable? How much revenue should I expect this customer to generate?
These questions could be addressed by data mining techniques that examine historical customer records and produce predictive models of profitability. Such techniques would generate models from historical data that could then be applied to new customers to generate predictions.
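As a minimal sketch of this idea, the example below fits a straight line to historical data and applies it to a new customer. The choice of first-month spend as the predictor, the numbers, and the use of simple linear regression are all assumptions for illustration; real models would use many more records and features.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical history: first-month spend and first-year revenue per customer.
first_month_spend = [20, 35, 50, 65, 80]
first_year_revenue = [160, 230, 335, 410, 515]

intercept, slope = fit_line(first_month_spend, first_year_revenue)

def predict_revenue(spend):
    """Apply the fitted model to a new customer's first-month spend."""
    return intercept + slope * spend

print(f"Predicted first-year revenue for $60 first-month spend: "
      f"{predict_revenue(60):.2f}")
```

The model is built once from historical customers and then reused for each new customer, which is the pattern described above.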
Data informs every element of a modern business. Data-science-savvy managers employ a structured approach to solving any business problem: they state hypotheses, seek relevant data, run analyses and build models to test the hypotheses, and present a recommendation.
You should also know how these skills are applied to real-world business problems. The best way to learn this is to work through many examples of data science applied to business problems. Read case studies that actually walk through the data mining process.
Actually mining data is helpful, but even more important is working through the connection between the business problem and the possible data science solutions. The more different problems you work through, the better you will be at naturally seeing and capitalizing on opportunities to bring to bear the information and knowledge “stored” in the data. Often the formulation used for one problem can be applied by analogy to another, with only minor changes.