The Quant Crunch report indicates that demand for data scientists is expected to rise 28% by 2020, which is hardly encouraging given that data science jobs are already very difficult to fill.
Academic institutions are undoubtedly working to meet this rapidly increasing need, but demand for data scientists is outpacing the speed at which students can be trained and absorbed into the industry.
This brings us to the question: what should vendors of machine learning (ML) and artificial intelligence (AI) do to overcome this acute skills shortage?
Firstly, define what the requirement is
Whom do we want to hire? A data scientist? But who or what exactly is a data scientist? Definitions vary across the industry. Many so-called data scientists are neither data experts nor scientists in any meaningful sense. In fact, many of them have business cards that say something quite different.
Typically, a data scientist comes into existence in the following manner: an enterprise needs someone to solve advanced business intelligence and AI problems, and the individual who is hired becomes, de facto, the company’s data scientist. In general, that person’s role involves:
- Working with business partners to figure out which questions need to be answered
- Acquiring and cleaning data
- Determining which algorithms to use, based on current best science
- Managing the life cycle of the project
Evidently, “data” and “science” are only part of a bundle of roles: business performance, IT troubleshooting, basic admin and project management all fold into the typical consolidated data science role. This is how the shortage of data scientists is created. They end up spending their time and energy juggling these multiple job roles rather than on data science itself.
A Data Community is required
The shortage of data scientists does exist, but it is also inflated. Assume that data scientists spend 50% of their time on actual AI-related tasks and the rest on auxiliary tasks. The simplest response is to hire another data scientist, which further shrinks the already small pool. The other option is to relieve the existing data scientists of their non-data tasks, allowing them to devote the rest of their time and energy to the AI project. Either way, the enterprise has effectively gained another data scientist.
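The arithmetic behind that claim can be sketched in a few lines. This is only an illustration: the 50% figure is the assumption stated above, the function name is made up for this example, and it considers the idealized case in which all auxiliary work is taken off the data scientist's plate.

```python
def effective_ai_capacity(headcount: int, ai_time_fraction: float) -> float:
    """Full-time-equivalent capacity actually spent on AI work (illustrative model)."""
    return headcount * ai_time_fraction


# Assumed split from the text: 50% of time on AI tasks, 50% on auxiliary tasks.
baseline = effective_ai_capacity(headcount=1, ai_time_fraction=0.5)

# Option 1: hire a second data scientist who is burdened in the same way.
hire_another = effective_ai_capacity(headcount=2, ai_time_fraction=0.5)

# Option 2: keep one data scientist and offload all auxiliary work (idealized).
offload_tasks = effective_ai_capacity(headcount=1, ai_time_fraction=1.0)

print(baseline, hire_another, offload_tasks)
```

Under these assumptions both options yield the same AI capacity, double the baseline, but only the first one draws on the scarce hiring pool.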
How do we relieve data scientists of their non-data tasks? Invest in software and tools that support them by handling basic admin and other low-level yet time-consuming work. The data scientist should be focusing on high-value activities such as studying and deciding which model to use, and should not have to worry about how every resulting change will be implemented in the system if that model is adopted. Freed of this extraneous workload, data scientists can be far more efficient and effective.
Yet another approach is to build out the team. Hire individuals with complementary skills to help the data scientist. Alternatively, redefine the whole idea of a data scientist and go with a data science team instead: a data analyst, a machine learning engineer and a data engineer, for example. By doing so, you bridge any knowledge or skills “gaps” that might exist, and your scientists become specialists rather than generalists.
Things can only get easier from here
AI today commands the same awe that computers did a few generations ago, when they were arcane, hidden and accessible only to a select few. As with computers, AI will become more widely available and easier to use. As AI proliferates and its tools become more user-friendly, barriers to entry will fall, and data science will become a common skill in which almost everyone is reasonably literate. A parallel can be drawn with what Squarespace and Wix have done for building websites.
For now, the options are to fight over our meager pool of data scientists or to train new ones. The demand is there, but the training pipeline is slow, especially for such a rapidly moving industry. There will be more data scientists to pick from in a few years, but they will be inexperienced and will need time to grow into higher-level roles. Meanwhile, market demand is only going to grow.
So what should the AI and ML community do? It should take a critical look at how its data scientists are being used and make sure they are being used efficiently and effectively. It also needs to make a concerted effort to take extraneous tasks off its existing data scientists’ shoulders to the greatest possible extent, and to make the structural changes needed to ensure that the next generation of data scientists has everything required to succeed.
Bridging the AI talent gap isn’t about having our data scientists put in more effort. It’s about getting the best out of them by giving them more to work with.