Edvancer's Knowledge Hub

How does a data scientist use programming – part 2

How to learn programming for data science like a pro

This is a continuation of my previous article. The elements of a data product do not have to be built in a set order. The professional approach is to prioritize by technical risk: start with the riskiest element and go from there. An element can be technically risky for a lot of reasons. The riskiest part may be the part you understand the least, or it could be the one with the highest workload.

You can build out components in any order by focusing on a single element and temporarily ignoring the rest. If you decide, for example, to start by building an algorithm, use some test input data, which need not be very accurate, and define a temporary spot to write the algorithm's output. Focus on one element, stub out the rest, and replace the stubs later.

For example, assume that you are doing an analysis on employee satisfaction data. You build a model that tells you which factors influence happiness, and based on this model you decide to analyze further why those factors matter so much. So the objective here is to take the employees that the model identified as happy and build a topic model from their unstructured text comments.

The key is to build and run in small pieces: write algorithms in small steps that are easy for you to understand, build the storage one data source at a time, and build your control one algorithm execution step at a time. The goal is to have a working data product at all times, even if it isn't fully functional until the end.

Learn Like a Pro

Every pro requires quality tools. There is a plethora of available options. I wanted this section to be a list of those tools, but the state of the art changes so rapidly that the list would be out of date soon after it reached readers.
What's more useful than a list of tools are techniques for learning new tools quickly and putting them to productive use. The way that new data science tools are usually presented, you have to be well versed in a lot of theoretical background before the tool can be applied to anything meaningful. Most data scientists are what Jeannette M. Wing refers to as computational thinkers. Data scientists think in terms of simple discrete transactions, and they understand things by test-running them and observing the output. For data scientists, the mundane thought of sitting through lectures, doing homework, and reading technical specifications is just…uuuugghhh! Here's an alternative way of learning new tools:
  1. Find a problem (small, but meaningful).
  2. Choose a tool.
  3. Get the tool to produce some output, any output.
  4. Tinker with Step 3 until you've addressed Step 1.
For example, before analyzing the employee satisfaction data, you might not know how to perform a topic model analysis. You must read enough about it to understand its purpose and to judge whether it is suitable for your problem. Then you choose a library in R or Python and write code to fit your employee satisfaction analysis. The results have to be tested to make sure that you understand the algorithm. Learning this way requires some sacrifice. You have to be a realist, willing to focus only on what's needed to solve a particular problem. Learning about a model this way (in our case, topic modeling) builds proficiency in one particular application rather than in the whole theory. I've found the problem-solving approach to be a very reliable way to learn a new tool's most important features very quickly.
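
To make the employee-comment example concrete, here is a minimal Python sketch of the "focus on one element, stub out the rest" idea. Everything in it is an assumption made for illustration: the records are made-up test data, `load_records` and `is_happy` are hypothetical stubs standing in for real storage and the real satisfaction model, and `top_terms` is only a crude word count, not a topic model. The point is that the pipeline runs end to end and produces some output, so each stub can later be swapped for the real thing (for instance, a topic-modeling library of your choice).

```python
from collections import Counter

# Hypothetical test input: rough, made-up records standing in for real survey data.
TEST_RECORDS = [
    {"id": 1, "comment": "great team and flexible hours"},
    {"id": 2, "comment": "long commute and low pay"},
    {"id": 3, "comment": "flexible hours and a great manager"},
    {"id": 4, "comment": "great team, supportive manager"},
]

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "and", "the", "my"}


def load_records():
    """Stub for storage: return test data instead of reading a real source."""
    return TEST_RECORDS


def is_happy(record):
    """Stub for the satisfaction model: a hard-coded rule, to be replaced later."""
    return record["id"] in {1, 3, 4}


def top_terms(comments, n=3):
    """The element under construction: a crude word count standing in for a
    topic model. Swap in a real library once the pipeline runs end to end."""
    words = Counter()
    for text in comments:
        for word in text.lower().replace(",", " ").split():
            if word not in STOP_WORDS:
                words[word] += 1
    return [word for word, _ in words.most_common(n)]


def run_pipeline(output):
    """Control: one step at a time, writing to a temporary spot (a plain dict)."""
    happy_comments = [r["comment"] for r in load_records() if is_happy(r)]
    output["top_terms"] = top_terms(happy_comments)
    return output


result = run_pipeline({})
print(result["top_terms"])
```

Because every piece has a stand-in, you have a working (if not yet useful) data product from the first run, and you can tinker with the one element you care about without touching the others.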

Manu Jeevan

Manu Jeevan is a self-taught data scientist and loves to explain data science concepts in simple terms. You can connect with him on LinkedIn, or email him at manu@bigdataexaminer.com.