Learning data science can be quite intimidating, especially at the very start of your journey. There are many questions you need to answer along the way. Which techniques should I focus on? How deep do I need to go into statistics? Do I need to learn to code? Which tool should I learn: R or Python?
This short, simple guide was created to help people starting out in analytics or data science and to set you on the right path. It offers a framework for learning data science or analytics that can make this difficult and apparently intimidating period much smoother.
Following these tips will get you a good head start in your career.
So let’s get going!
Emulate one or two people who know what they’re doing
There is a plethora of tools and techniques for approaching data work, and if you attempt to master many of them at once, you won’t fully understand or accomplish anything with any one of them. My recommendation is to select one or two people working in the data science field and emulate them. The two most widely used programming languages in the data science community are Python and R, and I would recommend learning one of them. Assuming you choose R, pick a few R programmers to emulate. Then listen to all of their online talks, read their blogs and follow their activity on GitHub. As a result, you will acquire a deep understanding of a few small areas of the language while missing out on a lot of other areas. For example, you might end up learning dplyr thoroughly without having grasped much of object-oriented programming. It’s a good idea to develop depth of knowledge rather than breadth, because when you know one thing in depth you can usually apply that knowledge to other areas. A superficial understanding of many areas won’t help you tackle advanced problems in a specialized area.
“Learning to code” and “learning statistics” are terrible goals because they have no end point
While developing a new skill, it’s crucial to have specific criteria for success. This keeps you on track and also helps mitigate imposter syndrome. You don’t want to keep moving your goalposts as your understanding develops. Viewed from this perspective, “learning to code” and “learning statistics” are goals to avoid, because there’s always more to learn about these fields. It’s wiser to have smaller goals, like, “Learn to write a function in R,” or, “Be able to fit a linear model,” because these things can be accomplished. Achievable goals are good because you actually get to accomplish them, rather than being constantly reminded how far you have to go.
Focus on trajectory
It is natural to compare ourselves with others and to judge our own skills against theirs. The issue is that as our understanding improves, we tend to shift our comparisons to more and more accomplished people. This problem becomes more acute when we compare our own general understanding of an area to that of specialists. For example, you might have a good broad overview of neural networks, but if you compare yourself to someone who studies them full time, your understanding will obviously fall well short. Comparative thinking of this nature leads to feelings of inadequacy, because no matter who you are or how much you know, there is always somebody who knows more.
Focusing on trajectory is a far better approach. Ask yourself whether you are making progress rather than whether you are relatively successful. Reflect on what you knew yesterday and feel good if you learned a bit more today. Eventually, that approach will lead to much better understanding with much less agony.
Try to Ignore Boundary Setting Behaviour
Boundary setting behaviour occurs when people who are part of a group try to draw the lines around that group to include themselves and exclude you. For example, programmers sometimes say things like, “Real programmers use the command line,” or, “You really need to learn Scala if you want to be a good programmer.” The motive for this is not to accurately define the boundaries of the discipline, but instead to make themselves feel better about the skills they possess. Often, out of insecurity, people will express and highlight the importance of their own skills and try to minimize the importance of the skills that they lack. Roughly half of the stuff you read is written to address that insecurity rather than to educate you. Where possible, you should try to avoid this kind of advice.
However, you will definitely encounter it. You will be rejected from jobs, or made to feel like an idiot, because of boundary setting behaviour, and there’s nothing you can really do about that. People from non-traditional backgrounds who apply for data science jobs believe all of the following:
1. The most common data science tasks are communication, visualization and data manipulation.
2. Rather than trying to find one person who knows how to do everything, we should focus on building teams with distinct capacities.
3. Most of the time a complex machine learning algorithm is not desired, but a simple linear model is required.
4. We should expand the applicant pool as there is a labour shortage for data science roles.
5. An advanced degree in statistics is absolutely essential to apply for a job at the company.
All of these things can be true at the same time, but more often than not, the first four describe much of the actual job requirements, and the fifth is boundary setting. But when you apply for the job and get rejected, you can’t really ignore that behaviour; it has to be dealt with emotionally.
Boundary setting behaviour is easy to recognize when people start to equate being a member of a profession with being a particularly good member of that profession. For example, you might be told that to be a real data scientist you need a PhD in statistics, mastery of R, Python and big-data query languages, and, in addition, exceptional written and verbal communication skills. Possessing these skills probably makes you an extremely skilled data scientist, but do such hard boundaries really exist around the profession? I’m not so sure.
In most cases, we talk about jobs based on the job title rather than the job requirements. If you’re a baseball player who stands near first base you are called a first baseman; if you write for a living you are called a writer. These things hold true even if you are a terrible baseball fielder or write trashy science fiction novels. The market sets the boundaries of the profession, not your skills, so you can be a good or bad example of the profession without that changing your membership in it. I think the same thing should hold true for programming. Can you get a job writing computer programs? Then you’re a prospective programmer. Do you work with data for a living? Then you can probably call yourself a data scientist.