“Data science” is a relatively new field that combines knowledge of statistics, machine learning and programming to solve real-world problems using data. Whilst many companies understand why they need data scientists, many don’t know what skills they should be looking for in potential hires. A lot of data science job descriptions carry a long list of skills, which suggests that companies are looking for a “unicorn” with every possible combination of skills (or perhaps that they don’t know what they should be looking for). In reality, most of these unicorns do not exist, and it’s often more practical to build a data science team from individuals with strengths in different areas of data science.
Broadly speaking, data careers can be divided into data science and data engineering. Data science is more to do with analysing, visualising and deriving meaningful insights from data. Data engineering, on the other hand, is more concerned with building data pipelines to deal with large datasets. Data engineering is more closely related to software engineering, whilst data science tends to suit candidates from a quantitative science background (including maths, physics, quantitative biology/neuroscience, computer science, engineering, etc). In reality, this distinction is often not clear in job descriptions. Also, due to a shortage of talent, overlap between data science and data engineering is quite common. See the four types of data scientists for more.
How do you land your first job as a data scientist?
Getting that first job can be a very important first step in your journey to becoming a competent data scientist. It can be very challenging, especially if you’re coming straight from university with little “real world” experience. It must be said that a PhD alone will not give you automatic entry into a data science role: you will have to demonstrate your capability as a data scientist. Having said that, there are various routes to becoming a data scientist, and you don’t necessarily need a PhD to get into the field, but you do have to demonstrate a breadth of skills.
- PhD route: A common route into a data science role is through a PhD in a quantitative/scientific field such as physics, mathematics, engineering, neuroscience, biology, bioinformatics, computer science, etc. There are bootcamps that specifically recruit PhD graduates, who get to work on projects attached to companies either alone (ASI Fellowship) or in collaboration with other graduates (Science to Data Science). Some popular bootcamps include the ASI Fellowship, S2DS and the Insight fellowship programmes (note that the first two are based in the UK whilst the third is in the USA). For a more comprehensive list of data science bootcamps, see the following link.
- Masters/undergraduate route: Data scientists can be hired straight after an undergraduate or master’s degree into entry-level data science roles. It’s important to demonstrate your competencies through projects and, if possible, by doing an internship with a data-driven company. If going via the master’s route, I would recommend a master’s in machine learning. There are a few master’s degrees in data science, but I feel these are probably too broad and non-specific.
- Portfolio/work experience route: This may be suitable for individuals who are already in industry and wish to move into data science roles. People in this group typically come from analyst, software engineering or business intelligence roles. The key is to develop a portfolio of data science projects which can be uploaded to GitHub or a similar site for employers to see. Another way to show competence is to contribute to open source data science projects.
So the question is: How do you land that first job?
- Know the basics of data science theory very well. This includes mathematics/statistics (linear algebra, calculus, numerical optimization, regression, algorithms, etc), programming (at the very minimum, Python or R), machine learning algorithms, visualisation, and some familiarity with big data tools (Scala, Spark). If you are interested in data engineering, it’s crucial to get familiar with the big data tools: Scala, Spark and Hadoop. As a general rule, it’s important to be comfortable with at least one of the data wrangling tools (R or Python); this means at least 10,000 hours of coding in that particular language.
- Demonstrate your interest by undertaking a data science project in your spare time. Find a question that you can address using online data and showcase your work on a GitHub account. If you’re already studying for a master’s or a PhD, try to demonstrate your data science interests through your projects.
- If possible, get some industry experience relevant to data science and to the industry of your choice. Domain knowledge is crucial to being an effective data scientist. This can be acquired through an internship, a Kaggle competition, a hackathon or previous work experience.
- Network within the data science community by attending meetups and conferences, and by arranging meetings with data-driven companies of interest.
- Keep up to date with the field by reading new articles and publications and by learning about new algorithms. I personally follow Data Elixir and the Twitter accounts of prominent data scientists.
Lastly (and it goes without saying), apply to roles of interest. Some roles are not widely advertised, and this is where networking becomes important. There are several roles out there, but it’s important to scrutinize job specifications carefully to ensure they are truly data science roles. All the best, and watch out for my next article on how I made the transition to becoming a data scientist.