A Guide for Aspiring Data Scientists
I decided to publish this article for two reasons. First, I’ve been working as a freelance data scientist for some years and have acquired experience that could prove useful to those wishing to enter the field. Second, numerous people have contacted me in the past year, asking for guidance and resources about data science. Unfortunately, I don’t have time to give personal advice to everyone, so I hope this will be a helpful guide for beginners. In this article, I have included a number of steps you can follow to become proficient in data science, by acquiring both theoretical knowledge and practical skills. Many of the books and courses mentioned here are freely available, as I believe that education should be accessible to all.
Data science combines software engineering, statistics and machine learning, with the goal of extracting business insights from raw data. It can be a truly rewarding field of work, offering intellectual stimulation and significant monetary rewards, while being at the forefront of technology. However, the field is also full of challenges that can discourage beginners. Having a thoughtful plan can give you a significant advantage as an aspiring data scientist, so I am providing the following detailed steps, based on my personal experience. I sincerely hope they will help newcomers who are interested in data science start their journey in the field!
Make Sure Data Science Suits You
Machine learning has advanced impressively in recent years, leading to massive media exposure. You can constantly read news about the latest achievements of machine learning in scientific research, medicine, business and other areas. This has created significant hype around the technology, so countless people have become interested in the field. For example, more than 4 million people have enrolled in the famous Machine Learning course by Andrew Ng on Coursera, indicating the huge interest in this subject. As a result, competition in the job market has become intense, especially for junior positions.
Given those facts, you should understand that commitment and hard work are necessary to succeed as a data scientist. Before committing though, make sure that data science is a good fit for you. This type of work requires quantitative thinking, analytical skills, and constant learning. Data scientists need to keep enhancing their skills by becoming acquainted with new software tools and libraries, acquiring domain knowledge, and staying up to date with the latest research. Not everyone has the same talents and aptitude, so I suggest that you take an introductory data science course to see if you actually enjoy it, before deciding to pursue this career.
Learn Data Science the Right Way
Due to the growing popularity of data science and machine learning, there are numerous online courses, books and degrees available for those interested. Having a relevant degree is not absolutely necessary, but my postgraduate studies in computer science have certainly helped me. Any kind of quantitative degree will benefit you, as familiarity with mathematics and computer programming is essential. In case you need to refresh your math knowledge, I suggest the excellent Mathematics for Machine Learning book, which is freely available. Furthermore, Introduction to Modern Statistics is a free textbook that will teach you the fundamentals of statistics, which every data scientist needs to know.
One of the most popular machine learning books is An Introduction to Statistical Learning by Gareth James et al. It’s a great introduction to the core theoretical concepts, including regression, classification, support vector machines, clustering and decision trees. The authors have recently published the second edition, adding chapters about deep learning, survival analysis and other topics. They’ve also released an ebook version as a free download, so make sure to check it out. Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka is another amazing book, covering all the fundamental topics in machine learning and deep learning. Finally, Simplifying Machine Learning with PyCaret is a book that I published last year, offering a beginner-friendly introduction to machine learning, based on the PyCaret library.
Online courses provided by Coursera, edX and Dataquest can also be great starting points for beginners, but you shouldn’t focus on course certifications, as your experience is what really matters. In addition, Stanford University provides free access to the CS229: Machine Learning course. The syllabus covers linear algebra, statistics, supervised learning, unsupervised learning, deep learning and other topics. You can watch the lectures on YouTube and download the class notes if you want. It is truly fascinating that a top-ranking university provides that kind of educational content, free for everyone to enjoy!
Create a Strong Portfolio
The best way to gain experience before finding your first data science job is to work on personal projects. Start by finding a topic that interests you, then download a dataset from Kaggle or Google Dataset Search, and work on your project! It is even better to create your own dataset by scraping a website, as this is necessary in some real-world projects. All that may sound daunting to a beginner, but there’s no other way to become skillful and valuable to employers. Working through the examples of online courses and books is simply not enough, as you need to stand out from the rest.
After working on your personal projects, sharing them with others is the next logical step. GitHub is a free service that lets you create repositories for your projects, which can be either public or private. Creating a strong portfolio of data science projects on GitHub is the best way to promote yourself to employers, as it showcases your practical skills and accomplishments. Here’s a list of suggested data science projects:
• A data analysis and visualization project.
• A project focusing on statistical testing (t-test, ANOVA & chi-square).
• A project focusing on basic ML tasks, such as classification and regression.
• A time series analysis and forecasting project.
• A natural language processing project.
Keep in mind that it’s not necessary to create all those projects to have a strong portfolio. Furthermore, you should avoid using toy datasets like Iris or Titanic, because they have been analyzed by millions of people and are considered trivial. Instead, try finding new datasets in a domain that interests you. This will motivate you to focus on the project and get a better result! In addition, make sure to document each project and write an executive summary or blog post with your insights. Communication is a key skill for data scientists, so don’t underestimate it.
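To make the statistical testing project from the list above a bit more concrete, here is a minimal sketch in Python of Welch’s t-test, written with the standard library only. The function name welch_t_test and the two samples are hypothetical and chosen purely for illustration. In a real project you would probably rely on a library such as SciPy and also report a p-value.

```python
import math
import statistics


def welch_t_test(sample_a, sample_b):
    """Return Welch's t statistic and the approximate degrees of freedom.

    Illustrative helper (not part of any library): compares the means of
    two independent samples without assuming equal variances.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)

    # Sample variances (n - 1 in the denominator), each divided by its sample size.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b

    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, df


if __name__ == "__main__":
    # Made-up measurements for two groups, for demonstration only.
    group_a = [1, 2, 3, 4, 5]
    group_b = [2, 3, 4, 5, 6]
    t_stat, df = welch_t_test(group_a, group_b)
    print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

Writing a test like this by hand once, and then comparing the result with a library implementation, is a simple way to show in your portfolio that you understand the method and not only the API.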
Apply for Jobs
After acquiring the fundamental data science skills and creating a portfolio with your personal projects, the next step is applying for a job. Candidates may feel insecure when looking for their first data science job, due to lack of experience and other factors. Most people will struggle with such feelings at some point in life, so you shouldn’t be discouraged by them. Instead, you can deal with insecurity in a productive way, by using it as motivation to improve your skills and knowledge.
LinkedIn is a great website to look for data science job listings, as most companies use it nowadays. Glassdoor is another great resource, offering numerous listings related to data science, machine learning and other associated fields. Finally, you may want to pursue a career as a freelancer, by using platforms like Upwork, Fiverr or Freelancer. Personally, I am quite familiar with the advantages and disadvantages of freelancing, so I will consider writing an article focusing on the topic.
Connect with the Right People
Nobody can achieve truly great things by themselves, so connecting with like-minded professionals should be a priority for everyone. LinkedIn is obviously a great way to accomplish that, either by contacting people on your recommendation feed, or by joining some groups. Messaging strangers may feel awkward at times, but this shouldn’t discourage you from communicating with people who are willing to collaborate. You should simply be respectful and avoid becoming burdensome in case they are busy, or simply not interested.
Another great way of connecting with people is to attend hackathons. In case you’re not familiar with the concept, hackathons are events where teams of participants attempt to develop a software prototype in a few days, typically over a weekend. Afterwards, the hackathon judges decide which team delivered the best result, and award its members a monetary prize, or the opportunity to start a company based on their idea. Hackathons are an excellent way to meet people who share your passion, and even join a new startup company if you are lucky.
I hope this article helps beginners avoid common mistakes, and grow as professionals in the field of data science. It is obviously not a definitive guide, but rather my personal advice to newcomers, so I welcome any constructive criticism and differing opinions. Do you agree with my tips for aspiring data scientists? Would you add any other helpful advice? I encourage you to share your thoughts in the comments, or follow me on LinkedIn where I regularly post content about data science, climate change and other topics!