Amazon SageMaker Studio Lab: A Great Alternative to Google Colab

Introducing the New Free Machine Learning Platform of AWS

Giannis Tolios
Towards Data Science

Photo by Scott Graham on Unsplash

Getting started with machine learning can be a daunting experience, especially if you are not technically oriented or experienced with computers. Anaconda is an amazing tool, but it requires some expertise to set up and use correctly. Furthermore, a powerful computer is necessary to train machine learning models on big datasets, especially if you are interested in deep learning. Fortunately, there are various alternatives that let beginners experiment with machine learning by easily executing their code on a cloud service. Google Colab has become an industry standard in recent years, as it is a user-friendly service that can be accessed simply by creating a Google account. Amazon has recently introduced SageMaker Studio Lab, an alternative service that offers useful features and functionality. In this article, I am going to present its main features, as well as provide a short tutorial to help you get familiar with the service. Let’s get started!

Introducing the Service

SageMaker Studio Lab provides free access to AWS runtimes that are optimized for machine learning and deep learning tasks. In case you are interested in technical specifications, the CPU runtime is based on the T3.xlarge instance, while the GPU runtime runs on G4dn.xlarge. Each runtime comes with JupyterLab, a web-based interface that lets you create and execute Jupyter notebooks, the de facto standard of data science. Furthermore, the service provides 15 GB of persistent storage, letting you store datasets and experiment results.
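
If you want to keep an eye on how much of that storage you have used, a few lines of Python in a notebook cell are enough. The following is a minimal sketch and not an official tool: the helper name storage_report is my own, and it assumes that your home directory lives on the persistent volume. Keep in mind that it reports the size of the underlying filesystem, which may differ from the 15 GB quota.

```python
import shutil
from pathlib import Path


def storage_report(path=None):
    """Return total, used, and free space (in GB) of the filesystem holding `path`.

    Assumption: when no path is given, the home directory is inspected,
    on the premise that it sits on the persistent storage volume.
    """
    target = Path(path) if path is not None else Path.home()
    usage = shutil.disk_usage(target)  # named tuple: total, used, free (bytes)
    gb = 1024 ** 3
    return {
        "total_gb": round(usage.total / gb, 2),
        "used_gb": round(usage.used / gb, 2),
        "free_gb": round(usage.free / gb, 2),
    }


if __name__ == "__main__":
    print(storage_report())
```

Running this before downloading a large dataset is an easy way to avoid filling up the disk in the middle of an experiment.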

Having access to AWS runtimes can be immensely useful. For example, people who have trouble setting up Anaconda and JupyterLab on their own computers could benefit significantly from it. Furthermore, even experienced users don’t always have access to powerful hardware, so utilizing a cloud service is a great option in that case. Finally, GPU prices have increased in recent years, due to hardware shortages and various other reasons. Therefore, the GPU runtime of SageMaker Studio Lab will be invaluable for those who can’t afford to buy a graphics card!

Comparison with Google Colab

Google Colab is the main alternative to SageMaker Studio Lab, so I’m going to briefly compare them. Google Colab is an established service, used by millions of data scientists and machine learning engineers, so most professionals in the field are already familiar with it. In my opinion, Google Colab is streamlined and easy to use, while also having advanced collaboration features, an area where SageMaker Studio Lab is somewhat lacking at the moment.

That said, it should be noted that Google Colab doesn’t provide persistent storage, with data being lost every time the instance is restarted. Furthermore, Google Colab usually assigns the Tesla K80 GPU to free accounts, whereas faster GPUs are reserved for subscribers of the premium Colab Pro service. On the other hand, the GPU runtime of SageMaker Studio Lab utilizes the Tesla T4, a newer GPU that is significantly faster than the Tesla K80, which makes the service compelling for those interested in deep learning. Either way, both services are great choices for executing Jupyter notebooks in the cloud.
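
On either service, you can check which GPU your session has actually been assigned. The sketch below asks the nvidia-smi command line tool for the GPU name. It assumes that nvidia-smi is available on the GPU runtime, which is normally the case when NVIDIA drivers are installed, and it simply returns None on a CPU runtime. The function name detect_gpu is my own.

```python
import shutil
import subprocess


def detect_gpu():
    """Return a list of GPU names reported by nvidia-smi, or None if unavailable."""
    # On a CPU runtime the tool is usually absent, so bail out early.
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        # Covers a failing driver, a timeout, or a tool that cannot be executed.
        return None
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return names or None


if __name__ == "__main__":
    print(detect_gpu() or "No NVIDIA GPU detected")
```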

Getting Started with SageMaker Studio Lab

Image by Author

First of all, you need to visit this link and complete the form to request a free SageMaker Studio Lab account. According to Amazon, requests are typically approved in 1 to 5 business days, but I got access to my account a few hours after submitting the form, so you may get lucky. After your account has been approved, you can log in to the service with your credentials.

Image by Author

After logging in to SageMaker Studio Lab, you simply have to select your preferred runtime type, which can be based on either CPU or GPU. In this case, I have decided to start the CPU runtime, as seen in the screenshot. After clicking the “Start runtime” button, the session starts and remains active for the following 12 hours. If you choose the GPU runtime, the maximum time is limited to 4 hours. The runtime will be restarted when the time limit is reached, but your files will be saved to the persistent storage. We can now click the “Open Project” button to start the JupyterLab interface.

Image by Author

We are now using the JupyterLab environment, which provides advanced features such as executing Python code, writing Markdown text, and running Linux commands in the terminal, among various others. Furthermore, you can install any Python package you want by using the pip and conda package managers. In case you are not familiar with JupyterLab, I suggest that you read the official documentation to get acquainted with it.
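
To illustrate the pip part, here is a minimal sketch that installs a package from a notebook cell only when it is missing. The helper names (is_installed, pip_install, ensure) are hypothetical and of my own choosing. Calling pip through sys.executable is a common way to make sure the package ends up in the environment of the running kernel, rather than in some other Python installation.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None


def pip_install(package):
    """Install `package` with pip into the environment of the current kernel."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def ensure(module_name, package=None):
    """Install the package only if the module is missing.

    `package` is the name on PyPI when it differs from the import name.
    Returns True if an installation was performed, False otherwise.
    """
    if is_installed(module_name):
        return False
    pip_install(package or module_name)
    return True


if __name__ == "__main__":
    # The standard library module json is always present, so nothing is installed.
    print(ensure("json"))
```

For a package whose import name differs from its PyPI name, you would pass both, for example ensure("sklearn", "scikit-learn").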

Image by Author

If you want to get acquainted with SageMaker Studio Lab, you can clone the studio-lab-examples repository and experiment with the provided notebooks. As seen in the screenshot, this can be easily accomplished by starting a terminal and running the following command. You can, of course, clone any other GitHub repository that might be useful as well.

git clone https://github.com/aws/studio-lab-examples.git

Executing a Jupyter Notebook

Image by Author

After cloning the studio-lab-examples repository, we are now going to execute one of the notebooks it contains. I have chosen the EDA_weather_climate.ipynb notebook, but the process is similar for the rest. First of all, we need to open the NOAA_Exploratory_Analysis directory and right-click on the env_eda.yml file. After doing that, we select the “Build Conda Environment” option to create a Conda environment with all the necessary packages for this notebook. The commands will be executed in a new terminal and the process may take a few minutes to complete.

After the environment is created, we can activate it by selecting the eda kernel. Keep in mind that creating a new environment for every notebook isn’t necessary, but it is considered best practice so we can avoid package dependency conflicts and other issues.
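
If you are unsure whether the notebook is really running in the new environment, you can inspect the interpreter from a notebook cell. This is a small sketch with a helper name of my own. It assumes that the prefix of a Conda environment typically ends in a directory named after the environment (eda in this case), and that the CONDA_DEFAULT_ENV variable may or may not be set inside a Jupyter kernel.

```python
import os
import sys


def kernel_environment():
    """Collect basic facts about the Python environment behind the current kernel."""
    return {
        # Full path of the interpreter that executes the notebook cells.
        "python_executable": sys.executable,
        # Root directory of the active environment.
        "environment_prefix": sys.prefix,
        # Set by `conda activate`; may be None inside a kernel (assumption).
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in kernel_environment().items():
        print(f"{key}: {value}")
```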

Image by Author

After activating the eda Conda environment, we simply need to select “Run All Cells” to execute all notebook cells and render the resulting graphs, as seen in the screenshot above.

Conclusion

As is evident, Amazon SageMaker Studio Lab is a compelling service that provides easy access to a JupyterLab environment, letting anybody work on machine learning projects. Its advanced features and functionality make it a great alternative to Google Colab, thus introducing competition that may incentivize Google to improve its service, perhaps by offering persistent storage to Colab users. Therefore, I encourage you to experiment with SageMaker Studio Lab and see if it suits your needs! Feel free to share your thoughts in the comments, or follow me on LinkedIn where I regularly post content about data science and other topics. You can also visit my personal website or check out my latest book, titled Simplifying Machine Learning with PyCaret.
