How to configure an NVIDIA GPU to work with TensorFlow 2 on AWS SageMaker

AWS SageMaker is a platform that helps data scientists and developers to prepare, build, train and deploy machine learning models. It is gradually becoming the de facto compute environment for both private and public sector organisations, largely because of its easy-to-use Jupyter notebook environment.

Pricing depends on the level of computation required. However, the accelerated computing instances, which provision Graphics Processing Units (GPUs), are not preconfigured to work with Python 3.7 and TensorFlow 2.

My goal is to show, step by step, how to get TensorFlow running on Python 3.7 to recognise and use the SageMaker GPU.

Step 1: Understand current SageMaker Instance

To get a deep learning GPU provisioned, you need at least a P3 instance. The P3 instances are powered by NVIDIA Tesla V100 Tensor Core GPUs.

The current demo instance is ml.p3.2xlarge and, at the time of writing, the version of the NVIDIA driver is 450.80.02 with Python 3.6.12. To verify, open a terminal prompt and run nvidia-smi.

The command invokes the NVIDIA System Management Interface (SMI), which shows the driver version, the name of the GPU (here a Tesla V100) and other information such as GPU temperature, power consumption and the amount of memory being used out of the 16GB available.

Also verify the Python version from the same terminal, for example with python --version.
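The same information can also be collected from a script. The following is a minimal sketch, assuming nvidia-smi is on the PATH; the helper name query_gpus is my own and not part of any NVIDIA tooling. It uses the --query-gpu and --format options of nvidia-smi and returns None on machines without the tool.

```python
import shutil
import subprocess

# Fields understood by `nvidia-smi --query-gpu`.
FIELDS = ["name", "driver_version", "memory.total", "memory.used",
          "temperature.gpu", "power.draw"]


def query_gpus():
    """Return one dict per GPU, or None when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        # The tool exists but could not talk to a driver.
        return None
    gpus = []
    for line in result.stdout.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        gpus.append(dict(zip(FIELDS, values)))
    return gpus


print(query_gpus())
```

On a GPU instance this prints a list with one entry per GPU; elsewhere it prints None.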

Step 2: Install CUDA and cuDNN

TensorFlow specifies software requirements which must be in place for GPU computation and processing to work. However, these are documented mainly for Ubuntu and not for CentOS, on which Amazon Linux is based. TensorFlow versions ≥ 2.4.0 require CUDA 11.

Important note: Even though the output from nvidia-smi above shows CUDA Version 11.0 (top right), the CUDA version actually installed on SageMaker is CUDA 10.0. The figure reported by nvidia-smi is the highest CUDA version the driver supports, not the toolkit that is installed. The installed version can be verified with the NVIDIA CUDA compiler command, nvcc --version.

Go to NVIDIA's CUDA Toolkit download page and select Linux > x86_64 > CentOS > 7 > runfile (local) to get the download link for the CUDA software. Download the runfile with wget and run it with sh. Follow the on-screen prompts and install the defaults.
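To script this check, the release number can be pulled out of the nvcc output. This is a sketch under assumptions: the helper names are mine, and the sample string only illustrates the "release X.Y" line that nvcc --version prints.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract (major, minor) from nvcc --version output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_toolkit():
    """Return the toolkit version seen by nvcc, or None if nvcc is missing."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"],
                            capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


# Illustrative line in the style of nvcc output for CUDA 10.0.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
version = parse_nvcc_release(sample)
print(version)              # (10, 0)
print(version >= (11, 0))   # False: too old for TensorFlow >= 2.4.0
```

Comparing the tuple against (11, 0) gives a quick yes or no on whether the toolkit meets the requirement stated above.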

When prompted, accept the EULA and press Enter.

Then scroll down to Install and press Enter.

Upon completion, a prompt asks whether the current symbolic link should be updated. Select yes, because the current link points to CUDA 10.0. Multiple versions of CUDA can coexist on an instance, but the active one is the one that /usr/local/cuda links to.

The installation of the NVIDIA driver and CUDA is now complete. Running nvidia-smi reveals the updated driver version.

The CUDA version can be confirmed with nvcc --version.
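To see which installation the link currently resolves to, a short check such as the one below may help. It is only a sketch: the function name is hypothetical, and it returns None when /usr/local/cuda does not exist.

```python
import os


def active_cuda_dir(link="/usr/local/cuda"):
    """Return the real directory behind the CUDA link, or None if absent."""
    if not os.path.exists(link):
        return None
    # realpath follows the symbolic link to the versioned directory.
    return os.path.realpath(link)


print(active_cuda_dir())
```

After the update, the printed path should end in the CUDA 11 directory rather than the CUDA 10.0 one.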

Next, download NVIDIA’s CUDA Deep Neural Network (cuDNN) library. cuDNN essentially helps to make various neural network computations such as convolution, pooling and normalisation more efficient. To download cuDNN, a free NVIDIA developer account has to be created.

Go to the NVIDIA cuDNN page and click on Download cuDNN. Log in with your account or create one. Once logged in, select cuDNN Runtime for RedHat/Centos 7.3 x86_64 (RPM). RPM stands for RedHat Package Manager, an open packaging system that runs on Red Hat Enterprise Linux as well as other Linux and UNIX systems such as Amazon Linux.

At the time of writing, the libcudnn version 8.0.5 RPM was available. Download it and install it as admin with sudo.

Once completed, it’s now time to configure the environment variables.

Step 3: Configure Environment Variables

Environment variables affect the way running processes behave on a computer, whether it runs Linux or Windows. We need to tell the system where to find the installed CUDA and the CUDA Profiling Tools Interface (CUPTI), which is required by TensorFlow.

We also need to set the Accelerated Linear Algebra (XLA) flag so that TensorFlow GPU can find the CUDA installation. XLA is a domain-specific optimising compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes.

Run the following one chunk at a time from the terminal

The above commands add the right software locations to the user's environment variables, such as PATH, where they are visible to software such as TensorFlow. After this, the changes have to be made active in the current shell using the source command.
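As a rough guide to what such settings usually look like, the sketch below prints typical export lines. Everything in it is an assumption rather than the exact commands used in this walkthrough: the function name is mine, the paths are the conventional locations under /usr/local/cuda, and XLA_FLAGS with --xla_gpu_cuda_data_dir is the usual way to point XLA at a CUDA directory.

```python
def cuda_export_lines(cuda_home="/usr/local/cuda"):
    """Build shell export lines for CUDA, CUPTI and XLA (assumed layout)."""
    return [
        # CUDA binaries such as nvcc
        f"export PATH={cuda_home}/bin:$PATH",
        # CUDA libraries plus the CUPTI libraries TensorFlow needs
        f"export LD_LIBRARY_PATH={cuda_home}/lib64:"
        f"{cuda_home}/extras/CUPTI/lib64:$LD_LIBRARY_PATH",
        # Tell XLA where the CUDA installation lives
        f"export XLA_FLAGS=--xla_gpu_cuda_data_dir={cuda_home}",
    ]


for line in cuda_export_lines():
    print(line)
```

Lines like these are normally appended to a shell startup file (~/.bashrc is assumed here) and then loaded by running source on that file.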

Step 4: Activate Python 3.7 and Install Tensorflow GPU

For my particular project, TensorFlow needed to be installed on Python 3.7, which AWS SageMaker provides as an Anaconda Python installation. To get it to work, it has to be activated. This can be done by running the command below (just the first line):

The second line with (base) is the new resulting prompt, which confirms that the base Anaconda Python environment is activated. To confirm the Python version, run python --version.
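The version can also be checked from inside the interpreter itself. A minimal sketch, with a helper name of my own choosing:

```python
import sys


def is_python_37(version=None):
    """Return True when the given (or running) interpreter is Python 3.7."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) == (3, 7)


print(sys.version.split()[0], is_python_37())
```

If this prints False, the wrong environment is still active and TensorFlow would be installed against a different Python.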

Once Python 3.7 is activated, install TensorFlow GPU.

To install a specific version of TensorFlow, e.g. 2.4.0, use the “==” sign.

After installing TensorFlow GPU, check that it is properly installed by importing it.

Then, at the Python prompt, check how many GPUs TensorFlow can see.
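A minimal sketch of both checks follows, using tf.config.list_physical_devices and tf.test.is_built_with_cuda from TensorFlow 2. The wrapper function and the import guard are my own additions so that the snippet also runs, returning None, where TensorFlow is not installed.

```python
try:
    import tensorflow as tf
except ImportError:  # TensorFlow is not installed in this environment
    tf = None


def tensorflow_gpu_report():
    """Summarise the TensorFlow install, or return None if it is missing."""
    if tf is None:
        return None
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpu_count": len(gpus),
    }


print(tensorflow_gpu_report())
```

On the ml.p3.2xlarge instance used here, a gpu_count of 1 would indicate that the single V100 is visible; a count of 0 usually points back to the CUDA, cuDNN or environment variable steps.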

Simple Arithmetic with Tensorflow

We will perform simple addition, element-wise multiplication, and matrix multiplication. Paste these at the Python prompt.

The computation is performed using the GPU.
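For illustration, here is a small sketch of the three operations with tf.add, tf.multiply and tf.matmul. The matrices are made-up values, and the wrapper function exists only so the snippet degrades to None when TensorFlow is absent. The device attribute of the result shows where the computation ran.

```python
try:
    import tensorflow as tf
except ImportError:  # TensorFlow is not installed in this environment
    tf = None


def simple_arithmetic():
    """Run three basic tensor operations and report where they executed."""
    if tf is None:
        return None
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
    added = tf.add(a, b)            # [[6, 8], [10, 12]]
    multiplied = tf.multiply(a, b)  # element-wise: [[5, 12], [21, 32]]
    matmul = tf.matmul(a, b)        # matrix product: [[19, 22], [43, 50]]
    return {
        "add": added.numpy().tolist(),
        "multiply": multiplied.numpy().tolist(),
        "matmul": matmul.numpy().tolist(),
        "device": matmul.device,
    }


print(simple_arithmetic())
```

When the GPU is in use, the device string ends in GPU:0; a string ending in CPU:0 means TensorFlow fell back to the CPU.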

Congratulations!! You have successfully installed the NVIDIA driver, CUDA, cuDNN and TensorFlow GPU to work with Python 3.7; your deep learning jobs are ready to go!

Step 5 (Optional): Releasing GPU Memory

Any time a major deep learning computation is performed using the GPU, TensorFlow tends to hold on to the GPU memory even after the job has completed successfully. This can hinder the success of other tasks you may have. There are probably many ways of releasing the memory, but one way that works all the time is killing the process that is hogging it.

If this is the case, run nvidia-smi and locate the process.

Here process 14237, owned by python, is hogging almost the entire 16GB of available GPU memory. This is from a TensorFlow computation. If the memory is still not released after exiting Python, you can kill the process to force it to close and release the memory.

The memory is now released.
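The locate-and-kill routine can be sketched in Python as well. The helper names are hypothetical, the memory figure in the sample line is made up, and the input format assumed is the output of nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits.

```python
import os
import signal


def parse_compute_apps(csv_text):
    """Parse 'pid, name, MiB' lines into (pid, name, used_mib) tuples."""
    apps = []
    for line in csv_text.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue
        pid, name, used = parts
        if not (pid.isdigit() and used.isdigit()):
            continue  # skip rows with values such as [N/A]
        apps.append((int(pid), name, int(used)))
    return apps


def terminate(pid, force=False):
    """Ask a process to exit; force=True sends SIGKILL instead of SIGTERM."""
    os.kill(pid, signal.SIGKILL if force else signal.SIGTERM)


# Illustrative line only; the PID matches the example above.
sample = "14237, python, 15421"
print(parse_compute_apps(sample))  # [(14237, 'python', 15421)]
```

Calling terminate(14237) would then ask that process to stop. The function is deliberately not called here, since killing the wrong PID ends someone else's job.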


We have successfully configured the NVIDIA Tesla V100 Tensor Core GPU to work with TensorFlow 2 on AWS SageMaker using CUDA 11 and cuDNN 8. This enables data scientists, researchers and developers to work more efficiently and get deep learning jobs completed much more quickly.

If your organisation implements Lifecycle configuration for SageMaker instances, it may be necessary to perform additional steps so that the setup persists when the instance reboots with the Lifecycle configurations. We will find out in our next post.

Data Scientist and AI Engineer
