An automatic Tensorflow-CUDA-Docker-Jupyter machine on Google Cloud Platform


For a class I'm teaching (on deep learning and art), I had to create a machine that automatically starts a Jupyter notebook with TensorFlow and GPU support. Just create an instance and presto: a Jupyter notebook with TF and GPU!
How awesome is that?

Well... building it wasn't that simple.
So, for your enjoyment, here's my recipe:

We all know how awesome Docker is, how convenient Jupyter is, and how powerful GPU-enabled TensorFlow is. All we need to do is put everything together in the cloud for easy machine learning...

TensorFlow released v1.5 today, so it's a doubly happy day.

We need an initial instance to start setting things up on.

Start with a regular instance on GCP and add a K80 GPU to it (not all regions and zones have GPUs; I work with us-east1-d).

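You may want to make it preemptible to keep running costs down (it cuts the GPU price by ~40%). Keep in mind that GCP can stop a preemptible instance at any time, and always stops it after 24 hours.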

Select Ubuntu 16.04 as your base image, on a 20 GB disk.
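If you prefer the command line to the console, the steps above roughly map onto a single gcloud compute instances create call. This is a minimal sketch under my own assumptions: the n1-standard-4 machine type, the ubuntu-1604-lts image family (which may no longer be offered for new instances) and the function name are not from the original setup. GPU instances cannot live-migrate, hence the TERMINATE maintenance policy. Setting DRY_RUN=1 prints the command instead of running it.

```bash
# Sketch: a preemptible Ubuntu 16.04 VM with one K80 GPU and a 20 GB boot disk.
# Assumed (not from the post): function name, machine type, image family.
# With DRY_RUN set, "echo" is prepended so the command is only printed.
create_gpu_instance() {
    name="$1"
    zone="${2:-us-east1-d}"
    ${DRY_RUN:+echo} gcloud compute instances create "$name" \
        --zone "$zone" \
        --machine-type n1-standard-4 \
        --accelerator type=nvidia-tesla-k80,count=1 \
        --maintenance-policy TERMINATE \
        --preemptible \
        --image-family ubuntu-1604-lts \
        --image-project ubuntu-os-cloud \
        --boot-disk-size 20GB
}
```

Usage: create_gpu_instance tf-gpu-jupyter, then gcloud compute ssh tf-gpu-jupyter --zone us-east1-d. New projects usually need a GPU quota increase before this succeeds.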

SSH into it once it's up.

According to the CUDA installation guide, installing the latest CUDA (9.1 at the time of writing) requires kernel 4.4.0, but the stock GCP Ubuntu 16.04 image runs 4.13.0.
So we need to downgrade from the stock GCP 4.13.0 kernel:

$ sudo apt-get install -y linux-image-4.4.0-112-generic
$ sudo apt-get purge -y linux-image-4.13.0-1008-gcp 
$ sudo update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.4.0-112-generic
Found initrd image: /boot/initrd.img-4.4.0-112-generic
done

Now reboot your machine to use the 4.4.0 kernel.
Make sure it's indeed running 4.4.0:

$ uname -r
4.4.0-112-generic
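Since the CUDA install depends on this kernel, it can be worth making the setup script refuse to run on the wrong one. A minimal sketch; the function name is hypothetical, and the optional second argument only exists so the check can be tried without rebooting.

```bash
# Sketch: succeed only if the running kernel is the wanted version
# (either exactly, or that version followed by a "-<build>" suffix).
require_kernel() {
    wanted="$1"
    running="${2:-$(uname -r)}"
    case "$running" in
        "$wanted"|"$wanted"-*) return 0 ;;
        *) echo "expected kernel $wanted, running $running" >&2
           return 1 ;;
    esac
}
```

For example, require_kernel 4.4.0 || exit 1 at the top of the setup script stops it early if the reboot didn't pick up 4.4.0-112-generic.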

We're now ready to set up CUDA, nvidia-docker, and a Docker image that restarts on boot and enables Jupyter password auth.
Here's a script that automates the process:

You'll need to copy-paste the script onto the machine and execute it.
It will take a long time to finish, especially the CUDA install and the Docker image build.
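The "restarts on boot" part comes down to Docker's restart policy. Here is a minimal sketch of starting the container that way; it is not the script itself. It assumes nvidia-docker2 (which registers the nvidia runtime; with nvidia-docker 1.x you would call nvidia-docker run instead), a hypothetical image tag tf-gpu-jupyter and container name jupyter, and, as far as I know, that the tensorflow/tensorflow 1.x images' Jupyter config hashes a plain-text PASSWORD environment variable into the notebook password.

```bash
# Sketch: run Jupyter in a GPU container that Docker restarts on every boot.
# Assumed: nvidia-docker2 runtime, hypothetical image tag and container name,
# and an image whose Jupyter config reads the PASSWORD variable.
start_jupyter_container() {
    image="${1:-tf-gpu-jupyter}"
    password="${2:-tensorflow_gpu_jupyter}"
    ${DRY_RUN:+echo} docker run -d \
        --name jupyter \
        --runtime=nvidia \
        --restart=always \
        -p 8888:8888 \
        -e PASSWORD="$password" \
        "$image"
}
```

The restart policy only helps if the Docker daemon itself starts at boot, which is the default for the docker-ce package on Ubuntu.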

Feel free to tweak which packages get installed in the Dockerfile, but the tensorflow/tensorflow image already comes with scikit-learn, NumPy, pandas and all that good stuff. I added Seaborn, which is very cool for visualizations, RISE for notebook presentations (which I use to teach the class), Python-OpenCV for image processing utilities, and Keras.
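A Dockerfile along these lines could look like the sketch below, written out and built from shell. The 1.5.0-gpu tag, the pip package names (opencv-python for Python-OpenCV) and the image tag tf-gpu-jupyter are my assumptions; the RISE commands follow the RISE install docs of that era. Rebuilding today you would most likely have to pin package versions that still support the image's Python.

```bash
# Sketch: layer the extra packages on top of the official TensorFlow GPU image.
# Assumed: base image tag, pip package names, hypothetical output image tag.
write_dockerfile() {
    dir="$1"
    mkdir -p "$dir"
    cat > "$dir/Dockerfile" <<'EOF'
FROM tensorflow/tensorflow:1.5.0-gpu

# Seaborn, Keras, OpenCV bindings and RISE (slideshow nbextension)
RUN pip install --no-cache-dir seaborn keras opencv-python RISE && \
    jupyter-nbextension install rise --py --sys-prefix && \
    jupyter-nbextension enable rise --py --sys-prefix
EOF
}

# Write the Dockerfile into $1 and build it (printed only when DRY_RUN is set).
build_image() {
    dir="$1"
    write_dockerfile "$dir"
    ${DRY_RUN:+echo} docker build -t tf-gpu-jupyter "$dir"
}
```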

One more thing to take care of is opening port 8888 on the instance.
On GCP this is done with a firewall rule in the VPC network console.
Here's a Stack Overflow guide: https://stackoverflow.com/questions/21065922/how-to-open-a-specific-port-such-as-9090-in-google-compute-engine
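The same firewall rule can be created with gcloud. A minimal sketch, assuming the default network plus a hypothetical rule name (allow-jupyter) and network tag (jupyter) that attach the rule to the instance. The default source range 0.0.0.0/0 exposes the notebook to the whole internet, so narrow SOURCE_RANGES if you can.

```bash
# Sketch: allow TCP 8888 to instances carrying the "jupyter" tag, then tag the VM.
# Assumed: default network, hypothetical rule and tag names.
open_jupyter_port() {
    instance="$1"
    zone="${2:-us-east1-d}"
    ranges="${SOURCE_RANGES:-0.0.0.0/0}"
    ${DRY_RUN:+echo} gcloud compute firewall-rules create allow-jupyter \
        --network default \
        --allow tcp:8888 \
        --source-ranges "$ranges" \
        --target-tags jupyter &&
    ${DRY_RUN:+echo} gcloud compute instances add-tags "$instance" \
        --zone "$zone" \
        --tags jupyter
}
```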

In the end, your machine should be ready to go.
You can restart it and access the notebook server at http://<machine_ip>:8888 as usual. As the script suggests, the password is tensorflow_gpu_jupyter; you can change it by generating a new password hash.
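Classic Jupyter Notebook stores the password as "sha1:<salt>:<hexdigest>", where the digest is SHA-1 over the password followed by a 12-hex-character salt; that is what notebook.auth.passwd() produced. The sketch below reproduces that format with coreutils, assuming no Python at hand; the function name is hypothetical, and the optional second argument fixes the salt only for testing.

```bash
# Sketch: build a classic-Notebook style password hash: sha1:<salt>:sha1(password + salt).
# The salt is 6 random bytes printed as 12 hex characters.
jupyter_sha1_hash() {
    password="$1"
    salt="${2:-$(od -An -N6 -tx1 /dev/urandom | tr -d ' \n')}"
    digest=$(printf '%s%s' "$password" "$salt" | sha1sum | cut -d' ' -f1)
    printf 'sha1:%s:%s\n' "$salt" "$digest"
}
```

Put the result into the notebook config as c.NotebookApp.password = u'sha1:...'. If Python and the classic notebook package are available, python -c "from notebook.auth import passwd; print(passwd())" does the same job interactively.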

Finally, create a Custom Image from the disk, and you can effortlessly spin up more machines from it...
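From the command line, a minimal sketch: stop the instance so the disk is consistent, turn its boot disk into an image, and start clones from that image. It assumes the boot disk carries the instance's name (the GCE default), hypothetical image and instance names, the same n1-standard-4 machine type, and a hypothetical jupyter network tag matching whatever port-8888 firewall rule you created.

```bash
# Sketch: snapshot the configured boot disk into a custom image.
# Assumed: boot disk named after the instance; image name is hypothetical.
make_image() {
    instance="$1"
    image="$2"
    zone="${3:-us-east1-d}"
    ${DRY_RUN:+echo} gcloud compute instances stop "$instance" --zone "$zone" &&
    ${DRY_RUN:+echo} gcloud compute images create "$image" \
        --source-disk "$instance" \
        --source-disk-zone "$zone"
}

# Sketch: start a new preemptible K80 instance from that image.
clone_instance() {
    image="$1"
    name="$2"
    zone="${3:-us-east1-d}"
    ${DRY_RUN:+echo} gcloud compute instances create "$name" \
        --zone "$zone" \
        --image "$image" \
        --machine-type n1-standard-4 \
        --accelerator type=nvidia-tesla-k80,count=1 \
        --maintenance-policy TERMINATE \
        --preemptible \
        --tags jupyter
}
```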

Enjoy!
Roy.
