July 24, 2023
In 2015, Google began deploying its first TPUs in its data centers to power products like Search, Translate, Photos, and Gmail. To make this technology accessible to all data scientists and developers, they soon after released the Cloud TPU, meant to provide an easy-to-use, scalable, and powerful processing unit for running cutting-edge models on the cloud.
According to Google’s team behind Colab’s free TPU:
“TPUs can run neural network workloads 15 to 30 times faster than contemporary CPUs and GPUs!”
But before we jump into a comparison of TPUs vs CPUs and GPUs and an implementation, let’s define the TPU a bit more specifically.
TPU stands for Tensor Processing Unit. A Cloud TPU consists of four independent chips, each containing two compute cores called Tensor Cores, which include scalar, vector, and matrix units (MXUs).
In addition, each Tensor Core is paired with 8 GB of high-bandwidth memory (HBM). Each of the TPU’s 8 cores can execute user computations (XLA ops) independently, and high-bandwidth interconnects allow the chips to communicate directly with each other.
XLA (Accelerated Linear Algebra) is an experimental JIT (just-in-time) compiler for the TensorFlow backend. The most important difference from CPUs (Central Processing Units) and GPUs (Graphical Processing Units) is that the TPU’s hardware is designed specifically for linear algebra, the building block of deep learning. For this reason it is sometimes called a matrix or tensor machine.
Now that you have a bit better idea of what the TPU actually is, let’s take a look at how it compares to other common processing units.
Using Google Colab’s TPU is fairly easy. Using Keras, let’s try several classic examples, and then we can evaluate the results!
Using the TensorFlow + Keras library to assess Google Colab TPU performance, we can consider two well-known datasets and basic deep learning methods:
🔵 Convolutional Neural Network (CNN) trained on the MNIST dataset
🔵 Visualization and Deploying a TPU-trained CNN (MNIST) with ML Engine
You can find all the source code at the end of this post, as well as a look at visualizing and deploying models!
You’ll first need to select the TPU on Google Colab via the Runtime tab: clicking Change runtime type will let you choose TPU from the Hardware accelerator drop-down menu.
First, you’ll need to create a TPU model using Keras with tf.contrib.tpu.keras_to_tpu_model:
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(
            tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])))
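For reference, the `model` being converted above can be any compiled Keras model. A minimal MNIST-style CNN might look like this (the layer sizes here are illustrative assumptions, not necessarily the exact architecture trained later):

```python
import tensorflow as tf

# A minimal MNIST-style CNN; the layer sizes are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),  # 10 digit classes
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```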
TPU Estimator:
Estimators are defined at TensorFlow’s model level. Standard estimators can run models on CPUs and GPUs, but to train a model using the TPU you need to use tf.contrib.tpu.TPUEstimator.
my_tpu_estimator = tf.contrib.tpu.TPUEstimator(
    model_fn=my_model_fn,
    config=tf.contrib.tpu.RunConfig(),
    use_tpu=False)
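For context, here is a skeletal `my_model_fn` for TPUEstimator. This is purely an illustrative sketch under TensorFlow 1.x: the single dense layer, the loss, the learning rate, and the `use_tpu` entry in `params` are all assumptions, not the tutorial's actual model. The key points are that the optimizer must be wrapped (as in the optimization snippet below) and that the function returns a TPUEstimatorSpec rather than a plain EstimatorSpec:

```python
import tensorflow as tf  # TensorFlow 1.x, where tf.contrib is available

def my_model_fn(features, labels, mode, params):
    # Illustrative model body: a single dense layer standing in for the
    # real network; loss and learning rate are placeholder choices.
    logits = tf.layers.dense(features, 10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
    if params.get('use_tpu', False):  # assumed to be passed in via params
        # Aggregates gradients across the TPU's shards.
        optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
    train_op = optimizer.minimize(loss,
                                  global_step=tf.train.get_global_step())

    # TPUEstimator expects a TPUEstimatorSpec, not a plain EstimatorSpec.
    return tf.contrib.tpu.TPUEstimatorSpec(
        mode=mode, loss=loss, train_op=train_op)
```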
TPU Operation Configurations:
my_tpu_run_config = tf.contrib.tpu.RunConfig(
    master=master,
    evaluation_master=master,
    model_dir=FLAGS.model_dir,
    session_config=tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=True),
    tpu_config=tf.contrib.tpu.TPUConfig(FLAGS.iterations,
                                        FLAGS.num_shards))
TPU Optimization:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
if FLAGS.use_tpu:
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
When learning the basics of deep learning, it’s a good idea to compare training times on a well-known dataset (MNIST, in this case) with a simple CNN model (a common introductory project for beginners) using Google Colab’s GPU and TPU.
🏆 At the end of this example, you can see that every epoch takes only 3 seconds using the TPU, as compared to Google Colab’s GPU (Tesla K80), where every epoch takes 11 seconds. Turns out, our model trains roughly 4 times faster.
After the necessary library installation is completed, you’ll need to perform the following TPU addressing process:
try:
    device_name = os.environ['COLAB_TPU_ADDR']
    TPU_ADDRESS = 'grpc://' + device_name
    print('Found TPU at: {}'.format(TPU_ADDRESS))
except KeyError:
    print('TPU not found')
When the process finishes smoothly, you should see:
Found TPU at: grpc://10.41.118.242:8470
Next, we load the MNIST dataset, split it into training and test sets, set the parameters, and create the deep learning model and its optimization method.
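As a sketch of what that preprocessing typically involves, the snippet below reshapes and normalizes a batch of images and one-hot encodes the labels. The batch here is synthetic stand-in data; in the tutorial, the real images come from the MNIST loader:

```python
import numpy as np

# Synthetic stand-in for a batch of MNIST images (the real data would
# come from the MNIST loader): 8-bit grayscale 28x28 digits.
x = np.random.randint(0, 256, size=(64, 28, 28), dtype=np.uint8)
y = np.random.randint(0, 10, size=(64,))

# Reshape to NHWC (batch, height, width, channels) and scale to [0, 1].
x = x.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# One-hot encode the 10 digit classes for categorical cross-entropy.
y_onehot = np.eye(10, dtype='float32')[y]

print(x.shape, y_onehot.shape)  # (64, 28, 28, 1) (64, 10)
```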
Note: For this tutorial, I’ve focused solely on how to use TPU on Google Colab— these other processes, while of course important, won’t be covered in this post.
Then it’s necessary to use tf.contrib.tpu.keras_to_tpu_model to make the model suitable for TPU usage during training.
To view the structure of the model that will run on Google Colab’s TPU:
tpu_model.summary()
Then we start the training process.
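The training call itself is the standard Keras fit. This is a sketch: it assumes x_train, y_train, x_test, and y_test come from the earlier data-loading step, and the epoch count and batch size are illustrative choices (large batches keep all 8 TPU cores busy):

```python
# Sketch of the training call; x_train/y_train/x_test/y_test are assumed
# to come from the earlier MNIST loading step.
history = tpu_model.fit(
    x_train, y_train,
    epochs=15,
    batch_size=1024,       # a large batch helps utilize all 8 TPU cores
    validation_data=(x_test, y_test))
```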
To save the trained model weights:
tpu_model.save_weights('./MNIST_TPU_1024.h5', overwrite=True)
Here we can see that the accuracy and loss for the training and validation sets reach good values in a very short time. Also, 168 of the 10,000 validation digits were predicted incorrectly; these are shown in red and sorted first in the second image below.
We’ve covered a very quick implementation of what we can do with the TPU in Google Colab. In the next step, we can try to train larger models with larger datasets to fully utilize the power of the TPU.
🌈 Free access to Google Colab’s TPU has indeed opened up a whole new deep learning world to discover!
The source code notebooks include:
🔵 Convolutional Neural Network (CNN) trained on the Fashion MNIST dataset
🔸 If a similar speed comparison is performed for Fashion MNIST, the situation will not be very different!
Each epoch takes approximately 7 seconds, and training 15 epochs with the TPU takes only 102 seconds in total.
With the GPU it takes 196 seconds, and for the CPU, 11,164 seconds (~ 3 hours). This shows that the TPU is about 2 times faster than the GPU and 110 times faster than the CPU.
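The speedup figures above follow directly from the reported totals; a quick check:

```python
# Total training times (seconds) for 15 epochs of Fashion MNIST,
# as reported above.
tpu_s, gpu_s, cpu_s = 102, 196, 11_164

gpu_speedup = gpu_s / tpu_s  # TPU vs. GPU
cpu_speedup = cpu_s / tpu_s  # TPU vs. CPU

print('TPU vs GPU: {:.1f}x, TPU vs CPU: {:.0f}x'.format(
    gpu_speedup, cpu_speedup))  # → TPU vs GPU: 1.9x, TPU vs CPU: 109x
```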
🔵 Fashion-MNIST_TPU_Egitimi.ipynb
🔵 Long Short-Term Memory (LSTM) network trained on the Shakespeare dataset
🔵 Visualization and Deploying a TPU-trained CNN (MNIST) with ML Engine
🔵 Keras MNIST TPU end-to-end training, saved model, and online inference
Thanks to Martin Görner for letting us publish the work in Turkish! (Translation team: Başak Buluz, Yavuz Kömeçoğlu and I)
If you’d like to train ANNs using Google Colab’s TPU, here’s another extremely useful resource: TPU-speed data pipelines: tf.data.Dataset and TFRecords
I would like to thank Yavuz Kömeçoğlu for his contributions. 🙏