This week, Google announced Cloud TPU beta availability on the Google Cloud Platform (GCP), accessible through their Compute Engine infrastructure-as-a-service. Built on the second generation of Google’s tensor processing units (TPUs), the standard Cloud TPU configuration remains four custom ASICs and 64 GB of HBM2 on a single board, intended for accelerating TensorFlow-based machine learning workloads. With a leased Google Compute Engine VM, Cloud TPU resources can be used alongside current Google Cloud Platform CPU and GPU offerings.
First revealed at Google I/O 2016, the original TPU was a PCIe-based accelerator designed for inference workloads, and for the most part, the TPUv1 was used internally. This past summer, Google announced the inference- and training-oriented successor, the TPUv2, outlining plans to incorporate it into their cloud services. Both were later detailed in technical presentations at Hot Chips 2017.
Under the hood, the TPUv2 features a number of changes. Briefly recapping, the second generation TPU ASIC comes with a dual ‘core’ configuration, each core having a scalar/vector unit and a 128x128 matrix multiply unit (MXU) capable of 32-bit floating point operations, as opposed to TPUv1’s single core with a 256x256 MXU and 8-bit integer capability. TPUv2 also improves on the memory bandwidth bottlenecks of its predecessor by using HBM instead of DDR3, with 8 GB of HBM2 connected to each core for a total of 16 GB per chip.
Four of these ASICs form a single Cloud TPU board, for which Google cites up to 180 TFLOPS of otherwise unspecified compute performance. As announced earlier, Google is targeting a ‘TPU pod’ setup as one of the end goals, where 64 Cloud TPUs are combined in a dedicated networked array of racks. Google is aiming to offer full TPU pods on GCP later this year.
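To make the arithmetic behind these figures explicit, the short Python sketch below multiplies out the numbers quoted above. The constant and function names are illustrative only, and the pod totals assume simple linear scaling across 64 boards, which is an idealization rather than a figure stated here.

```python
# Back-of-the-envelope totals from the figures quoted in the article.
# Nothing here is measured; pod totals assume ideal linear scaling.

CORES_PER_CHIP = 2        # dual 'core' TPUv2 ASIC
HBM_GB_PER_CORE = 8       # HBM2 attached to each core
CHIPS_PER_BOARD = 4       # ASICs per Cloud TPU board
BOARD_PEAK_TFLOPS = 180   # Google's "up to" figure; type of compute unspecified
BOARDS_PER_POD = 64       # Cloud TPUs per TPU pod


def board_hbm_gb():
    """Total HBM2 capacity on one Cloud TPU board, in GB."""
    return CORES_PER_CHIP * HBM_GB_PER_CORE * CHIPS_PER_BOARD


def pod_hbm_gb():
    """Total HBM2 capacity across a full pod, in GB."""
    return board_hbm_gb() * BOARDS_PER_POD


def pod_peak_tflops():
    """Peak pod throughput if per-board peaks simply add up."""
    return BOARD_PEAK_TFLOPS * BOARDS_PER_POD


print(board_hbm_gb())                       # 64
print(BOARD_PEAK_TFLOPS / CHIPS_PER_BOARD)  # 45.0 (TFLOPS per chip)
print(pod_hbm_gb())                         # 4096
print(pod_peak_tflops())                    # 11520
```

The per-board memory total matches the 64 GB of HBM2 cited for the standard Cloud TPU configuration. The per-chip and per-pod numbers are derived values, not vendor specifications.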
In practical terms, this capability caters to developers looking for TPU-suitable machine learning performance for particular TensorFlow workloads, with the benefit of Google’s existing cloud infrastructure-as-a-service offerings. Given that it is a beta, Google has a number of documents and tools up on their site. In many ways, the current TPU capabilities exist as a development pipe-cleaner of sorts for the upcoming TPU pods, with Google alluding to the same thing in their announcement. A number of capabilities are yet to be ironed out for TPUs: for example, model parallelism is not yet supported, not all built-in TensorFlow ops are available, and specific limitations exist for training reinforcement learning models, recurrent neural networks (RNNs), or generative adversarial networks (GANs).
While select partners have had access to Cloud TPUs for production use, today’s announcement opens the availability to general GCP customers. Google’s Cloud TPUs are available today as purchasable compute time in the US from a Compute Engine provisioned VM, at a rate of $6.50 per TPU per hour, charged in one-second increments. Interested parties may submit a beta quota request.
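For a rough sense of what per-second billing means in practice, the sketch below estimates Cloud TPU charges from the quoted hourly rate. It is only an approximation under stated assumptions: the function name is hypothetical, the cost of the accompanying Compute Engine VM is not included, no minimum charge is modeled, and rounding to the nearest cent is our own choice rather than a documented billing rule.

```python
from decimal import Decimal, ROUND_HALF_UP

# Quoted rate: $6.50 per TPU per hour, charged in one-second increments.
HOURLY_RATE_USD = Decimal("6.50")
SECONDS_PER_HOUR = Decimal(3600)


def tpu_cost_usd(seconds, num_tpus=1):
    """Estimate the TPU-only charge for a run of whole seconds.

    Assumptions (not from the article): no minimum charge, no VM cost,
    and the total is rounded to the nearest cent.
    """
    if seconds < 0 or num_tpus < 0:
        raise ValueError("seconds and num_tpus must be non-negative")
    cost = HOURLY_RATE_USD * num_tpus * Decimal(seconds) / SECONDS_PER_HOUR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(tpu_cost_usd(3600))              # 6.50  (one TPU for one hour)
print(tpu_cost_usd(90 * 60))           # 9.75  (one TPU for 90 minutes)
print(tpu_cost_usd(7200, num_tpus=2))  # 26.00 (two TPUs for two hours)
```

Using `Decimal` rather than floating point keeps the currency arithmetic exact before the final rounding step.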