Using GPU in TensorFlow Model – Single & Multiple GPUs

1. TensorFlow GPU

In our last TensorFlow tutorial, we studied Embeddings in TensorFlow. Today, in this TensorFlow tutorial, we will look at “Using GPU in TensorFlow Model”. We’ll see how TensorFlow addresses the CPUs and GPUs on a system, then look at device placement logging and manual device placement. In addition, we will discuss optimizing GPU memory. We will also cover using a single GPU in a multi-GPU system and using multiple GPUs in TensorFlow, with examples.
So, let’s start using GPU in TensorFlow Model.

2. GPU in TensorFlow

A typical system may contain multiple devices for computation. As you already know, TensorFlow supports both CPUs and GPUs, and it represents each device as a string. For example:

  • The CPU of your machine is addressed as “/cpu:0”.
  • GPU indices start from zero. Therefore, to specify the first GPU, you should write “/device:GPU:0”.
  • Similarly, the second GPU is “/device:GPU:1”.

By default, if your system has both a CPU and a GPU and an operation has an implementation for each, TensorFlow gives priority to the GPU.
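
As a small illustration of the naming scheme above, here is a pure-Python sketch that builds and parses device strings. The helper names device_string and parse_device are hypothetical and are not part of TensorFlow, and the sketch only handles the “/device:TYPE:N” form and the short “/cpu:0” form shown above.

```python
import re

# Hypothetical helpers for illustration; not part of the TensorFlow API.
_DEVICE_RE = re.compile(r"^/(?:device:)?(cpu|gpu):(\d+)$", re.IGNORECASE)


def device_string(device_type, index=0):
    """Build a device string such as '/device:GPU:0'."""
    if index < 0:
        raise ValueError("device index must be non-negative")
    return "/device:{}:{}".format(device_type.upper(), index)


def parse_device(name):
    """Split a device string into (type, index), e.g. ('GPU', 1)."""
    match = _DEVICE_RE.match(name)
    if match is None:
        raise ValueError("unrecognized device string: {!r}".format(name))
    return match.group(1).upper(), int(match.group(2))


print(device_string("gpu", 0))        # string for the first GPU
print(parse_device("/device:GPU:1"))  # the second GPU
print(parse_device("/cpu:0"))         # the short CPU form
```

A helper like this can be handy when you build the device list for a loop over several GPUs instead of typing each string by hand.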

3. Device Placement Logging

You can find out which devices your operations and tensors are assigned to by creating a session with the log_device_placement configuration option set to True.

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

# Graph creation.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Create a session that logs the device each operation is placed on.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operation.
print(sess.run(c))

The device placement log looks like this:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/device:GPU:0
a: /job:localhost/replica:0/task:0/device:GPU:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]

4. Manual Device Placement

At times you may want to decide which device an operation runs on. You can do this by creating a context with tf.device in which you name the device, a CPU or a GPU, that should do the computation, as shown below.

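import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

# Graph creation: pin the two constants to the CPU.
with tf.device('/cpu:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
# No device is given for the matmul, so TensorFlow chooses one.
c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operation.
print(sess.run(c))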

The above code assigns the constants a and b to cpu:0. Since no device is specified explicitly for the MatMul operation, TensorFlow chooses one based on the operation and the available devices (a GPU by default, if one is available) and copies tensors between devices if required.

Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
[[ 22.  28.]
 [ 49.  64.]]
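
When the graph has many operations, the placement log gets long, and it can help to group it by device. The following is a minimal sketch in plain Python, assuming the “op: device path” line format shown above. The function name ops_by_device is hypothetical, and the sample text is copied from the log in this section.

```python
from collections import defaultdict

# Sample lines copied from the device mapping shown above.
LOG = """\
Device mapping:
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/device:GPU:0
"""


def ops_by_device(log_text):
    """Group operation names by the device named at the end of each line."""
    placement = defaultdict(list)
    for line in log_text.splitlines():
        op, sep, device_path = line.partition(": ")
        if not sep or "/task:" not in device_path:
            continue  # skip anything that is not an "op: device" line
        # Keep only the part after the task, e.g. "cpu:0" or "device:GPU:0".
        device = device_path.split("/task:", 1)[1].split("/", 1)[1]
        placement[device].append(op)
    return dict(placement)


placement = ops_by_device(LOG)
for device, ops in placement.items():
    print(device, "->", ops)
```

Here the two constants end up under cpu:0 and MatMul under device:GPU:0, which matches the manual placement in the code above.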

5. Optimizing TensorFlow GPU Memory

By default, TensorFlow maps nearly all of the memory of every GPU that is visible to the process. It does this to reduce memory fragmentation and so use the relatively scarce GPU memory more efficiently.
In some cases, though, you may want the process to allocate only a subset of the available memory, or to grow its memory use only as needed. TensorFlow provides two configuration options on the session to control this:
The first is allow_growth, which allocates only as much GPU memory as the runtime needs: it starts out allocating very little memory and extends the GPU memory region used by the process as sessions run and need more. The memory is not released, since releasing it can lead to worse fragmentation. ConfigProto is used to set this option:

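import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # start small and grow on demand
session = tf.Session(config=config)  # pass any other Session arguments as needed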

The second is per_process_gpu_memory_fraction, which sets the fraction of the total memory that should be allocated on each visible GPU. The example below tells TensorFlow to allocate only 40% of the memory of each GPU:

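import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # cap at 40% per GPU
session = tf.Session(config=config)  # pass any other Session arguments as needed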

Use this option only when you already know the memory needs of the computation and are sure that they will not change during processing, because it puts a hard bound on the GPU memory available to the TensorFlow process.
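
To get a feel for what a given fraction means in practice, you can work out the cap by hand. The sketch below is plain arithmetic, not a TensorFlow call: the function name gpu_memory_budget_mib is hypothetical, and the 12 GiB card is an assumed example, so substitute the memory size of your own GPU.

```python
def gpu_memory_budget_mib(total_mib, fraction):
    """Return the approximate per-process memory cap, in MiB."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")
    return int(total_mib * fraction)


# Assumption: a GPU with 12 GiB (12288 MiB) of memory.
total_mib = 12 * 1024
budget = gpu_memory_budget_mib(total_mib, 0.4)
print("cap: {} MiB of {} MiB".format(budget, total_mib))
```

If the model needs more than this cap, the process can run out of GPU memory even though the card still has free memory, which is why the fraction should only be fixed once the requirements are known.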

6. Single GPU in Multi-GPU System

If your system has more than one GPU, TensorFlow selects the GPU with the lowest ID by default. If you want to run on a different GPU, you need to specify it explicitly:

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

# Creates a graph pinned to the third GPU.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

If the GPU you specify does not exist, you get an InvalidArgumentError, as shown below:

InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/device:GPU:2'
   [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
   values: 1 2 3...>, _device="/device:GPU:2"]()]]

If you would like TensorFlow to automatically choose an existing and supported device when the specified one does not exist, set allow_soft_placement to True in the configuration option when the session is created, as illustrated by the code below.

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# allow_soft_placement lets TensorFlow fall back to an existing device.
sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True, log_device_placement=True))
# Running the operation.
print(sess.run(c))
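
The difference between the two behaviours can be pictured with a small stand-alone sketch. This is a simplified mental model in plain Python, not TensorFlow's actual placement algorithm: the function resolve_device, the device list, and the rule of falling back to the first listed device are all assumptions made for illustration.

```python
def resolve_device(requested, available, allow_soft_placement=False):
    """Pick the device an op would run on in this simplified model."""
    if requested in available:
        return requested
    if not allow_soft_placement:
        # Mirrors the failure shown above for a GPU that does not exist.
        raise ValueError(
            "Could not satisfy explicit device specification "
            "'{}'".format(requested))
    # Assumed fallback rule for this sketch: use the first listed device.
    return available[0]


# Hypothetical machine with one GPU and one CPU.
available = ["/device:GPU:0", "/cpu:0"]

print(resolve_device("/device:GPU:2", available, allow_soft_placement=True))

try:
    resolve_device("/device:GPU:2", available)
except ValueError as err:
    print("error:", err)
```

With soft placement the request for a missing GPU quietly lands on a device that exists, so keep log_device_placement switched on when you want to confirm where the operations actually ran.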

7. Using Multiple GPU in TensorFlow

To run TensorFlow on multiple GPUs, you can build your model in a multi-tower fashion, where each tower is assigned to a different GPU. Let’s see an example:

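import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)

c = []
# Build one tower (a matmul) on each of the two GPUs.
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
# Combine the tower results on the CPU.
with tf.device('/cpu:0'):
  total = tf.add_n(c)  # named total to avoid shadowing the built-in sum
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Running the operations.
print(sess.run(total))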

You will see the following output:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
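
If you do not have several GPUs at hand, you can still check the numbers. The sketch below redoes the same arithmetic in plain Python with nested lists: each “tower” computes the same matrix product, and the results are added element-wise, as tf.add_n does. The helpers matmul and add_n here are small stand-ins written for this check, not TensorFlow functions.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def add_n(matrices):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]

# One product per tower; in the example above each runs on its own GPU.
towers = [matmul(a, b) for _ in range(2)]
total = add_n(towers)

print(towers[0])
print(total)
```

Each tower gives [[22, 28], [49, 64]], and adding the two towers gives the [[44, 56], [98, 128]] result printed in the log above.
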
You can try this multi-GPU model on a simple dataset such as CIFAR-10 to experiment and get a feel for working with multiple GPUs.
So, this was all about how to use GPU in TensorFlow. Hope you like our explanation.

8. Conclusion

Hence, in this GPU in TensorFlow tutorial, we saw how TensorFlow works with GPUs, which are arrays of parallel processors that work together on heavy computations, in contrast to CPUs. This TensorFlow GPU tutorial briefed you on how TensorFlow addresses devices, how to log and manually control device placement, how to change the default memory configuration to suit your needs, and how to work with a single GPU or with multiple GPUs. Furthermore, if you have any query regarding GPU in TensorFlow Model, feel free to ask through the comment section.