This is what worked best for me, on Ubuntu 16.04.

Install cuda

Install cuda using apt-get (instructions here). You will need to download a .deb package containing the repository metadata, available from NVIDIA’s developer site, and then follow the steps in the documentation.

Don’t forget to perform the mandatory post-installation steps. The POWER9 configuration step is only necessary if you are on a POWER9 system (POWER9 is a high-end server CPU made by IBM).

Note: if you installed cuda via apt-get, it may be worth putting the package on hold so that it doesn’t get updated automatically and break your workflow unexpectedly, since TensorFlow binaries are built against a specific cuda version:

sudo apt-mark hold cuda

Install cudnn

Install cudnn without using a .deb package (link here). For me, the .deb package provided by NVIDIA never seemed to set up the right paths.

The installation is very simple: after downloading, uncompress the .tgz file:

cd ~/Downloads/
tar -zxvf cudnn-9.1-linux-x64-v7.1.tgz

# __Output should be:__
# cuda/include/cudnn.h
# cuda/NVIDIA_SLA_cuDNN_Support.txt
# cuda/lib64/libcudnn.so
# cuda/lib64/libcudnn.so.7
# cuda/lib64/libcudnn.so.7.1.x   # (x = patch version of your download)
# cuda/lib64/libcudnn_static.a

Next, there will be two directories inside the extracted folder: include and lib64. They correspond to the directories with the same names inside /usr/local/cuda, into which you should move the respective files. That is all that is needed:

sudo mv ~/Downloads/cuda/include/* /usr/local/cuda/include/
sudo mv ~/Downloads/cuda/lib64/* /usr/local/cuda/lib64/

Install TensorFlow

It’s possible to install TensorFlow in an Anaconda environment (instructions here) using individual wheel (.whl) files. Make sure that the wheel file you are installing was built for the Python, cuda and cudnn versions you have installed.

However, if you want maximum performance, you will want to build it from source.

Testing TensorFlow

import tensorflow as tf

# Create TensorFlow object called tensor
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)

Variables and Placeholders

This question describes pretty well the difference between tf.Variable() and tf.placeholder() and when each should be used.


Doc here. An important point to remember about variables is that their values persist across calls. Be sure to initialize them by creating an initializer with tf.global_variables_initializer() and running it through the session.

Variables are usually defined like so:

weights = tf.Variable(tf.truncated_normal([..., ...]))
biases = tf.Variable(tf.zeros([...]))


Placeholders, by contrast, get their values through a feed_dict passed when running the session.

Since you’ll want to feed more than one single example in the input (your batch_size is usually greater than 1), it is common to create the input placeholder like this:

input = tf.placeholder(tf.float32, [None, n_features])

which corresponds to a tensor with None (i.e., a variable number of) rows and n_features columns.

To define a scalar, create a tensor with 0 dimensions:

scalar = tf.placeholder(tf.float32, [])


The ReLU function is provided via tf.nn.relu():

# Hidden Layer with ReLU activation function
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
hidden_layer = tf.nn.relu(hidden_layer)

output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)
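ReLU itself is just an element-wise max(0, x); a minimal pure-Python sketch (no TensorFlow needed) of what tf.nn.relu() computes on each element:

```python
def relu(x):
    """Element-wise max(0, x) -- the computation behind tf.nn.relu()."""
    return [max(0.0, v) for v in x]

print(relu([-2.0, -0.5, 0.0, 1.5, 3.0]))  # -> [0.0, 0.0, 0.0, 1.5, 3.0]
```

Negative activations are clipped to zero, positive ones pass through unchanged.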


The functions tf.nn.conv2d() and tf.nn.bias_add() help you create a convolutional layer:

conv_layer = tf.nn.conv2d(input, weight, strides=[...], padding='SAME|VALID')
conv_layer = tf.nn.bias_add(conv_layer, bias)
# activation function
conv_layer = tf.nn.relu(conv_layer)

tf.nn.conv2d requires the input to be 4D (batch_size, height, width, depth). The difference between the two types of padding is nicely explained here.
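The two padding modes also determine the spatial output size. A small sketch of the output-size formulas from the TensorFlow documentation:

```python
import math

def conv_output_size(in_size, filter_size, stride, padding):
    """Spatial output size of tf.nn.conv2d, per the TF padding formulas:
    'SAME'  -> ceil(in / stride)            (input is zero-padded as needed)
    'VALID' -> ceil((in - filter + 1) / stride)  (no padding)"""
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    elif padding == 'VALID':
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError(padding)

print(conv_output_size(28, 5, 1, 'SAME'))   # -> 28
print(conv_output_size(28, 5, 1, 'VALID'))  # -> 24
```

So 'SAME' with stride 1 preserves the input size, while 'VALID' shrinks it by filter_size - 1.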


tf.nn.max_pool() helps you apply max pooling to the convolutional layer:

conv_layer = tf.nn.max_pool(conv_layer, ksize=[...], strides=[...], padding='SAME|VALID')

ksize is the filter size. 2x2 filters with a stride of 2x2 are common in practice.
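To make the operation concrete, here is 2x2 max pooling with stride 2 done by hand in plain Python, on a single 2D channel (tf.nn.max_pool applies this per channel, per example):

```python
def max_pool_2x2(matrix):
    """2x2 max pooling with stride 2 on a 2D list: take the max of each
    non-overlapping 2x2 block, halving both spatial dimensions."""
    rows, cols = len(matrix), len(matrix[0])
    return [[max(matrix[r][c],     matrix[r][c + 1],
                 matrix[r + 1][c], matrix[r + 1][c + 1])
             for c in range(0, cols - 1, 2)]
            for r in range(0, rows - 1, 2)]

m = [[1, 3, 2, 1],
     [4, 2, 0, 1],
     [5, 6, 7, 8],
     [3, 1, 2, 0]]
print(max_pool_2x2(m))  # -> [[4, 2], [6, 8]]
```

Each 2x2 block collapses to its maximum, so a 4x4 input becomes 2x2.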


Dropout forces the network to learn redundant representations, making things more robust and preventing overfitting. Also, it makes the network act as if taking the consensus of an ensemble of networks, improving performance. See tf.nn.dropout():

keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

In order to compensate for the dropped units, the function automatically multiplies the kept units by $\frac{1}{\text{keep\_prob}}$.
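The scaling can be seen in a small pure-Python sketch of inverted dropout (the mask would normally be random; it is fixed here for illustration):

```python
def dropout(x, mask, keep_prob):
    """Inverted dropout: zero out dropped units and scale kept units by
    1/keep_prob, so the expected sum of activations is preserved."""
    return [v / keep_prob if m else 0.0 for v, m in zip(x, mask)]

x = [1.0, 2.0, 3.0, 4.0]
out = dropout(x, mask=[1, 0, 1, 1], keep_prob=0.75)
print(out)  # kept units are multiplied by 1/0.75, the dropped one is 0.0
```

This is why keep_prob = 1 at evaluation time leaves the activations untouched.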

Embedding Layers

tf.nn.embedding_lookup() does the job of retrieving rows automatically from an embedding layer. In the example below we create an embedding layer for the case of word representation, considering that the number of possible words (or, in general, the number of possible indexes we are going to feed into the network) is vocab_size and the number of latent factors in the embedding is embed_dim:

embedding = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1, 1))
embed = tf.nn.embedding_lookup(embedding, ids)

ids should be a tensor with type int32 or int64 containing the ids of the rows to be retrieved.
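Conceptually the lookup is just row indexing into the embedding matrix, as this toy pure-Python sketch shows (the numbers are made up for illustration):

```python
# A toy "embedding matrix": one row of embed_dim numbers per vocabulary id.
embedding = [[0.1, 0.2],   # id 0
             [0.3, 0.4],   # id 1
             [0.5, 0.6]]   # id 2

def embedding_lookup(embedding, ids):
    """What tf.nn.embedding_lookup() does conceptually: fetch the row
    of the embedding matrix for each id, preserving order and repeats."""
    return [embedding[i] for i in ids]

print(embedding_lookup(embedding, [2, 0, 2]))
# -> [[0.5, 0.6], [0.1, 0.2], [0.5, 0.6]]
```

The real op also handles batched id tensors and partitioned variables, but the core idea is the same.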

This answer in stackoverflow has a more detailed explanation.


Recurrent Layers

Stack LSTM cells with tf.contrib.rnn.MultiRNNCell() and run the resulting multi-layer cell over the embedded input with tf.nn.dynamic_rnn():

def build_cell(num_units, keep_prob):
    lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    return drop

cell = tf.contrib.rnn.MultiRNNCell([build_cell(num_units, keep_prob) for _ in range(num_layers)])

output, state = tf.nn.dynamic_rnn(cell, embed, sequence_length=source_sequence_length, dtype=tf.float32)

Batch normalization

Using tf.layers.batch_normalization() (higher level)

First, add batch normalization to the layer (usually before the activation function). A new parameter to the layer will be necessary, indicating if it is training or not. Keep in mind that you will need to create a new placeholder for this boolean value later.

def fully_connected(prev_layer, num_units, is_training):
    layer = tf.layers.dense(prev_layer, num_units, activation=None, use_bias=False)
    layer = tf.layers.batch_normalization(layer, training=is_training)
    layer = tf.nn.relu(layer)
    return layer

Then, you will need to tell tensorflow to update the population statistics while training:

with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
    train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)

Using tf.nn.batch_normalization() (lower level)

If you want to implement a lower level batch_normalization function, you will use tf.nn.batch_normalization(). See the documentation for more details. This jupyter notebook can be of help as well.
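The core computation is simple either way. A minimal pure-Python sketch of the batch-norm formula (gamma and beta stand in for the learnable scale and shift parameters):

```python
import math

def batch_norm(x, gamma, beta, eps=1e-3):
    """Normalize a batch of scalars to zero mean / unit variance, then
    scale by gamma and shift by beta; eps guards against division by zero."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]

out = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=1.0, beta=0.0)
print(out)  # roughly [-1.34, -0.45, 0.45, 1.34]
```

At inference time the real layers replace the batch mean/variance with the running population statistics, which is exactly what the control-dependencies trick above keeps updated during training.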

Training the network

# define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch in range(epochs):
        for batch in range(num_examples//batch_size):
            sess.run(optimizer, feed_dict={...})

            loss = sess.run(cost, feed_dict={...})

Remember to use keep_prob = 1 when calculating the loss, so that dropout is only active during training.

Save and load progress

tf.train.Saver lets you save any tf.Variable to your file system.

saver = tf.train.Saver()

with tf.Session() as sess:
    # do your stuff
, './model.ckpt')

.ckpt stands for checkpoint.

To restore the session, do:

saver = tf.train.Saver()

with tf.Session() as sess:
    saver.restore(sess, './model.ckpt')
    # no need to call the initializer

    # do your stuff

When loading data, TensorFlow uses the names it assigned to the variables, so be sure to check out the next section.

Names and Scopes

TensorFlow automatically assigns names to each variable. These names are used for reusing the same variables in different contexts (or scopes) and when reloading data. You can define the naming structure of the variables (as if it were a namespace) with the functions tf.name_scope() and tf.variable_scope() (their difference is well explained in these answers: [a1], [a2]), which give each group of related objects the same naming structure. A good way to do this when defining functions that create layers is:

def fc_layer(input, channels_in, channels_out, name='fc'):
    with tf.name_scope(name):
        w = tf.Variable(tf.zeros([channels_in, channels_out]), name='W')
        b = tf.Variable(tf.zeros([channels_out]), name='B')
        return tf.nn.relu(tf.matmul(input, w) + b)

It’s also possible to set a name after the variable has been created, with tf.identity():

weights = tf.Variable(tf.truncated_normal([2,3]))
weights = tf.identity(weights, name='weights')

Shapes and dimensions

To get the shape as a list of ints, do tensor.get_shape().as_list().


TensorBoard

TensorBoard is a suite of visualization tools that makes debugging, optimizing and understanding TF graphs easier.

To use TensorBoard, we first need to write data from TF to disk, using the class tf.summary.FileWriter():

with tf.Session() as sess:
    writer = tf.summary.FileWriter("/tmp/example_name/1")
    writer.add_graph(sess.graph) # this can also be passed in the above statement

And then use the command tensorboard specifying the logging directory:

$ tensorboard --logdir /tmp/example_name/1

To export condensed information about the model, you use summaries:

tf.summary.scalar('accuracy', accuracy)
tf.summary.image('input', x_image, 3)

def fc_layer(...):
    tf.summary.histogram("weights", w)
    tf.summary.histogram("biases", b)
    tf.summary.histogram("activations", act)
    return ...

# collect summaries
with tf.Session() as sess:
    merged_summary = tf.summary.merge_all()

    for i in range(2001):
        if i % 5 == 0:
            s = sess.run(merged_summary, feed_dict={x: batch[0], y: batch[1]})
            writer.add_summary(s, i)

Histograms are useful when you have a large collection of numbers (like a weight matrix) and you want to look at their distribution.

It can be a good idea to save a different set of summaries to disk for each combination of hyperparameters you want to experiment with:

# hyperparameter search
for learning_rate in [1e-3, 1e-4, 1e-5]:
    for num_fc_layers in [2, 3]:
        # save a different summary for each configuration
        hparam_str = make_hparam_string(learning_rate, num_fc_layers)
        writer = tf.summary.FileWriter("/tmp/example_name/" + hparam_str)

        # actually run with the new settings

A very interesting feature of TensorBoard is the Embedding Visualizer, which lets you project high-dimensional data into 3 dimensions. The code shown above was taken from here.


Lots of Tensorflow examples here.