This article is a follow-up to my previous one; I suggest you check it out before moving on.

An article with a purpose

Deep Learning is, without a doubt, an interesting and fascinating topic. It can be applied in many different contexts, from image recognition to text processing; companies like Google and Facebook heavily rely on Deep Learning to offer many of their services.

However, we are not going to look at how DL can be integrated into business logic; that would be constructive and useful, ergo not suited for this blog.

Instead, we are going to take a deeper look at Google Deep Dream, aiming to create these hypnotic images ourselves. Please note that, jokes aside, this is indeed useful: it allows us to better understand the processes that take place during the classification procedure.

Tools

Once again, we are going to use Keras on top of TensorFlow, to keep the code readable and to avoid complications.

We are going to implement our own Deep Dream convnet using the pre-trained weights we already used last time.

The code, however, will be slightly different, and we will not be reusing what we wrote last time.

The coding bit

First off, we start with a few utilities:

from __future__ import print_function
from keras.preprocessing.image import load_img, img_to_array
import numpy as np
from scipy.misc import imsave
from scipy.optimize import fmin_l_bfgs_b
import time
import argparse

from keras.applications import vgg16
from keras import backend as K
from keras.layers import Input

parser = argparse.ArgumentParser(description='Deep Dream implementation with TensorFlow')
parser.add_argument('base_image_path', metavar='base', type=str,
                    help='Path to the image to transform.')
parser.add_argument('result_prefix', metavar='res_prefix', type=str,
                    help='Prefix for the saved results.')

args = parser.parse_args()
base_image_path = args.base_image_path
result_prefix = args.result_prefix

N_ITER = 100

# dimensions of the generated picture.
img_width = 450
img_height = 900

# path to the model weights file.
weights_path = 'vgg16_weights.h5'

These utilities include:

  • import statements

  • Argument handling: we expect two parameters, the starting image and a prefix to save the output images

  • The maximum number of iterations to perform (N_ITER)

  • Dimensions for the generated images

  • The weights to use for our network (VGG16)

Loading the model is, like last time, pretty straightforward:

# this will contain our generated image
dream = Input(batch_shape=(1,) + img_size)

# build the VGG16 network with our placeholder
# the model will be loaded with pre-trained ImageNet weights
model = vgg16.VGG16(input_tensor=dream,
                    weights='imagenet', include_top=False)
print('Model loaded.')

layer_dict = dict([(layer.name, layer) for layer in model.layers])

We create a placeholder and use it as the input_tensor for our network.

The img_size variable is computed this way:

if K.image_dim_ordering() == 'th':
    img_size = (3, img_width, img_height)
else:
    img_size = (img_width, img_height, 3)

The dim_ordering can be either “tf” or “th”; it tells Keras whether to use TensorFlow or Theano dimension ordering for inputs/kernels/outputs.
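
To make this concrete, here is a quick, purely illustrative check that prints the ordering in use and the resulting shape:

print(K.image_dim_ordering())  # 'tf' with the TensorFlow backend, 'th' with Theano
print(img_size)                # (450, 900, 3) for 'tf', (3, 450, 900) for 'th'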

We can now write our loss function:

# define the loss
loss = K.variable(0.)
for layer_name in settings['features']:
    # add the L2 norm of the features of a layer to the loss
    assert layer_name in layer_dict.keys(), 'Layer ' + layer_name + ' not found in model.'
    coeff = settings['features'][layer_name]
    x = layer_dict[layer_name].output
    shape = layer_dict[layer_name].output_shape
    # we avoid border artifacts by only involving non-border pixels in the loss
    if K.image_dim_ordering() == 'th':
        loss -= coeff * K.sum(K.square(x[:, :, 2: shape[2] - 2, 2: shape[3] - 2])) / np.prod(shape[1:])
    else:
        loss -= coeff * K.sum(K.square(x[:, 2: shape[1] - 2, 2: shape[2] - 2, :])) / np.prod(shape[1:])

The next step is to apply a couple of tweaks to achieve better results:

  • A continuity loss, to give the image local coherence and avoid messy blurs

  • An L2 norm loss on the image, to prevent pixels from taking very high values

Two lines of code will suffice:

# add continuity loss
loss += settings['continuity'] * continuity_loss(dream) / np.prod(img_size)
# add image L2 norm to loss
loss += settings['dream_l2'] * K.sum(K.square(dream)) / np.prod(img_size)

where continuity_loss is a utility function defined by us:

# continuity loss util function
def continuity_loss(x):
    assert K.ndim(x) == 4
    if K.image_dim_ordering() == 'th':
        a = K.square(x[:, :, :img_width - 1, :img_height - 1] -
                     x[:, :, 1:, :img_height - 1])
        b = K.square(x[:, :, :img_width - 1, :img_height - 1] -
                     x[:, :, :img_width - 1, 1:])
    else:
        a = K.square(x[:, :img_width - 1, :img_height-1, :] -
                     x[:, 1:, :img_height - 1, :])
        b = K.square(x[:, :img_width - 1, :img_height-1, :] -
                     x[:, :img_width - 1, 1:, :])
    return K.sum(K.pow(a + b, 1.25))

Our model is now complete. Feel free to further modify the loss as you see fit to achieve new effects; one possible tweak is sketched below.
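
For instance, a small tweak (purely a sketch, with a hand-picked layer and a made-up coefficient) is to push one extra layer into the loss, biasing the dream towards the patterns that layer responds to. The same effect can also be obtained simply by adding the layer to settings['features']:

# illustrative only: emphasize one extra layer with a hand-picked coefficient
extra_layer = 'block5_conv3'   # hypothetical choice, any conv layer in layer_dict works
extra_coeff = 0.02             # made-up weight, tune to taste
x = layer_dict[extra_layer].output
shape = layer_dict[extra_layer].output_shape
# same border-avoiding slicing as in the main loss loop
if K.image_dim_ordering() == 'th':
    loss -= extra_coeff * K.sum(K.square(x[:, :, 2: shape[2] - 2, 2: shape[3] - 2])) / np.prod(shape[1:])
else:
    loss -= extra_coeff * K.sum(K.square(x[:, 2: shape[1] - 2, 2: shape[2] - 2, :])) / np.prod(shape[1:])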

Now things can get a little bit trickier: we need to evaluate our loss and our gradients in one pass, but scipy.optimize requires separate functions for loss and gradients, and computing them separately would be inefficient. To solve this we create our own Evaluator:

class Evaluator(object):
    def __init__(self):
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        assert self.loss_value is None
        loss_value, grad_values = eval_loss_and_grads(x)
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()

The eval_loss_and_grads function then reads as follows:

# compute the gradients of the dream wrt the loss
grads = K.gradients(loss, dream)

outputs = [loss]
if type(grads) in {list, tuple}:
    outputs += grads
else:
    outputs.append(grads)

f_outputs = K.function([dream], outputs)
def eval_loss_and_grads(x):
    x = x.reshape((1,) + img_size)
    outs = f_outputs([x])
    loss_value = outs[0]
    if len(outs[1:]) == 1:
        grad_values = outs[1].flatten().astype('float64')
    else:
        grad_values = np.array(outs[1:]).flatten().astype('float64')
    return loss_value, grad_values

All that’s left now is to run the L-BFGS optimizer over the pixels of the generated image, in order to minimize the loss:

x = preprocess_image(base_image_path)
for i in range(N_ITER):
    print('Start of iteration', i)
    start_time = time.time()

    # add a random jitter to the initial image. This will be reverted at decoding time
    random_jitter = (settings['jitter'] * 2) * (np.random.random(img_size) - 0.5)
    x += random_jitter

    # run L-BFGS for 7 steps
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(),
                                     fprime=evaluator.grads, maxfun=7)
    print('Current loss value:', min_val)
    # decode the dream and save it
    x = x.reshape(img_size)
    x -= random_jitter
    img = deprocess_image(np.copy(x))
    fname = result_prefix + '_at_iteration_%d.png' % i
    imsave(fname, img)
    end_time = time.time()
    print('Image saved as', fname)
    print('Iteration %d completed in %ds' % (i, end_time - start_time))

What are we missing? Just a couple of utility functions and a custom configuration (the settings variable) for our network:

  • Here are the functions preprocess_image and deprocess_image:

# util function to open, resize and format pictures into appropriate tensors
def preprocess_image(image_path):
    img = load_img(image_path, target_size=(img_width, img_height))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg16.preprocess_input(img)
    return img

# util function to convert a tensor into a valid image
def deprocess_image(x):
    if K.image_dim_ordering() == 'th':
        x = x.reshape((3, img_width, img_height))
        x = x.transpose((1, 2, 0))
    else:
        x = x.reshape((img_width, img_height, 3))
    # Remove zero-center by mean pixel
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # 'BGR'->'RGB'
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype('uint8')
    return x

  • Next up, a couple of example settings that I like:

saved_settings = {
    'acid': {'features': {'block4_conv1': 0.05,
                              'block4_conv2': 0.01,
                              'block4_conv3': 0.01},
                 'continuity': 0.1,
                 'dream_l2': 0.8,
                 'jitter': 5},
    'doggos': {'features': {'block5_conv1': 0.05,
                            'block5_conv2': 0.02},
               'continuity': 0.1,
               'dream_l2': 0.02,
               'jitter': 0},
}

# the settings we will use in this experiment
settings = saved_settings['doggos']

The results that I’ll show below are obtained with the doggos setting.
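
If you want to experiment, adding your own entry to saved_settings is straightforward. Here is a purely hypothetical example (the layer choices and coefficients are made up, tune them to taste):

# hypothetical entry: earlier blocks tend to produce finer, more texture-like patterns
saved_settings['my_experiment'] = {
    'features': {'block3_conv2': 0.05,
                 'block4_conv1': 0.02},
    'continuity': 0.15,
    'dream_l2': 0.02,
    'jitter': 2,
}
# to try it, select it instead of 'doggos':
# settings = saved_settings['my_experiment']

Once everything is in place, you can run the script with something like python deep_dream.py path/to/input.jpg output/dream, assuming you saved the code above as deep_dream.py.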

The fun part

To test our model, I chose some famous paintings and ran the code above for an “appropriate” number of iterations. I usually prefer images where you can recognize both the original painting and the network’s work.

Input 1: Nascita di Venere

The Birth of Venus is a painting by Sandro Botticelli, generally thought to have been painted in the mid-1480s. It is an iconic and easily recognizable painting:

Here are the results after various numbers of iterations:

  • After 1 iteration

The painting is clearly there, almost untouched, but you can already see that weird images are forming.

  • After 5 iterations

  • After 10 iterations

Here the painting has been heavily modified, and you can make out some of the animals the network was trained to recognize.

  • After 20 iterations

  • And finally, after 25 iterations

This is the point where I like it the most, but it’s a matter of personal taste and you can perform as many iterations as you like.

Input 2: Creazione di Adamo

The Creation of Adam is a fresco painting by Michelangelo, which forms part of the Sistine Chapel’s ceiling, painted c. 1508–1512. Yet another very famous painting:

Let’s see how our network will deface this painting:

  • After 1 iteration:

  • After 20 iterations:

  • After 40 iterations:

  • After 60 iterations:

Other tests

I’ve tested the code on many different paintings and photos; here I’ll just include the two that I liked the most:

My bet is you will recognize these paintings.

Summary

Deep Dream helps us understand and visualize how neural networks are able to carry out difficult classification tasks, improve network architectures, and check what a network has learned during training. It also makes us wonder whether neural networks could become a tool for artists, a new way to remix visual concepts, or perhaps even shed a little light on the roots of the creative process in general.
