Neural style transfer using Keras
In this notebook I explore neural style transfer using deep learning. This is one of my favorite applications of deep learning - it was interesting at first because the output is so visual, and even more so after I learned how the technique works. Some output images are in this Github repository.
{% include gallery id="gallery" layout="half" caption="Images from here with a different style. More styled outputs here" %}
The goal is to take two images, one with a distinct "style" and another with the desired "content", and combine them so that the style of the first image is transferred to the second. This approach to style transfer works like most deep learning approaches - specify a loss function and use a neural network to minimize it. In this case, the loss comprises two major parts,
- Style loss - minimizing this pushes the generated image closer to the style of the style image.
- Content loss - this ensures the generated image does not drift too far from the original content.
Starting from the content image, the neural network gradually reduces a weighted combination of the above losses to generate some fascinating outputs.
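As a rough sketch, the combined objective is just a weighted sum of these terms; the weights below are illustrative values of my own, not the ones used in the notebook.

```python
# Illustrative weights - tuning these trades off style versus content.
style_weight = 1.0
content_weight = 0.025

def combined_loss(style_loss_value, content_loss_value):
    # The optimizer minimizes this scalar while updating the pixels
    # of the generated image.
    return style_weight * style_loss_value + content_weight * content_loss_value
```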
To define these losses, the intermediate layers of a CNN are used. The early layers of a trained model learn basic patterns like lines and curves; deeper layers learn more complex patterns like shapes, and eventually high-level structures like faces. The outputs of these layers are therefore used to calculate the loss functions. More information about all of this can be found in the resources below.
Most of the code is from this notebook. I first read this blog post and then read the original paper. I haven’t done the FastAI course yet but this lesson has some relevant material.
Getting some images
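The notebook's own image-loading code isn't reproduced here, but a minimal sketch of loading an image into the format VGG19 expects might look like this (the file names and target size are placeholders):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications import vgg19

def preprocess_image(path, target_size=(400, 400)):
    # Load and resize the image, add a batch dimension, and apply the
    # mean-centering / channel ordering that VGG19 was trained with.
    img = load_img(path, target_size=target_size)
    arr = img_to_array(img)
    arr = np.expand_dims(arr, axis=0)
    return vgg19.preprocess_input(arr)

content_image = preprocess_image("content.jpg")  # placeholder file names
style_image = preprocess_image("style.jpg")
```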
Loss functions
A pretrained VGG19 model is used to compute the activations on which the loss functions are built.
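A minimal sketch of building a feature extractor from pretrained VGG19 - the layers listed here are a common choice, not necessarily the ones used in the notebook:

```python
import tensorflow as tf
from tensorflow.keras.applications import VGG19

# Common layer choices for style and content (assumptions, not the
# notebook's exact configuration).
style_layers = ["block1_conv1", "block2_conv1", "block3_conv1",
                "block4_conv1", "block5_conv1"]
content_layer = "block5_conv2"

vgg = VGG19(weights="imagenet", include_top=False)
vgg.trainable = False

# Map an input image to a dict of activations from every layer, so the
# loss functions can pick out the ones they need.
outputs = {layer.name: layer.output for layer in vgg.layers}
feature_extractor = tf.keras.Model(inputs=vgg.inputs, outputs=outputs)
```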
Style loss
The Gram matrix of a layer's activations captures the correlations between its feature maps while discarding spatial information. Gram matrices computed over a set of lower and higher layers, for both the style image and the generated image, are used to calculate the style loss.
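A sketch of the Gram matrix and a per-layer style loss built on it; normalization details vary between implementations, so this is one reasonable choice rather than the notebook's exact code:

```python
import tensorflow as tf

def gram_matrix(features):
    # features: activations of one layer for one image, shape (H, W, C).
    # Flatten the spatial dimensions and compute channel-to-channel
    # correlations, which captures texture while discarding layout.
    channels = int(features.shape[-1])
    x = tf.reshape(features, (-1, channels))
    n = tf.cast(tf.shape(x)[0], tf.float32)
    return tf.matmul(x, x, transpose_a=True) / n

def style_loss(style_features, generated_features):
    # Mean squared difference between the Gram matrices of the style
    # image and the generated image for a single layer.
    return tf.reduce_mean(tf.square(
        gram_matrix(style_features) - gram_matrix(generated_features)))
```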
Content loss
The content loss is calculated from the activations of one of the later layers of the CNN, which capture more complex, higher-level patterns.
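Under the same assumptions, a sketch of the content loss is simply the mean squared error between the two images' activations at the chosen layer:

```python
import tensorflow as tf

def content_loss(content_features, generated_features):
    # Keeping the deeper-layer activations close preserves the overall
    # layout and objects of the content image.
    return tf.reduce_mean(tf.square(generated_features - content_features))
```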
Variation loss
This loss ensures the generated image is not too pixelated, by penalizing differences between adjacent pixels.
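A sketch of a total variation loss - the exact form in the notebook may differ, but the idea is to penalize large jumps between neighbouring pixels so the output stays smooth:

```python
import tensorflow as tf

def total_variation_loss(image):
    # image has shape (1, H, W, 3); compare each pixel with its right
    # and lower neighbours and penalize large differences.
    dx = image[:, :, 1:, :] - image[:, :, :-1, :]
    dy = image[:, 1:, :, :] - image[:, :-1, :, :]
    return tf.reduce_mean(tf.square(dx)) + tf.reduce_mean(tf.square(dy))
```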
I modified the code a little so that any loss function can be plugged in, which makes experimentation easier.
Here’s the function that takes the two images and a loss function to perform the style transfer.
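The notebook's version of this function isn't reproduced here; a simplified sketch using plain gradient descent with tf.GradientTape (the original Keras example uses L-BFGS, so the details differ) could look like this:

```python
import tensorflow as tf

def run_style_transfer(content_img, style_img, loss_fn,
                       iterations=200, learning_rate=5.0):
    # Start from the content image and repeatedly nudge its pixels in the
    # direction that reduces the supplied loss. loss_fn is any callable
    # taking (content, style, generated) and returning a scalar tensor.
    generated = tf.Variable(tf.cast(content_img, tf.float32))
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

    for _ in range(iterations):
        with tf.GradientTape() as tape:
            loss = loss_fn(content_img, style_img, generated)
        grad = tape.gradient(loss, generated)
        optimizer.apply_gradients([(grad, generated)])

    return generated.numpy()
```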
Here I am finally running the functions above to see if everything is working as expected.
It looks like content is preserved without issue, but the network is not capturing as much of the style as I want it to. Below is the same loss function as before, with more weight given to the style.
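For illustration, the only change is the relative weighting of the two terms; the values here are placeholders, not the notebook's:

```python
# Illustrative values: bump the style term up relative to the earlier run
# so the optimizer is pushed harder towards matching the style image.
style_weight = 10.0     # previously 1.0
content_weight = 0.025  # unchanged

def heavier_style_loss(style_loss_value, content_loss_value):
    return style_weight * style_loss_value + content_weight * content_loss_value
```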
Reflections
- Based on my experiments, it is best to pick "content" images that will still look good despite some loss in detail, and to pick "style" images with distinct colors/texture/style.
- Currently the network re-learns the style from scratch for every image pair - this can be improved by training a convnet on a specific style over many iterations and then using that model to quickly apply the style to new content images.
- The loss function is an obvious place to tweak when trying to produce better output images. The Wasserstein distance seems to produce better results, based on this.
- Using other layers in the loss functions should also produce different results.