Training Neural Networks With Keras

What is Keras, and why is it useful?

Keras is a deep learning framework for Python that lets you build neural networks and train them on datasets. It can run the training algorithms on GPUs as well as CPUs. In this article, we will see how to create a neural network that classifies a dataset and how to train it.

In Keras, you define tensors, which are multi-dimensional arrays of data, and use them at the input layer, the output layer, or the hidden layers. A tensor generalizes scalars, vectors, and matrices to any number of dimensions, so a single structure can represent data of arbitrary shape. Keras provides methods for defining the tensors used as the input and intermediate layers.
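To make the idea of tensor shapes concrete, here is a small sketch that uses NumPy arrays as stand-ins for tensors. The variable names and sizes (a batch of 64 samples, 28x28 images) are assumptions chosen for illustration, not values from this article:

```python
import numpy as np

# Hypothetical example tensors; the sizes are illustrative only.
scalar = np.array(3.0)            # rank 0: a single number
vector = np.zeros(784)            # rank 1: one sample with 784 features
batch = np.zeros((64, 784))       # rank 2: 64 samples of 784 features each
images = np.zeros((64, 28, 28))   # rank 3: 64 grayscale images of 28x28 pixels

for name, tensor in [("scalar", scalar), ("vector", vector),
                     ("batch", batch), ("images", images)]:
    # ndim is the number of axes (the rank); shape is the size along each axis
    print(name, tensor.ndim, tensor.shape)
```

The rank tells you how many axes a tensor has, and the shape tells you how many elements sit along each axis.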

How to create layers

Here's how you set up and import Keras in your environment:
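A minimal sketch of such an import, assuming the keras package is already installed. The three submodule names are inferred from the models, layers, and optimizers discussed in this article:

```python
# Assumption: Keras is installed, for example through "pip install tensorflow",
# which ships Keras alongside it.
from keras import models      # model containers such as Sequential and Model
from keras import layers      # building blocks such as Dense
from keras import optimizers  # training algorithms such as RMSprop
```

If you use the copy of Keras bundled inside TensorFlow, "from tensorflow.keras import models, layers, optimizers" is a common alternative.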

These submodules, which have just been imported, provide support for creating models, layers, and optimizers (more on optimizers later).

There are two ways to add layers to a model. The first is the sequential model, created with models.Sequential(), so called because layers are stacked one after another through the add() method of the returned model:
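A minimal sketch of a sequential model, assuming two dense layers of 32 and 10 units and a 784-element input vector. Newer Keras versions may print a warning suggesting an explicit Input layer instead of the input_shape argument:

```python
from keras import layers, models

model = models.Sequential()
# First layer: 32 units with ReLU; input_shape declares 784 features per sample
model.add(layers.Dense(32, activation="relu", input_shape=(784,)))
# Second layer: 10 units with softmax, so each output row sums to 1
model.add(layers.Dense(10, activation="softmax"))

print(len(model.layers))
```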

The two add() calls create two dense layers with ReLU and softmax activation functions, respectively.

Take a closer look at the first add() call. It explicitly defines the input shape as a 784-element vector. The first layer you add needs an input shape so that Keras knows the size of the input layer; later layers infer their input size from the layer before them.

The input layer has a shape of (None, 784): each sample is a vector of 784 elements, and None stands for the batch dimension, which can be any size. The input is passed to the first hidden layer, which reduces each sample to 32 elements, and then to the second layer, which further reduces it to 10 elements, giving an output shape of (None, 10). The last layer is the output layer.
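One way to check these shapes is to push a small batch through the model and inspect the result. This is a sketch under assumptions: the batch of 5 all-zero samples is made up for illustration, and the model is rebuilt here so the example runs on its own:

```python
import numpy as np
from keras import layers, models

model = models.Sequential()
model.add(layers.Dense(32, activation="relu", input_shape=(784,)))
model.add(layers.Dense(10, activation="softmax"))

# Hypothetical batch: 5 samples of 784 features each. The batch axis comes first.
batch = np.zeros((5, 784), dtype="float32")
predictions = model.predict(batch, verbose=0)

print(batch.shape)        # (5, 784)
print(predictions.shape)  # (5, 10)
```

The batch axis (5 here) passes through unchanged, while the feature axis shrinks from 784 to 10.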

The second way of adding layers is more flexible. You create the layers yourself, connect them by calling each layer on the output of the previous one, and pass the input and output tensors to the Model() class. This lets you build directed acyclic graphs of layers. The following code creates the same neural network model as the first one:
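A minimal sketch of this approach, assuming the same layer sizes as before (784 inputs, 32 hidden units, 10 outputs). The variable names input_tensor, hidden, and output_tensor are illustrative:

```python
import numpy as np
from keras import layers, models

# Symbolic input: each sample is a vector of 784 elements
input_tensor = layers.Input(shape=(784,))
# Calling a layer on a tensor connects it to the graph and returns a new tensor
hidden = layers.Dense(32, activation="relu")(input_tensor)
output_tensor = layers.Dense(10, activation="softmax")(hidden)

# The model is defined by where the graph starts and where it ends
model = models.Model(inputs=input_tensor, outputs=output_tensor)

# Made-up batch of 3 samples, only to confirm the output shape
result = model.predict(np.zeros((3, 784), dtype="float32"), verbose=0)
print(result.shape)  # (3, 10)
```

Because each layer is called explicitly on a tensor, the same pattern extends to models with several inputs, several outputs, or branches, which a purely sequential stack cannot express.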

Optimizers

Earlier in the article, I briefly mentioned optimizers. An optimizer tries to improve the model's accuracy as much as possible by adjusting the weight values. Its most important parameter is the learning rate, a small positive number (typically between 0 and 1) that controls how large each weight update is. It is important to select a good learning rate: if it is too large, the updates overshoot and the model loses accuracy; if it is too small, training takes too long.

The following image from algorithmia.com depicts the learning rate very well:

The goal of the optimizer is to reduce the model's loss, a measure of how badly the model predicts the data, to a minimum. At the minimum, the rate of change of the loss with respect to the weights (its mathematical derivative) is zero. As you can see, the graph of model loss is parabolic, so the farther the weights are from the minimum, the steeper the slope and the larger each update becomes, which is why a large learning rate can make training unstable.
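The effect of the learning rate on a parabolic loss can be sketched with plain gradient descent. This is a simplified illustration, not what Keras runs internally: the loss w**2, the starting weight, and the three learning rates are all assumptions chosen to show the three regimes:

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    """Minimize loss(w) = w**2 and return the weight after every update."""
    w = start
    history = [w]
    for _ in range(steps):
        gradient = 2 * w                  # derivative of w**2 with respect to w
        w = w - learning_rate * gradient  # step against the slope
        history.append(w)
    return history

too_small = gradient_descent(0.01)  # moves toward 0, but slowly
reasonable = gradient_descent(0.3)  # reaches the minimum quickly
too_large = gradient_descent(1.1)   # overshoots further on every step

for name, run in [("too small", too_small), ("reasonable", reasonable),
                  ("too large", too_large)]:
    print(name, "final distance from minimum:", abs(run[-1]))
```

With the largest rate, every update jumps past the minimum and lands farther away than it started, which is the instability described above.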

For this model, we compile with the RMSprop optimizer and a learning rate of 0.001, use the mean squared error (mse) function to calculate the loss, and instruct Keras to track the model's accuracy.
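A sketch of this compile step, assuming the two-layer model from earlier (rebuilt here so the example runs on its own). Older Keras releases spelled the learning rate argument lr; learning_rate is the current name:

```python
from keras import layers, models, optimizers

model = models.Sequential()
model.add(layers.Dense(32, activation="relu", input_shape=(784,)))
model.add(layers.Dense(10, activation="softmax"))

model.compile(
    optimizer=optimizers.RMSprop(learning_rate=0.001),  # optimizer and step size
    loss="mse",                                         # mean squared error
    metrics=["accuracy"],                               # tracked during training
)
```

Mean squared error follows the description here; for classification with a softmax output, categorical cross-entropy is another common choice of loss.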

Finally, we can run the model on training data to get the result:
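A self-contained sketch of the training call. The data is random and exists only to make the example runnable, so the reported accuracy is meaningless; the sample count of 256, the batch size of 128, and the 10 epochs are assumptions, as are the names input_tensor and target_tensor:

```python
import numpy as np
from keras import layers, models, optimizers

# Random stand-in data: 256 samples, 784 features, 10 one-hot encoded classes
rng = np.random.default_rng(0)
input_tensor = rng.random((256, 784)).astype("float32")
labels = rng.integers(0, 10, size=256)
target_tensor = np.eye(10, dtype="float32")[labels]

model = models.Sequential()
model.add(layers.Dense(32, activation="relu", input_shape=(784,)))
model.add(layers.Dense(10, activation="softmax"))
model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),
              loss="mse", metrics=["accuracy"])

# fit() returns a History object holding one loss value per epoch
history = model.fit(input_tensor, target_tensor,
                    batch_size=128, epochs=10, verbose=0)
print(history.history["loss"])
```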

This process is called fitting the model to the data: the model learns to map the input tensor to the target tensor. Two other noteworthy parameters are the batch size, which sets how many samples from the input tensor are used for each weight update, and the number of epochs, which is how many times training passes over all of the samples in the tensor. These will be covered in more detail in another article.
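The relationship between batch size and epochs comes down to simple arithmetic. The helper below is hypothetical (it is not part of Keras), and the figures of 60,000 samples, a batch size of 128, and 5 epochs are made up for illustration:

```python
import math

def count_weight_updates(num_samples, batch_size, epochs):
    """Return how many weight updates a training run performs."""
    # The last batch of an epoch may be smaller, so round up
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch * epochs

print(count_weight_updates(60000, 128, 5))  # 469 batches per epoch * 5 = 2345
```

A smaller batch size means more updates per epoch, and more epochs means more passes over the same data.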

Thanks for reading. There are many more features and examples of Keras that I want to show you, so stay tuned.
