33 Neural networks

Machine learning comes in many forms. Whether you want to call them artificial intelligence or just models is a matter of taste. Hidden Markov models are one class of models, and neural networks are another.

At this point, you already know the basics of neural networks, especially feed-forward networks, and how we can train such a network of sigmoid neurons to work as a classifier. In this exercise, you will play with neural networks and explore how the input features and the hidden layers affect the network’s behavior and performance.

Follow the above link into the Neural Network playground. Start by reading the text below the dashboard and look over the dashboard’s components. At the top, there is a panel with dropdown menus controlling the properties of the network and selecting the type of problem to solve. We will start with an activation type of “TanH” (similar to sigmoid activation) and a problem type of “Classification” (already set if you have not used the site before). You can pick the data set to work on in the “Data” column on the left side. Select the top left one (blue dots surrounded by orange ones), and leave the other controls as they are.

You see the network layout in the middle, with the input layer on the left and the output layer on the right. The data is a set of colored points in a two-dimensional coordinate system. The points are split into a training set and a test set. In each epoch of the training, the model parameters are updated using the training data, and the resulting change in performance is evaluated by running the model on the test data. The points shown in the output panel on the right are the training data. There is a tick box in the output panel that lets you see the test data too. If you drag the “Ratio of training to test data” slider in the “Data” column to the left, you can see how the training data set shrinks as you include more data in your test set. Leave it at 50%.
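If you want to see what such a split amounts to outside the dashboard, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `split_points`, the toy data, and the fixed seeds are hypothetical and are not taken from the playground itself.

```python
import numpy as np

def split_points(points, labels, train_ratio=0.5, seed=0):
    """Shuffle the points and split them into a training set and a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(points))
    n_train = int(round(train_ratio * len(points)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return (points[train_idx], labels[train_idx]), (points[test_idx], labels[test_idx])

# Toy data (an assumption for illustration): 200 random points,
# labeled +1 close to the origin and -1 everywhere else.
rng = np.random.default_rng(1)
points = rng.uniform(-5, 5, size=(200, 2))
labels = np.where(np.hypot(points[:, 0], points[:, 1]) < 2.5, 1.0, -1.0)

(train_x, train_y), (test_x, test_y) = split_points(points, labels, train_ratio=0.5)
print(len(train_x), len(test_x))
```

Lowering `train_ratio` in this sketch has the same effect as dragging the slider to the left: fewer points are used for training and more are held back for testing.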

The neurons in the input layer fire if the data shows a particular feature. In the case of feature X1, the neuron produces output when the data exhibits a gradient of orange points on the left and blue points on the right. Other features trigger the other input neurons, as their thumbnails show.
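One way to think about the input neurons is as simple functions of the coordinates of each point. The sketch below assumes that the feature set matches the thumbnails in the dashboard (the two coordinates, their squares, their product, and their sines). The names `FEATURES` and `input_layer` are hypothetical and chosen only for this illustration.

```python
import numpy as np

# Candidate input features as functions of the point coordinates (x1, x2).
# The list is an assumption based on the dashboard thumbnails.
FEATURES = {
    "X1": lambda x1, x2: x1,
    "X2": lambda x1, x2: x2,
    "X1^2": lambda x1, x2: x1 ** 2,
    "X2^2": lambda x1, x2: x2 ** 2,
    "X1*X2": lambda x1, x2: x1 * x2,
    "sin(X1)": lambda x1, x2: np.sin(x1),
    "sin(X2)": lambda x1, x2: np.sin(x2),
}

def input_layer(points, selected):
    """Return one column per selected feature for an array of (x1, x2) points."""
    x1, x2 = points[:, 0], points[:, 1]
    return np.column_stack([FEATURES[name](x1, x2) for name in selected])

points = np.array([[-2.0, 1.0], [0.0, 0.0], [3.0, -1.0]])
print(input_layer(points, ["X1"]))            # negative on the left, positive on the right
print(input_layer(points, ["X1", "X1*X2"]))   # two selected features give two columns
```

Selecting or deselecting a feature in the dashboard corresponds to adding or removing a name from the `selected` list here.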

Exercise 33-1

Trim the network down to a single hidden layer (remove one layer) with just two neurons, and make sure only the X1 feature is selected in the input layer. In the output on the right, the color of each point shows which class it belongs to. The background color represents the classification made by your neural network.
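The computation performed by this small network can be sketched in Python as follows. This is a rough illustration, not the playground's implementation: the weights and biases are made-up numbers, and the tanh output neuron with a threshold at zero is an assumption.

```python
import numpy as np

def forward(x1, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: one input (X1) -> two tanh hidden neurons -> one tanh output."""
    hidden = np.tanh(np.outer(x1, w_hidden) + b_hidden)   # shape (n_points, 2)
    output = np.tanh(hidden @ w_out + b_out)              # shape (n_points,)
    return hidden, output

# Hypothetical weights and biases, chosen only for illustration
w_hidden = np.array([0.8, -0.5])
b_hidden = np.array([0.1, 0.3])
w_out = np.array([1.2, -0.7])
b_out = 0.0

x1 = np.linspace(-5, 5, 5)
hidden, output = forward(x1, w_hidden, b_hidden, w_out, b_out)
predicted_class = np.where(output >= 0, 1, -1)   # background color: sign of the output
print(hidden.shape, output.shape, predicted_class)
```

Training changes only the four arrays of weights and biases. The structure of the computation stays the same from epoch to epoch.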

Press play to start the training. Pay close attention to how the weights and outputs change from their initial states. Run the network for 100-200 epochs, then stop and inspect it.

The two neurons in the hidden layer should produce output for different features - try hovering over them and think about how the single feature creates these activations.
Do the weights or the biases differ the most? You can hover over the lines/points or look at their color.
What is the Test Loss noted under output?
Do you think adding more features, neurons, or layers will improve the classification?

Exercise 33-2

We may need more features. Try adding the input node capturing the X2 feature and run this model for 200 epochs.

Is the Test Loss better now? (Test Loss is the value of the loss function on the test data.)
The Output prediction is quite different now; look at the neurons and consider how adding the X2 Feature changed the other neuron outputs.
It seems to have quite a different fit - will adding more neurons make it even better?
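To make the idea of a loss on the test data concrete, here is a small Python sketch. It assumes a squared-error loss averaged over the test points, with class labels coded as +1 and -1. The text above does not say which loss function the playground uses, so treat the formula, the function name `squared_error_loss`, and the numbers as illustrative assumptions.

```python
import numpy as np

def squared_error_loss(predictions, targets):
    """Mean of 0.5 * (prediction - target)^2 over all points."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean(0.5 * (predictions - targets) ** 2))

# Hypothetical network outputs and true labels for four test points
test_outputs = [0.9, -0.8, 0.2, -1.0]
test_labels = [1.0, -1.0, -1.0, -1.0]
print(squared_error_loss(test_outputs, test_labels))
```

In this example, the third point is on the wrong side of zero and contributes most of the loss, while the confidently correct fourth point contributes nothing.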

Exercise 33-3

Two neurons may not allow for enough flexibility. Add more neurons to the hidden layer and run the model for 200 epochs.

How many neurons must you add to get the Test Loss under 0.005?
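If you would like to repeat this kind of experiment outside the dashboard, the sketch below trains a network with one tanh hidden layer using plain gradient descent. Everything in it is an assumption made for illustration: the toy ring data, the squared-error loss, the full-batch updates, the learning rate, and the function names `make_ring_data`, `loss`, and `train`. It will not reproduce the playground's numbers.

```python
import numpy as np

def make_ring_data(n, rng):
    """Toy ring data (an assumption): class +1 near the origin, class -1 in a surrounding ring."""
    half = n // 2
    radius = np.concatenate([rng.uniform(0.0, 2.0, half), rng.uniform(3.0, 5.0, n - half)])
    angle = rng.uniform(0.0, 2.0 * np.pi, n)
    x = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
    y = np.concatenate([np.ones(half), -np.ones(n - half)])
    return x, y

def loss(params, x, y):
    """Squared-error loss of the network described by params on the points (x, y)."""
    w1, b1, w2, b2 = params
    out = np.tanh(np.tanh(x @ w1 + b1) @ w2 + b2)
    return float(np.mean(0.5 * (out - y) ** 2))

def train(x, y, n_hidden, epochs=200, learning_rate=0.03, seed=0):
    """Full-batch gradient descent on a network with one tanh hidden layer."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        hidden = np.tanh(x @ w1 + b1)
        out = np.tanh(hidden @ w2 + b2)
        # Backpropagate the squared-error loss through both tanh layers
        d_z2 = (out - y) * (1.0 - out ** 2) / len(y)
        d_z1 = np.outer(d_z2, w2) * (1.0 - hidden ** 2)
        w2 -= learning_rate * hidden.T @ d_z2
        b2 -= learning_rate * d_z2.sum()
        w1 -= learning_rate * x.T @ d_z1
        b1 -= learning_rate * d_z1.sum(axis=0)
    return w1, b1, w2, b2

rng = np.random.default_rng(42)
x, y = make_ring_data(400, rng)
train_x, train_y = x[::2], y[::2]      # every other point for training (50%)
test_x, test_y = x[1::2], y[1::2]      # the remaining points for testing

for n_hidden in (2, 4, 8):
    params = train(train_x, train_y, n_hidden)
    print(n_hidden, round(loss(params, test_x, test_y), 4))
```

The `n_hidden` argument plays the role of the number of neurons in the hidden layer, so you can vary it and compare the printed test losses, just as you do in the dashboard.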

Exercise 33-4

Now let’s see how good you are:

Challenge: How low can you go in total Neurons and Features and still produce a good fit (Test Loss under 0.005)? Pay particular attention to which features suit this data set.
Challenge: Try creating a model that performs poorly despite having at least three features and four neurons (poorly being Test Loss over 0.1).

Exercise 33-5

The ring-shaped classification problem is relatively easy (for neural networks). Now, it is time to challenge the network with a more complex problem. Pick the spiral pattern (bottom left option under Data). This pattern is much more complicated, so a Test Loss below 0.05 (rather than 0.005) is good. Each epoch also takes longer when you use more Features and Neurons, so you need to be more patient. The network also requires more Epochs to train, so wait for 500-1000 Epochs before deciding how well it performs. While you wait, you can watch the Test Loss graph in the output panel.

Try fitting a model using all features, and decide how many neurons and layers to use. How many neurons do you need to get a good fit? Hint: You may need more than one hidden layer.
Look at the neurons in the last layer - do any of the neurons reflect the spiral of the data well?
Try removing some of the features the network uses the least - does it still perform well?
Summarize the changes you had to make to the network to make it fit the spiral data. Do these changes capture the non-linear pattern in the data?

Exercise 33-6

Let’s wrap up with a different kind of problem - regression. In contrast to classification, we do not try to predict a discrete class but rather a specific numerical value. At the top left, click on problem type and choose “Regression”. Of the datasets, pick the more difficult dataset to the right, listed as “Multi Gaussian”. Include only the X1 and X2 features, but try different network architectures. Let the training run between 200 and 500 Epochs when testing your network.

Can you get the Test Loss below 0.005 with only a single hidden layer?
Does adding more hidden layers improve the model?
Which additional features enhance the model the most?
Does the addition of some features only improve models with more than one hidden layer?