I saw there were a number of questions here in the community about the neural net that forms the "brain" of each creature. Some had no answers, while others had very interesting answers that would likely interest everyone asking. So I thought I'd try to gather those, along with some additional information, together in one place. Perhaps it could be worth a sticky or serve as the basis for an FAQ entry.
Basic introduction to Neural Nets
… about the artificial NNs used in Evolution
Keiwan has written:
The brain is a simple feed-forward neural network. Each node takes all of its inputs, multiplies each one with a weight (the weights are the things that are optimized during the simulation and they determine the behaviour), adds them all together and runs that result through a so-called activation function, 1 / (1 + exp(-x)), which calculates the output of this particular node.
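That per-node computation can be sketched in a few lines of Python. This is just an illustration of the formula quoted above (weighted sum plus sigmoid activation), not the game's actual implementation:

```python
import math

def node_output(inputs, weights):
    """One neuron: weighted sum of inputs passed through the
    sigmoid activation 1 / (1 + exp(-x)) described above."""
    x = sum(i * w for i, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-x))

# With all-zero inputs the weighted sum is 0, and sigmoid(0) = 0.5
print(node_output([0.0, 0.0], [1.0, 1.0]))  # -> 0.5
```

Note that the sigmoid squashes every output into the range (0, 1), which is convenient when each output has to drive a muscle by a bounded amount.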
and in another post:
In our ANN we have multiple layers of neurons. All of the neurons between two layers are connected to all of the neurons in the respective other layer. The information flows from left to right (with respect to my visual network in the settings). The information starts out as all of the input values that describe the creature's current state (height, velocity, number of points touching the ground, rotation...). These inputs then get passed into the neural network - each of them into one neuron. This is why I wrote "One per Input" in the visual network, just to simplify it a bit. The number of inputs depends on the task and you can see the exact number of input neurons if you look at the stats of the best creatures.
After the information has filtered through the network it reaches the output neurons. We have one of these per muscle. Each output value controls the amount of contraction/expansion of exactly one muscle.
Mathematically the network weights are represented by matrices so most of the calculations boil down to simple (but not very efficient) matrix multiplications. The number of nodes (neurons) in two consecutive layers determines the dimensions of the weight matrix. If you have two layers with 10 nodes each, the weight matrix between those two layers is going to have the dimensions 10 x 10, so it's going to contain 100 entries. If you have two layers with 100 nodes each (which I wouldn't recommend) that would result in a 100 x 100 matrix with 10000 entries. This can have a pretty severe performance impact relatively quickly.
The matrix weights are the things that ultimately determine the behaviour of your creature. The brain is going to "make its decisions" based on the weights between the individual nodes. The genetic algorithm therefore tries to optimize these weights to find the best possible creature behaviour. If you have a very large network, then you're going to have lots of different weights that all need to be optimized through trial and error. The more weights you have the more likely it is that they might counteract each other and the longer it's going to take for the brains to become optimized. I've never actually run a simulation with a very complicated network for long enough to see if the creatures learn to behave significantly better than the ones with a smaller brain or whether they both converge towards the same behaviour. I'm leaving that to everybody using the simulation to maybe play around with.
In response to a specific question from huepenbecker:
… I read somewhere that it makes most sense to stack your layers from big to small, e.g. 100-20-5. Can you confirm this?
I've definitely seen better results that way than having it the other way around (e.g. 5-20-100) but I've seen even better results with a more balanced node distribution. You're right, there are lots of deep neural networks, for example for pattern recognition, that are set up with decreasing layer sizes, but in this case that doesn't seem to be the optimal solution. Maybe it can actually produce better results but only after a very long simulation.
I went looking for some guidance on how to optimize the settings that are available in Evolution, as a complete neophyte with absolutely zero understanding of neural networks (outside of the introductory videos above). I found this early search result immensely helpful:
Here are some sourced excerpts that are relevant:
From Introduction to Neural Networks for Java (second edition) by Jeff Heaton
Problems that require two hidden layers are rarely encountered. However, neural networks with two hidden layers can represent functions with any kind of shape. There is currently no theoretical reason to use neural networks with any more than two hidden layers. In fact, for many practical problems, there is no reason to use any more than one hidden layer.
Using too few neurons in the hidden layers will result in something called underfitting. Underfitting occurs when there are too few neurons in the hidden layers to adequately detect the signals in a complicated data set.
Using too many neurons in the hidden layers can result in several problems. First, too many neurons in the hidden layers may result in overfitting. Overfitting occurs when the neural network has so much information processing capacity that the limited amount of information contained in the training set is not enough to train all of the neurons in the hidden layers. A second problem can occur even when the training data is sufficient. An inordinately large number of neurons in the hidden layers can increase the time it takes to train the network. The amount of training time can increase to the point that it is impossible to adequately train the neural network. Obviously, some compromise must be reached between too many and too few neurons in the hidden layers.
There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers, such as the following:
- The number of hidden neurons should be between the size of the input layer and the size of the output layer.
- The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
- The number of hidden neurons should be less than twice the size of the input layer.
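Heaton's three rules of thumb are easy to try out for yourself. Here is a small sketch that evaluates them for a hypothetical creature; the input and output counts are invented for illustration (in practice you'd read them off the best-creature stats, as Keiwan mentioned above):

```python
def hidden_size_rules(n_inputs, n_outputs):
    """Heaton's three rules of thumb for hidden layer size."""
    return {
        # rule 1: somewhere between input size and output size
        "between in and out": (min(n_inputs, n_outputs),
                               max(n_inputs, n_outputs)),
        # rule 2: 2/3 of the input size, plus the output size
        "2/3 inputs + outputs": round(2 * n_inputs / 3) + n_outputs,
        # rule 3: strictly less than twice the input size
        "less than 2x inputs": 2 * n_inputs - 1,
    }

# hypothetical creature: 9 input neurons, 4 muscles (output neurons)
print(hidden_size_rules(9, 4))
```

For this hypothetical 9-input, 4-output creature, the rules suggest roughly 4 to 10 hidden neurons, which is in the same ballpark as the game's defaults.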
From Practical Neural Network Recipes in C++ by Timothy Masters (Morgan Kaufmann, 1993)
For a three-layer network with n input and m output neurons, the hidden layer would have sqrt(n*m) neurons.
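Masters' geometric-mean rule is a one-liner. The example numbers below are again hypothetical:

```python
import math

def hidden_neurons(n_inputs, n_outputs):
    """Masters' rule: sqrt(n * m) neurons in the single hidden layer."""
    return round(math.sqrt(n_inputs * n_outputs))

# hypothetical creature with 9 inputs and 4 outputs:
print(hidden_neurons(9, 4))  # sqrt(36) -> 6 hidden neurons
```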
Unsurprisingly, it seems like the default 3-layer (single hidden layer) network with 10 neurons is a very reasonable choice that should work well for a wide range of creature designs … one might conclude that the developer knows exactly what he's doing!
Further reading - subject: