/pub/sci/neural/incoming/mlp.tar.Z

                              MLP - USER'S GUIDE

===============================================================================

CONTENTS :

  1 : Introduction
  2 : How to describe neural nets in MLP (overview)
  3 : How to run MLP
  4 : How to describe neural nets in MLP (detailed)
  5 : Simple example
  6 : More complex example

================================================================================

1 : Introduction
----------------

MLP is a tool kit for experimenting with multilayer perceptron neural
networks. It provides support for supervised training, for testing of the
network, for io mapping, as well as for using the network in classification
problems.

MLP can handle a broad range of topologies :

  - both traditional synaptic layers and time delayed synaptic layers are
    supported
  - as squashing functions, the user can specify logistic functions, bipolar
    sigmoids or softmax non-linearities

These layers can be arranged in any possible way, to allow for maximum
flexibility.

MLP was written based on the Rapid system, as developed by Richard Hartmann
(speech lab of the Czech Technical University, Email : hartmann@feld.cvut.cz).
MLP restricts the use of Rapid to neural networks, but provides a much
simpler user interface, without sacrificing too much flexibility. Apart from
that, support for softmax non-linearities and time delayed layers was added.

================================================================================

2 : How to describe neural nets in MLP (overview)
-------------------------------------------------

The description of a neural net is stored in an ASCII file. This is a sample
of such a file, used to start the training of a network :

  [ mlp ]
  Topology = 7 3 7
  SaveEnable = true
  SavePeriod = 5

  [ synapses ]
  LearnRate = .1
  RandMaximum = .1

  [ logistic ]
  LearnRate = .1

  [ synapses ]
  LearnRate = .1
  RandMaximum = .1

  [ logistic ]
  LearnRate = .1

General parameters are defined in the [mlp] section. (Note that MLP is case
sensitive, but not sensitive to spaces and linefeeds.)

The line "Topology = 7 3 7" means we're dealing with a network with 7 input
nodes, 3 hidden nodes and 7 output nodes.

[ Note that what is traditionally called one single layer in the literature
(i.e. a set of neurons that apply a non-linear function to a weighted sum of
their inputs) is in MLP terminology split up into two layers : a linear
(synaptic) layer to weigh the inputs, and a non-linear layer to do the
squashing of the output of the linear layer. ]

The next two lines mean that while training, the new network parameters will
be saved after every 5 "input - desired output" pairs.

In the next sections, the layout of the network is further defined. In this
case, MLP expects two linear layers (one going from the 7 input nodes to the
3 hidden nodes, and one from the 3 hidden nodes to the 7 output nodes). The
user is completely free to specify the type of linear layer (synaptic,
synaptic with delay, ...). Before, in between and at the end of these linear
layers, either zero, one or more non-linear layers can be specified. Again,
the user has complete control over the type of any of these layers.

In this example, the network is set up to learn with a learning rate of .1,
while the weights are initialised with random values between -.1 and .1.
When the training proceeds, the 'RandMaximum' lines will be replaced by the
actual values of the weights and thresholds.
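To make the split between synaptic and squashing layers concrete, here is a
minimal sketch of the forward pass implied by this 7-3-7 description. It is
written in Python with illustrative names (it is not code from MLP itself) :
each [synapses] block corresponds to a weight matrix plus a threshold vector,
and each [logistic] block squashes the result element by element.

  import math
  import random

  def logistic(x, maximum=1.0, slope=1.0):
      # [logistic] layer : y = Maximum / (1 + exp(-Slope * x)), element-wise
      return [maximum / (1.0 + math.exp(-slope * v)) for v in x]

  def synapses(x, weights, thresholds):
      # [synapses] layer : weighted sum of the inputs plus a threshold,
      # one row of 'weights' per output node, one column per input node
      return [sum(w * v for w, v in zip(row, x)) + t
              for row, t in zip(weights, thresholds)]

  # Topology = 7 3 7, initialised as with RandMaximum = .1
  rnd = lambda: random.uniform(-0.1, 0.1)
  w1 = [[rnd() for _ in range(7)] for _ in range(3)]  # 7 inputs -> 3 hidden
  t1 = [rnd() for _ in range(3)]
  w2 = [[rnd() for _ in range(3)] for _ in range(7)]  # 3 hidden -> 7 outputs
  t2 = [rnd() for _ in range(7)]

  x = [1, 0, 0, 0, 0, 0, 0]                           # one input pattern
  hidden = logistic(synapses(x, w1, t1))
  output = logistic(synapses(hidden, w2, t2))
  print(output)                                       # 7 values between 0 and 1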
================================================================================

3 : How to run MLP
------------------

MLP is invoked from the command line as follows :

  mlp <command> <network file>

<network file> is the description file of the network.
<command> is one of the following :

  io        : does simple io mapping, i.e. the user inputs the input vector
              and MLP calculates the output of the network.

  train     : trains the network. Expected input are "input - desired output"
              pairs.

  test      : uses the same input as 'train', but does not adapt the network.
              Calculates the energy of the difference between the computed
              output and the desired output.

  classify  : same as io, but can sort the elements of the output vector,
              highest values first.

  confusion : calculates the confusion matrix and the classification
              performance of the network. Expected input are the input
              vectors, each followed by its class index, i.e. the index of
              the output neuron that is supposed to have the highest value.
              Note that the indexing starts at zero.

The way these commands work can be specified in the [mlp] section of the
network description file. See the next section.

===============================================================================

4 : How to describe neural nets in MLP (detailed)
-------------------------------------------------

This section describes the information the user can specify in the different
sections of the network description file.

[mlp]

Topology = <number of nodes per layer> :
    informs MLP about the number of nodes in each layer.

BackupFile = <file name> :
    if this field is specified, MLP backs up the network file before
    overwriting it.

SaveEnable = true or SaveEnable = false :
    enables/disables saving. Note that this also applies to the confusion
    tables, which are only written to disk and not echoed to the screen.

SavePeriod = <number of pairs> :
    number of input/output pairs after which the network is saved during
    training. Setting this field to zero disables saving.

SeedRand = <seed> :
    seed for the random number generator.

InputFile = <file name> :
    when not set, stdin is assumed.

InputFormat = ascii or InputFormat = binary :
    switches between ASCII and binary input. Note that when using ASCII
    input, MLP doesn't really care whether the elements of a vector are all
    on one line or not.

AppendZeros = true or AppendZeros = false :
    when true, MLP adds zeros to the input, in order to give the last input
    vector the correct size.

OutputFile = <file name> :
    when not set, stdout is assumed.

OutputFormat = ascii or OutputFormat = binary :
    switches between ASCII and binary output.

Output = <list of fields> :
    this is used to specify the kind of output you get while training or
    testing the network. The fields you can specify are :

      - input     : input value
      - desired   : desired output value
      - computed  : true computed output value
      - error     : difference between the last two
      - distance  : sum of the squares of the elements of error
      - tdistance : sum of the distances for all the pairs seen

    After each pair, these fields are printed in the order specified, one
    field per line. If nothing is specified, MLP will just create a final
    report after the training is complete, with the total energy (defined as
    half the value of tdistance), the number of patterns seen, and the energy
    per pattern. This final report is always created, both for training and
    testing.

ConfTable = <file name> :
    file where the confusion table is to be stored.

ConfMatrix = <file name> :
    file for the confusion matrix, which is the confusion table without any
    comments.

CodeRange = <number of classes> :
    used by confusion. Represents the number of classes the network is
    supposed to recognise (in most cases this equals the number of output
    neurons).

ShowOnlyIndex = true or ShowOnlyIndex = false :
    used by classify. If false, MLP will output the estimated class as well
    as the actual value of the particular output neuron. Note that the class
    indexing starts at zero.

N-best = <N> :
    used by classify. Shows the N best possible classifications.
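To make the relation between these report quantities concrete, here is a
small sketch (in Python; the numbers are made up for the example and the
function names are illustrative, not MLP's) of how distance, tdistance and
the energies in the final report follow from the definitions above :

  def distance(desired, computed):
      # 'error' is the element-wise difference between desired and computed,
      # 'distance' is the sum of the squares of the elements of error
      return sum((d - c) ** 2 for d, c in zip(desired, computed))

  pairs = [
      ([1, 0, 0], [0.8, 0.1, 0.2]),   # made-up (desired, computed) pairs
      ([0, 1, 0], [0.2, 0.7, 0.1]),
  ]

  tdistance = sum(distance(d, c) for d, c in pairs)  # sum over all pairs seen
  total_energy = tdistance / 2.0                     # half the value of tdistance
  energy_per_pattern = total_energy / len(pairs)     # as in the final report
  print(total_energy, energy_per_pattern)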
[synapses]

This is basically a matrix multiplication, to calculate the weighted inputs.

Weights : <matrix elements> # :
    specifies the weight matrix. Note the colon (':') instead of the equal
    sign ('='). There are as many rows in the matrix as there are output
    neurons and as many columns as there are input neurons. However, it
    doesn't really matter where the user inserts line feeds. Note that the
    end mark ('#') must always be present.

Thresholds : <vector elements> # :
    specifies the threshold vector. Once again, the end mark is obligatory.

LearnRate = <learning rate> :
    specifies the learning rate. If not set, the network will not learn.

RandMaximum = <value> :
    this will initialise the weights and thresholds with random values
    between -<value> and +<value>.

[synapsesTD]

Synapses with memory, used in TDNNs (time delayed neural networks).

Delays = <number of delays> :
    specifies the size of the buffer, i.e. the weighting takes place for the
    current input as well as for the last <number of delays> inputs. When
    setting this value to zero, this layer will behave like an ordinary
    synaptic layer.

Weights : <matrix elements> # :
    this time, the weight matrix is three dimensional, or a sequence of two
    dimensional matrices. The first 2D matrix weighs the current input, the
    next one the previous input, etc.

Thresholds : <matrix elements> # :
    a 2D matrix instead of a vector this time.

LearnRate = <learning rate> :
    specifies the learning rate. If not set, the network will not learn.

RandMaximum = <value> :
    this will initialise the weights and thresholds with random values
    between -<value> and +<value>.

[synapsesTD2]

Same as synapsesTD, but only works when the number of input nodes and the
number of output nodes are equal. SynapsesTD2 only allows connections between
input and output nodes with the same index. No cross connections are allowed,
in order to speed up computation and save memory. It uses the same fields as
synapsesTD, but this time the weight matrix is 2D again. Every column of the
matrix represents a different output neuron, every row a different delay.

[logistic]

Logistic non-linearity :

    y = Maximum / ( 1 + exp(-Slope * x) )

Maximum : <values> # :
    the value of Maximum for each of the output neurons. Defaults to 1. Does
    not change during training.

Slope : <values> # :
    the slope for each of the output neurons. Defaults to 1. If the learning
    rate is not zero, the slope will change during training.

LearnRate = <learning rate> :
    specifies the learning rate for the slope adaptation.

[bipsigmoid]

Bipolar sigmoidal non-linearity :

    y = 2 * Maximum / ( 1 + exp(-Slope * x) ) - Maximum

Uses the same fields as [logistic].

[softmax]

Softmax non-linearity :

    y[i] = exp(x[i]) / SUM_j exp(x[j])

Note that softmax normalises the output so that the sum of the output
elements is one. Does not use any fields, only the header.
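The three squashing layers differ only in the element-wise function they
apply. As an illustration, here is a small Python sketch of the formulas
above (illustrative only, not code from MLP itself) :

  import math

  def logistic(x, maximum=1.0, slope=1.0):
      # [logistic] : outputs lie between 0 and Maximum
      return [maximum / (1.0 + math.exp(-slope * v)) for v in x]

  def bipsigmoid(x, maximum=1.0, slope=1.0):
      # [bipsigmoid] : outputs lie between -Maximum and +Maximum
      return [2.0 * maximum / (1.0 + math.exp(-slope * v)) - maximum
              for v in x]

  def softmax(x):
      # [softmax] : outputs are positive and sum to one
      exps = [math.exp(v) for v in x]
      total = sum(exps)
      return [e / total for e in exps]

  print(logistic([-2.0, 0.0, 2.0]))     # approx. [0.12, 0.50, 0.88]
  print(bipsigmoid([-2.0, 0.0, 2.0]))   # approx. [-0.76, 0.00, 0.76]
  print(sum(softmax([1.0, 2.0, 3.0])))  # 1.0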
================================================================================

5 : Simple example
------------------

As an example, let's create a network with 7 input nodes, 3 hidden nodes and
7 output nodes. This network will accept inputs in a unipolar 1 out of 7 code
(i.e. all the inputs are 0, except one element that is 1), and should be able
to reproduce the input on its output. Basically, the network should realise
some binary coding of the 7 input patterns on its 3 hidden neurons.

To start the training, we first create the following file '737.net' :

  [ mlp ]
  Topology = 7 3 7
  SaveEnable = true
  SavePeriod = 5

  [ synapses ]
  LearnRate = .1
  RandMaximum = .1

  [ logistic ]
  LearnRate = .1

  [ synapses ]
  LearnRate = .1
  RandMaximum = .1

  [ logistic ]
  LearnRate = .1

We can store the training set (input vector followed by desired output
vector, one pair per line) in a file '737.train.in' :

  1 0 0 0 0 0 0   1 0 0 0 0 0 0
  0 1 0 0 0 0 0   0 1 0 0 0 0 0
  0 0 1 0 0 0 0   0 0 1 0 0 0 0
  0 0 0 1 0 0 0   0 0 0 1 0 0 0
  0 0 0 0 1 0 0   0 0 0 0 1 0 0
  0 0 0 0 0 1 0   0 0 0 0 0 1 0
  0 0 0 0 0 0 1   0 0 0 0 0 0 1

To train the network on this set, we use :

  mlp train 737.net < 737.train.in

Alternatively, we could omit the redirection and use the InputFile field in
the [mlp] section instead. The output looks something like this :

  >Training complete.
  >
  >Total energy : 5.96117
  >Patterns seen : 7
  >Energy per pattern : 0.851596

One iteration will obviously not be enough, so we can use a batch file to
keep on training the network. In the Korn shell, one would write :

  i=0
  while [ $i != 1000 ]
  do
    let i=i+1
    mlp train 737.net < 737.train.in
  done

You should see the energy gradually decrease as the training proceeds. In the
event that the training gets stuck, try re-initialising the network with
different random weights and/or adjust the value of RandMaximum.

After training, we can test the network on the same training set, but this
time we also want to see the distance for each of the individual patterns.
After adding 'Output = distance' to the [mlp] section, the command

  mlp test 737.net < 737.train.in

produces something like this :

  >0.0283211
  >0.060428
  >0.707059
  >0.0597712
  >0.0343359
  >0.0193406
  >0.0388482
  >
  >Testing complete.
  >
  >Total energy : 0.474052
  >Patterns seen : 7
  >Energy per pattern : 0.0677218

If we want to see what the output of the neural net is, given a particular
input, we use :

  mlp io 737.net

Typing "1 0 0 0 0 0 0" (without quotes) results in :

  >0.896266 0.0676598 0.0918753 6.12737e-05 0.0673902 5.70481e-05 8.47166e-05

If we want to see the two best classifications, we add "N-best = 2" to the
[mlp] section and issue the command

  mlp classify 737.net

The input "1 0 0 0 0 0 0" produces :

  >0 2 0.896266 0.0918753

If we had set ShowOnlyIndex to true, the last two columns would not have been
displayed.

Finally, we want to get an idea of the overall classification performance of
the network. We use the following input file (737.conf.in), consisting of the
input patterns and their correct classification :

  1 0 0 0 0 0 0   0
  0 1 0 0 0 0 0   1
  0 0 1 0 0 0 0   2
  0 0 0 1 0 0 0   3
  0 0 0 0 1 0 0   4
  0 0 0 0 0 1 0   5
  0 0 0 0 0 0 1   6

Again, spaces are not important. In the [mlp] section, we need to specify the
file names for storing the confusion table and the confusion matrix, as well
as the code range (which in this case is 7). After

  mlp confusion 737.net < 737.conf.in

the confusion table file contains :

  file: 737.net

  desired:      0_    1_    2_    3_    4_    5_    6_
  tested:
  0_:            1#    0     0     0     0     0     0
  1_:            0     1#    0     0     0     0     0
  2_:            0     0     1#    0     0     0     0
  3_:            0     0     0     1#    0     0     0
  4_:            0     0     0     0     1#    0     0
  5_:            0     0     0     0     0     1#    0
  6_:            0     0     0     0     0     0     1#

  total:         1     1     1     1     1     1     1
  total[%]:    14.3  14.3  14.3  14.3  14.3  14.3  14.3
  correct[%]: 100.0 100.0 100.0 100.0 100.0 100.0 100.0

  ****
  N-best: 1  performance[%]: 100.0

An acceptable performance ...
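For readers who want to reproduce this kind of bookkeeping outside MLP, the
following sketch shows how a confusion table like the one above can be built.
It is written in Python and purely illustrative; 'network_output' is a
hypothetical stand-in for whatever produces the output vector (for instance a
wrapper around 'mlp io'). The estimated class is simply the index of the
largest output element, and each pattern increments one cell of the table.

  def estimated_class(output):
      # index of the output neuron with the highest value (indexing from zero)
      return max(range(len(output)), key=lambda i: output[i])

  def confusion_table(patterns, network_output, code_range):
      # patterns : list of (input vector, desired class index) pairs,
      #            as in 737.conf.in
      table = [[0] * code_range for _ in range(code_range)]
      correct = 0
      for x, desired in patterns:
          tested = estimated_class(network_output(x))
          table[tested][desired] += 1      # rows : tested, columns : desired
          correct += (tested == desired)
      performance = 100.0 * correct / len(patterns)
      return table, performance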
================================================================================

6 : More complex example
------------------------

For a more complex example, which involves interfacing MLP with other problem
specific programs, see /users/pbienst/lblr/lblr.txt. Starting from a speech
database with a known phonetic transcription, but unknown transition times
between the phonemes, a neural net is created to try to find these
transitions. Once the database is completely labelled at the frame level, it
can be used to train another neural network to do the actual speech
recognition.