######################################################################
######################################################################
##                                                                  ##
##                                                                  ##
##                             statnet                              ##
##                                                                  ##
##           a feedforward interpolation neural network             ##
##                                                                  ##
##                                by                                ##
##                                                                  ##
##                        Coryn Bailer-Jones                        ##
##                                                                  ##
##                             07/12/98                             ##
##                                                                  ##
##                 email: calj at mpia-hd.mpg.de                    ##
##           www: http://www.mpia-hd.mpg.de/homes/calj/             ##
##                                                                  ##
##                                                                  ##
##        see the README file for disclaimer and warranty           ##
##        see the statnet_manual file for operational details       ##
##                                                                  ##
##     This file is copyright 1998,1999 by C.A.L. Bailer-Jones      ##
##                                                                  ##
##                                                                  ##
######################################################################
######################################################################

FILE:          statnet_manual
DESCRIPTION:   statnet operations manual
AUTHOR:        Coryn Bailer-Jones
LAST MOD DATE: 01.02.00

######################################################################
######################################################################
##                                                                  ##
##                    statnet operations manual                     ##
##                                                                  ##
######################################################################
######################################################################

This file provides the information required to use and understand the
statnet software. It assumes an understanding of the principles of
feedforward neural networks and committees.

Contents
--------
1. What statnet is
2. Other sources of information
3. The statnet files
4. How to run statnet
5. The specfile
6. File formats and screen output
7. A test problem
8. Tips on using statnet
9. Modifications since the previous release version

######################################################################
#                          what statnet is                           #
######################################################################

statnet is a feedforward neural network for implementing non-linear
static functions. It is trained by backpropagation. statnet uses the
following:
 o choice between one or two hidden layers
 o tanh input-hidden transfer function(s), linear hidden-output
   function
 o sum-of-squares error measure

statnet incorporates the following optional features:
 o feature selection (i.e. selection of subsets of input and output
   variables from the data files)
 o mean and variance scaling of input and output variables
 o conjugate gradient or gradient descent optimizers
 o ability to have undefined (missing) targets in the training data
   file

######################################################################
#                    Other sources of information                    #
######################################################################

Feedforward neural networks and the backpropagation training
algorithm have been described in great detail in a variety of sources
and this document does not explain their principles. For information
on the application of a similar (but more primitive) program to the
classification of stellar spectra, see "Automated classification of
stellar spectra - II. Two-dimensional classification with neural
networks and principal components analysis", by Bailer-Jones, Irwin &
von Hippel 1998, Monthly Notices of the Royal Astronomical Society,
298, 361-377 (http://www.mpia-hd.mpg.de/stars/calj/p1.html). This
paper also explains the basic principles of feedforward networks.

The source code for statnet is reasonably well documented if you want
to get into the nitty-gritty details.
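As a rough illustration of what the network computes (this is a
minimal sketch, not code from the statnet source; all names and the
weight layout used here are invented for illustration), a
single-hidden-layer forward pass with tanh hidden units and linear
output units looks like this:

  #include <math.h>

  /* One-hidden-layer forward pass: tanh hidden units, linear
     outputs. wXH holds H rows of X+1 weights (the last entry of
     each row taken here as the bias weight - an assumed layout),
     wHY holds Y rows of H+1 weights, and hlam is the Hlam scale
     factor discussed under "NET:data_scaling" below. */
  void forward(int X, int H, int Y, double hlam,
               const double *x, const double *wXH, const double *wHY,
               double *h, double *y)
  {
      int i, j, k;
      for (j = 0; j < H; j++) {
          double s = wXH[j*(X+1) + X];           /* bias */
          for (i = 0; i < X; i++)
              s += wXH[j*(X+1) + i] * x[i];
          h[j] = tanh(hlam * s);                 /* tanh transfer */
      }
      for (k = 0; k < Y; k++) {
          double s = wHY[k*(H+1) + H];           /* bias */
          for (j = 0; j < H; j++)
              s += wHY[k*(H+1) + j] * h[j];
          y[k] = s;                              /* linear output */
      }
  }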
The conjugate gradient optimizer "macopt" has its own source of
documentation, which can be obtained from
http://wol.ra.phy.cam.ac.uk/mackay/c/macopt.html

Further enquiries can be addressed to the author at the email address
at the head of this file. Related publications and other information
will be available from the statnet web page
http://www.mpia-hd.mpg.de/homes/calj/statnet.html
Also check this web page for more recent releases of the code or of
this manual. Please mail any bug reports/suggestions for improvements
to me at the above email address.

######################################################################
#                         The statnet files                          #
######################################################################

Makefile        the makefile
README          disclaimer and warranty. Please read this before you
                proceed.
ansi/           a directory which contains macopt (David MacKay's
                conjugate gradient optimizer) and its ancillary
                files. This is the standard macopt distribution and
                has not been modified in any way.
netsubs.c       collection of subroutines
netsubs.h       header file for netsubs
ran1.c          random number generator
statnet         the statnet executable (Sun Solaris)
statnet.c       main statnet program
statnet.h       header file for statnet
statnet_manual  this file
test.err        test error file (from running statnet on test.spec)
test.spec       the spec file for the test problem
test.wt         test weights file (from running statnet on test.spec)
testa.in        test input data file
testb.in        test input data file
procres.pl      a perl program for interrogating the output (.ot)
                files. Eternally developmental, i.e. it must be
                edited by hand to get it to do what you want.

######################################################################
#                         How to run statnet                         #
######################################################################

statnet is written in ANSI C and will work (i.e. has been tested)
under SunOS 5.5. The executable supplied with the release version is
a SunOS binary, compiled using gcc. To get it to run under any other
flavour of UNIX you'll need to compile it. From the statnet directory
(in which you find this file) do the following:

  cd ansi/
  make test_macII
  cd ../
  make statnet

The make will give a few warnings, but these can be ignored. You may
need to adjust the makefile for your particular operating system.
Note that in one place statnet uses a UNIX-specific command via the C
function "system" (although this only prints the date so it's hardly
crucial - remove the line from the source code if it causes
problems). Most memory allocation is done dynamically, but a few
things have fixed maximum sizes. You are therefore advised to check
in the statnet.h file that these arrays will be sufficiently large
for your purposes.

To run statnet, type

  statnet specfile

where specfile is the only command line argument which statnet reads.
Typing "statnet -v" will give the version number only.

######################################################################
#                            The specfile                            #
######################################################################

The specfile contains all of the relevant information on the
committee and network architecture, the data files and the training
method. specfiles should be given the ".spec" suffix. The specfile is
read by searching for strings, such as "train_network?_(yes/no)", so
these strings must not be changed or they will be ignored. The
relevant input for each string must be the next item on the same
line, e.g.

  train_network?_(yes/no)    yes

The items in parentheses "(yes/no)" indicate the choices available.
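For orientation, a minimal hypothetical specfile might look as
follows (the strings are those documented below, but the file names
and values are invented for illustration):

  # example specfile (illustrative values only)
  verbosity_level_(0/1/2/3/4)              2
  train_network?_(yes/no)                  yes
  apply_network?_(yes/no)                  yes
  DAT:training_data_file                   mytrain.in
  DAT:application_data_file                myapply.in
  DAT:number_of_inputs_in_data_file_(X)    10
  DAT:number_of_outputs_in_data_file_(Y)   2
  NET:number_of_networks_in_committee_(N)  3
  NET:size_of_first_hidden_layer_(H)       5
  NET:size_of_second_hidden_layer_(V)      0
  TRN:output_weight_file                   mymodel.wt
  TRN:error_file                           mymodel.err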
Note that any line in the specfile preceded by a "#" symbol will be
ignored.

Inappropriate inputs which disobey the required value will be flagged
as errors by statnet. The only exception is when real values are
specified instead of integers, in which case only the integer part of
the number will be used (ANSI C %f to %d conversion). The order of
the input strings is arbitrary, although there are a few
restrictions, most of which are obvious and are indicated below.

The various input strings in the specfile are described below, along
with the possible choices (given in round parentheses, unless they
already form part of the string) and the default value (in square
brackets). Defaults written in capitals refer to strings defined in
the statnet.h file. Note that some strings do not have default
values: their values must be specified or statnet will exit with an
error message. If a string has a default value then the string can be
omitted from the spec file (although I recommend that you always
define your parameters, even if you choose the defaults). A few of
the options have not yet been implemented or fully de-bugged: these
are flagged below.

Most of the input strings are prefixed by a three-letter code
indicating the part of the code to which they are relevant:
  DAT  training/application data
  NET  network architecture
  TRN  training of network
  GRD  gradient descent
  MAC  macopt (conjugate gradient optimizer)
  APP  application of network

The possible entries in the specfile are now listed and discussed.

verbosity_level_(0/1/2/3/4) [2]
- Amount of output from the program, ranging from 0 (nothing apart
  from two introduction lines and error messages) to 4 (lots of
  diagnostic stuff). Level 2 is appropriate for normal running.

train_network?_(yes/no) [no]
- do you want to learn the weights from a given set of data.

apply_network?_(yes/no) [no]
- do you want to apply the network to a set of data.

DAT:training_data_file (file name) [no default]
- The file containing the training data. See below for the file
  format. The file should be given the suffix ".in".

DAT:application_data_file (file name) [no default]
- The file containing the application data (i.e. that to which we
  want to apply the network). See below for the file format. The file
  should be given the suffix ".in".

DAT:number_of_inputs_in_data_file_(X) (integer >=0) [no default]
- total number of input variables in the data files (must be the same
  in both the training and application files)

DAT:number_of_outputs_in_data_file_(Y) (integer >=0) [no default]
- total number of output variables in the data files (must be the
  same in both the training and application files)

DAT:number_of_input_ranges (integer >=0) [0]
- statnet includes the option to use only a subset of the input
  variables in the network. Generally, you will want to select ranges
  of the input variables rather than having to list each one
  individually. Thus here you write the number of such ranges (R).
  For each range you specify a separate line prefixed with
  "DAT:INRANGE" and followed by the inclusive lower and upper bounds
  of the range. (The program then evaluates the required number of
  input nodes for the network.) These should follow on the next R
  lines (although any comment lines inserted will be ignored). For
  example, with R=3:
    DAT:INRANGE 1 10
    DAT:INRANGE 17 17
    DAT:INRANGE 32 34
  Note that the second line will take input 17 only. If R is fewer
  than the number of such lines, only the first R will be read. If R
  is more, the program will exit with an error.
  (The program will search the specfile until R lines are found
  starting with DAT:INRANGE, so you can use this facility to comment
  out ranges.) If R=0 then the entire range specified by X
  ("DAT:number_of_inputs_in_data_file_(X)") will be used. The ranges
  must be organised in increasing order and may not overlap (not all
  possible errors are trapped, so please ensure that this is done
  correctly).

DAT:number_of_output_ranges (integer >=0) [0]
- see the explanation under "DAT:number_of_input_ranges", but swap
  "DAT:INRANGE" with "DAT:OTRANGE".

DAT:number_of_patterns_to_exclude (integer >=0) [0]
- it is possible for statnet to ignore patterns from either the
  training or the application file. Here you write the total number
  of patterns, p, to exclude, and on the next p lines you write the
  IDs of the patterns you wish to exclude (see the format of the
  input file under "File formats and screen output" below). statnet
  reports the patterns which it excludes as it excludes them; thus if
  patterns are listed which do not occur in either the training or
  application data file, they will be ignored and no comment is made.
  The rest of an excluded pattern's line in the data file is then
  skipped (the line may be in any format, e.g. it may be a corrupt
  line: statnet just searches until a newline (ASCII code 10) is
  reached).

NET:number_of_networks_in_committee_(N) (integer >=0) [1]
- statnet implements committees. That is, it trains N identical
  networks separately using the same data, but with different initial
  random weights for each. When applying statnet, it applies each
  network to the data, and averages the results.

NET:size_of_first_hidden_layer_(H) (integer >=0) [no default]
- number of hidden nodes in the first hidden layer (excludes the bias
  node, which is automatically added by the program). Setting H=0 is
  allowed, but you end up with a network with no connections between
  input and output (for which I can see no application, but the
  option is there!).

NET:size_of_second_hidden_layer_(V) (integer >=0) [no default]
- number of hidden nodes in the second hidden layer (excludes the
  bias node, which is automatically added by the program). If this is
  set to zero (V=0), then statnet will use only a single hidden layer
  (with the number of nodes specified by H). statnet will not allow
  you to achieve this by setting H=0 and V>0. Note that if you set
  V=1, this is equivalent to using only a single hidden layer.
  However, with V=0 statnet uses dedicated one-hidden-layer code, so
  will probably run faster.

NET:data_scaling_(none/var/maxmin/netsize) [var]
- the inputs and outputs can, and should, be scaled. "var" separately
  scales each input and output variable to have zero mean and unit
  standard deviation (this is the recommended option). "maxmin" is
  not yet implemented. The "netsize" option ensures that the sum
  input to the hidden layers does not grow with the number of inputs.
  The input-hidden transfer function is

    H = tanh (Hlam*S)

  where S is the sum over the product of each input and its
  associated input-hidden weight. If the "netsize" option is used,
  Hlam is set to

    Hlam = 1/( sqrt((double)(Xsize+1)) )

  This is also included as part of the "var" scaling option.

  For the "var" scaling option: if all values for a given
  input/output are the same, or if there is only one defined value
  for an output, statnet will force the standard deviation to 1 to
  prevent a divide by zero. If there are no defined values for an
  output, statnet sets the mean to 0 and the standard deviation to 1.
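As a concrete sketch of what "var" scaling amounts to (illustrative
code, not the statnet implementation; whether statnet divides by n or
n-1 when forming the standard deviation is not specified here):

  #include <math.h>

  /* Scale the n defined values of one variable to zero mean and
     unit standard deviation. A zero standard deviation (e.g. all
     values identical) is forced to 1, as described above. */
  void var_scale(double *v, int n)
  {
      int p;
      double mean = 0.0, sd = 0.0;
      for (p = 0; p < n; p++) mean += v[p];
      mean /= n;
      for (p = 0; p < n; p++) sd += (v[p] - mean) * (v[p] - mean);
      sd = sqrt(sd / n);
      if (sd == 0.0) sd = 1.0;   /* prevent divide by zero */
      for (p = 0; p < n; p++) v[p] = (v[p] - mean) / sd;
  }

With "var" or "netsize" scaling, the hidden-layer scale factor is
then Hlam = 1/sqrt(Xsize+1), as quoted above.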
NET:number_of_outputs_to_log (integer >=0) [0]
- some output/target variables are more naturally expressed on a log
  scale, and statnet will take logs (base 10) of any of the output
  variables you choose. Here you write the number (R) of output
  variables you wish to log. On the next R lines you specify which
  output variables. E.g., if R=2 you write
    NET:TAKELOG 1
    NET:TAKELOG 3
  to take logs of the first and third variables. First and third
  refer to their position in the data file and not in the selected
  output variable range (see "DAT:number_of_output_ranges" above).
  statnet takes the logs before doing any scaling, and the variables
  are kept as logs throughout the program, including in all of the
  output files and the error measures. (This is unlike the data
  scaling, which is entirely internal to the program.) It is not
  possible to use statnet to take logs of the input variables. You
  should use logs whenever the uncertainties in an output variable
  are more naturally expressed as a fractional error (multiplicative
  rather than additive), in which case the beta parameters are also
  well suited to your problem. Recall that an error in a logged term
  (log x) of d is equal to a fractional error in the unlogged term
  (x) of 2.30*d. Currently no information regarding which terms have
  been logged is written to the weights file, so the user must keep
  track if s/he wishes to use the weights file at a later date.

NET:input_weight_file (file name) [no default]
- if statnet has already been trained and a weight file produced,
  this weight file can be read in using this option. This option is
  used to apply the network to data or to continue training from a
  given set of weights. Often you will train and apply in a single
  run, in which case this field does not need to be set. Weight files
  should be given the suffix ".wt". Note that the weights for all
  networks in a committee are written to a single file. See the
  section below for details of the file format.

TRN:output_weight_file (file name) [DEFWTNAME]
- The model weights calculated by the network(s) are written to this
  file. If the file already exists, statnet will warn you (although
  you may not have time to kill the program if training is quick).
  The default weight file name is only there in case you forget to
  specify it yourself. If you just apply the network with a set of
  weights which you read in (using "NET:input_weight_file") then this
  option will be ignored, i.e. the weights file will not be
  re-written. Weight files should be given the suffix ".wt". See the
  section below for details of the file format.

TRN:error_file (file name) [no default]
- name of the file in which to write error values at each training
  iteration. See the section below for details of the file format. If
  you use a committee of networks, this file is overwritten for each
  network, so upon completion it shows the training progress of only
  the last network to be trained.

TRN:form_of_weight_init_(uniform/gaussian) [uniform]
- initial weights for the network are drawn from a uniform or a
  Gaussian distribution. (The Gaussian option has not yet been
  implemented.)

TRN:initial_weight_range (real value) [WTRNGDEF]
- scale of the random distribution from which initial weights are
  drawn. If the "uniform" distribution has been chosen, it will range
  from -wtrng to +wtrng, where wtrng is the value specified here.

TRN:random_number_seed (integer value) [RANSEEDDEF]
- used to seed the selection of initial weights.
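Putting the last three strings together, the uniform initialization
described above can be sketched as follows (illustrative code only:
statnet actually uses the ran1 generator supplied with the package,
and the C library rand() merely stands in for it here):

  #include <stdlib.h>

  /* Draw each of the nw initial weights uniformly from the range
     [-wtrng, +wtrng], using the given random number seed. */
  void init_weights(double *w, int nw, double wtrng, unsigned seed)
  {
      int i;
      srand(seed);
      for (i = 0; i < nw; i++)
          w[i] = wtrng * (2.0 * rand() / (double)RAND_MAX - 1.0);
  }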
TRN:optimization_method_(grd/macopt) [macopt]
- the weights can be optimised using the gradient descent method
  (grd) or a conjugate gradient optimizer (macopt, written by David
  MacKay). Both are implemented, and work, but grd has not been as
  thoroughly tested.

TRN:update_method_(svu/rsu/batch) [batch]
- With gradient descent, the weights can be updated in one of three
  ways:
    svu   - single vector update. Weights are updated after the error
            gradient has been calculated for a single pattern. This
            constitutes a single iteration.
    rsu   - random sample update. A randomly selected fraction of the
            patterns in the training set is used to evaluate the
            error gradient in a single iteration, and this is used to
            update the weights. (Option not yet implemented.)
    batch - in each iteration, all of the patterns in the training
            data are used to evaluate the error gradient and update
            the weights.
  Only batch mode can be used with macopt. Any other option will be
  ignored.

TRN:weight_decay_(none/default/list) [none]
- weight decay can be used to regularize the training procedure.
  1/sqrt(alpha) can be thought of as the standard deviation of the
  Gaussian prior over the weights (which has zero mean). The alpha
  parameters of the weight decay can be set using the "list" option,
  or the default values can be used. Note that the current version of
  statnet cannot learn the optimum alpha values from the data. If the
  list option is used, the following four lines must be specified:

TRN:alpha_XH (real >=0) [alpha_XH_DEF]
TRN:alpha_bH (real >=0) [alpha_bH_DEF]
TRN:alpha_HV (real >=0) [alpha_HV_DEF]
TRN:alpha_VY (real >=0) [alpha_VY_DEF]
- These are the alpha parameters for the input to first hidden, input
  bias to first hidden, first hidden to second hidden, and second
  hidden to output weights respectively. alpha values can be set to
  zero, e.g. if you only want to apply weight decay to some sets of
  weights. If you are using a network with only one hidden layer,
  then alpha_VY specifies the alpha parameter for the hidden to
  output weights and alpha_HV is ignored. alpha_XH and alpha_bH have
  the same meaning as with two hidden layers.

TRN:use_beta_parameters?_(none/default/list) [none]
- beta is the coefficient of the error term for each output variable.
  beta sets the level of modelling precision which you want to
  achieve for each output variable. If this is limited by the noise
  in the data, 1/sqrt(beta) should be approximately equal to the
  standard deviation of the noise in the output variable. If this
  option is set to "default", all of the beta values are set to the
  default value, 6.0. If "list" is chosen, the user must specify the
  beta values for the Y output variables on the next Y lines. Thus if
  Y=2 the next two lines would be:

TRN:beta (real >=0) [BETADEF]
TRN:beta (real >=0) [BETADEF]
- for the first and second outputs respectively. If the number of
  beta values specified is fewer than Y, the remainder will be set to
  the last value of beta given. If the beta value for any output
  variable is set to zero, then that output variable will not
  contribute anything to the error function. You can think of this as
  saying that the noise on this variable is infinite, so you don't
  care what its value is. I don't know why you might want to do this,
  but the option is there. Note that if you use scaling, then beta is
  on the scale of the scaled variables, not the raw values in the
  data file (see the "Tips" section). A sketch of how beta enters the
  error term is given below.
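To make the role of beta concrete, the per-pattern error term
(defined formally under "Error files" in the next section) can be
sketched as follows (illustrative code, not statnet's; an undefined
"x" target is represented here by defined[k] = 0):

  /* E_p = 0.5 * SUM_k( beta[k] * (y[k]-t[k])^2 ), summed only over
     the outputs k which have a defined target. A beta[k] of zero
     removes output k from the error entirely, as described above. */
  double pattern_error(int Y, const double *y, const double *t,
                       const double *beta, const int *defined)
  {
      int k;
      double E = 0.0;
      for (k = 0; k < Y; k++)
          if (defined[k])
              E += beta[k] * (y[k] - t[k]) * (y[k] - t[k]);
      return 0.5 * E;
  }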
GRD:number_of_iterations (+ve integer) [ITSDEF]
- if using gradient descent, this is the total number of training
  iterations which will be performed.

GRD:learning_rate (+ve real) [ETADEF]
- if using gradient descent, this is the learning rate (eta). The
  current version does not use a momentum parameter.

MAC:convergence_tolerance_gradient (+ve real) [CONVTOLDEF]
- if using macopt, this is the gradient convergence tolerance.
  Training will stop once the gradient of the total error function is
  less than the modulus of this value. This value is crucial and is
  highly dependent on your specific problem. In particular it depends
  on:
  (1) the scale of the training data
  (2) the total number of target values (error calculations)
  (3) the beta terms
  (4) the alpha terms (if using weight decay)

MAC:maximum_number_of_iterations (+ve integer) [ITSDEF]
- if using macopt, this is the maximum number of training iterations
  which will be performed.

MAC:perform_maccheckgrad?_(yes/no) [no]
- if using macopt, you can check that the gradient statnet is
  evaluating is correct by using a routine in macopt which compares
  the analytic gradient with one calculated using first differences.
  This should only be necessary when debugging, but may be worth
  checking if statnet appears to be going wild.

MAC:maccheckgrad_tolerance (real >=0) [MACCHECKTOLDEF]
- the tolerance at which to check the gradient.

APP:write_individual_network_results?_(yes/no) [no]
- if yes, the classifications from each network (in addition to the
  committee classification) for each pattern are written to the
  output file.

######################################################################
#                  File formats and screen output                    #
######################################################################

Input files (user written):
  pattern input files:   ".in"

Output files (statnet written):
  error files:           ".err"
  weight files:          ".wt"
  pattern output files:  ".ot"

During the application phase, statnet evaluates a number of error
quantities for each output node using the defined true outputs in the
application data file. These are defined here and referred to
throughout this section. For each output node statnet evaluates:

  D       the number of patterns with a defined output
  t{p}    the true output for pattern p
  y{n,p}  the output from network n for pattern p
  c{p}    = average over all n of y{n,p} (the committee
            classification)
  e{p}    = c{p} - t{p}, the external error for pattern p
  R       = RMS over all p of e{p}
  A       = average over all p of |e{p}|
  d{n,p}  = y{n,p} - c{p}
  i{p}    the internal error for pattern p
          = RMS over all n of d{n,p}
  I       = average over all p of i{p}

If being used in application mode, statnet will write the following
summary statistics to the screen for each output node, k:

  Output  No_def_outputs  RMS_ext_error  Av_abs_ext_error  Av_int_error
    k           D               R               A               I

Note that k refers to the output node (i.e. the output variable as it
appears in the selected data range, and not in the data file; these
will generally differ if you have selected to use only certain output
variables).

If you were using a logged output, i.e. z = log_10 (y), and dz is the
reported error above, note that

  error in y = dy = 2.30*y*dz = 2.30*(10^z)*dz
  fractional error in y = dy/y = 2.30*dz

Pattern input files (.in)
-------------------------
The first three lines are comment lines. The fourth line consists of
three fields:
  1. number of outputs, Y
  2. number of inputs, X
  3. number of patterns in the file, Npats
Note that X and Y are also specified in the spec file.
The latter takes precedence when running statnet, but if they do not
agree, statnet will exit with an error message. The fifth line is a
comment line. The following Npats lines contain the Npats patterns.
Each line consists of the following columns:

  first:  (1 -> 1 inc.)        the pattern ID [string, max WORDSIZE]
  next Y: (2 -> 1+Y inc.)      the outputs [real]
  next X: (2+Y -> 1+Y+X inc.)  the inputs [real]

It is possible to have undefined outputs. Instead of a real number,
an "x" or "X" should be put in its place. These "values" are then not
used in calculating the error or its gradient. It is not possible to
have undefined inputs.

Error files (.err)
------------------
This is a dump of the network error function and the error surface
gradient as a function of iteration number. It is currently only
produced when using macopt for training. The file name is specified
by the "TRN:error_file" string in the training file. The file has 7
columns:
  1. training iteration number
  2. the likelihood error, lerr
  3. the fractional contribution of lerr to the total error,
     i.e. lerr/toterr
  4. the weight decay (regularization) error, werr
  5. the fractional contribution of werr to the total error,
     i.e. werr/toterr
  6. the total error, toterr = lerr + werr
  7. the gradient, g. g = sqrt(gg), where gg is the squared gradient
     written by macopt (and written to STDOUT when the verbosity
     level is >= 2).

Definitions of the terms:

  k       = label for an output node
  p       = label for a pattern (input vector)
  g       = label for a group of weights (each group of weights has
            its own alpha value, as described under
            "TRN:weight_decay_" above)
  w_{i,j} = weight between any two nodes i and j
  y_k     = output from the k^th node
  T_k     = target value for the k^th node
  e_{k,p} = y_k - T_k
  E_p     = 0.5 * SUM_k( beta_{k} * e_{k,p}^2 )

In batch mode:

  lerr   = SUM_{p}( E_p )
  werr   = 0.5 * SUM_g( alpha_{g} * SUM_{i,j}[ w_{i,j}^2 ] )
           (for all i and j in g)
  toterr = lerr + werr, the error minimised at each iteration
  gg     = sum of squares of the gradient of toterr with respect to
           each and every weight

Note that the errors scale with the total number of targets defined
in the training data. The errors are also in terms of the scaled
variables internal to the program. The gradient has similar
dependencies. The data in this file are really intended to give a
qualitative indication of how training proceeds, or for making
comparisons between different network models trained with identical
data sets.

Weight files (.wt)
------------------
The weight file is written by statnet after training, the file name
being specified by the specfile string "TRN:output_weight_file".
Weight files can also be read in by statnet using the string
"NET:input_weight_file". The weights for all N networks in a
committee are written to a single weights file. The weights file
includes comment lines (which should not be removed) which explain
its contents.
A typical weights file is:

  # statnet weights file - do not add or remove lines  #
  ######################################################
  # N (nets), X (input), H (first hidden), V (second hidden), Y (output):
  3 2 8 0 2
  # scaling type:
  var
  # Y (outputs) mean and stdev scaling factors:
   2.39502e-01  5.17294e-01
   1.74539e-02  4.11780e-01
  # X (inputs) mean and stdev scaling factors:
   4.68722e-01  3.60122e-01
   2.01239e+00  1.70534e+00
  # Lambda scale parameter for hidden layer:
  0.44721
  ####################### Network 1 #######################
  # wtXH (input-hidden weights):
  -0.10747   0.06063  -0.35821
  -0.10671  -0.51063  -0.03884
   0.16480   0.37594   0.59963
  -1.53936  -0.23609   0.59340
   0.40570   0.01062   1.63650
  -0.51318   0.51484   0.05172
   0.36614   0.45354  -1.11139
   0.78054   1.57896   1.45550
  # wtHY (hidden-output weights):
   0.78671  2.16788  0.01744 -1.98190 -2.39964 -0.14925  0.69139  2.17678 -1.34010
   1.80811  2.90444 -0.73914  1.00746  2.69314 -1.17900  1.81725 -0.83621 -0.63986
  ####################### Network 2 #######################
  # wtXH (input-hidden weights):
  -0.10747   0.06063  -0.35821
  -0.10671  -0.51063  -0.03884
   0.16480   0.37594   0.59963
  -1.53936  -0.23609   0.59340
   0.40570   0.01062   1.63650
  -0.51318   0.51484   0.05172
   0.36614   0.45354  -1.11139
   0.78054   1.57896   1.45550
  # wtHY (hidden-output weights):
   0.78671  2.16788  0.01744 -1.98190 -2.39964 -0.14925  0.69139  2.17678 -1.34010
   1.80811  2.90444 -0.73914  1.00746  2.69314 -1.17900  1.81725 -0.83621 -0.63986
  ####################### Network 3 #######################
  # wtXH (input-hidden weights):
  -0.10747   0.06063  -0.35821
  -0.10671  -0.51063  -0.03884
   0.16480   0.37594   0.59963
  -1.53936  -0.23609   0.59340
   0.40570   0.01062   1.63650
  -0.51318   0.51484   0.05172
   0.36614   0.45354  -1.11139
   0.78054   1.57896   1.45550
  # wtHY (hidden-output weights):
   0.78671  2.16788  0.01744 -1.98190 -2.39964 -0.14925  0.69139  2.17678 -1.34010
   1.80811  2.90444 -0.73914  1.00746  2.69314 -1.17900  1.81725 -0.83621 -0.63986

The first three lines are comments. The fourth contains five fields:
  1. number of networks in the committee, N
  2. number of inputs, X
  3. number of nodes in the first hidden layer, H
  4. number of nodes in the second hidden layer, V
  5. number of outputs, Y
The fifth line is a comment line. The sixth line specifies the data
scaling type. If this is "var", then the next lines specify the
scaling factors, as shown in the above example. If "var" or "netsize"
scaling is given, the Hlam parameter is also specified. Next follow
the weights for each network in turn. The weights themselves are
specified in two groups:
  1. wtXH (input-hidden weights)
  2. wtHY (hidden-output weights)

Output files (.ot)
------------------
This is the file of classifications produced by the committee of
networks. This file is produced whenever the string
"apply_network?_(yes/no)" is set to "yes" in the spec file. The name
of the file is generally the name of the application data file (that
specified by the string "DAT:application_data_file" in the spec file)
with ".ot" appended. The exception is when this file has the suffix
".in", in which case the ".in" is replaced with ".ot".
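The committee classification and internal error written to this file
follow the definitions given at the start of this section; as a
minimal illustrative sketch (not the statnet code):

  #include <math.h>

  /* For one output of one pattern: c{p} is the mean of the N
     individual network outputs y[n], and the internal error i{p}
     is the RMS deviation of those outputs about c{p} (0 if N=1). */
  void committee(int N, const double *y, double *c, double *ierr)
  {
      int n;
      double sum = 0.0, ss = 0.0;
      for (n = 0; n < N; n++) sum += y[n];
      *c = sum / N;
      for (n = 0; n < N; n++) ss += (y[n] - *c) * (y[n] - *c);
      *ierr = sqrt(ss / N);
  }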
The first 11 lines of a typical output file are as follows:

  # statnet classifications output file  #
  ########################################
  # input file = ctD1.lc.11.c.pca
  # weights file = temp.wt
  # N (nets), Y (outputs)
  3 1
  # Com     True    Diff    IntErr  Net#1   Net#2   Net#3
  0.2403  0.3070  -0.0667  0.0081  0.2449  0.2310  0.2450
  0.3437  0.3772  -0.0335  0.0026  0.3450  0.3407  0.3454
  0.4718  0.4649   0.0069  0.0004  0.4716  0.4722  0.4715
  0.4856  0.4912  -0.0056  0.0002  0.4855  0.4859  0.4854

The first five lines are comments, but list the input file used to
train the network and the resultant weights file. The next line gives
the number of networks in the committee (N) and the number of outputs
(Y). The seventh line is a comment line giving the details of the
columns on the following lines. There then follow P lines, where P is
the number of patterns (Npats) in the application data file. The ID
string from the application data file is not written to this output
file, although of course the patterns are in the same order in the
two files! For each output (Y) there are 4+N columns, which are:
  1. c{p}, the committee classification
  2. t{p}, the true classification (i.e. that listed in the 2nd
     column in the application data file)
  3. e{p}, the difference between these two
  4. i{p}, the internal error in the committee classification,
     calculated using the results from the different networks in the
     committee. If only one network was used in the committee (N=1),
     the value 0.0000 will appear.
The next N columns have the classifications from each individual
network: the committee classification is the average of these N
classifications, by definition. Note that these measurements are all
in the units specified in the training/application data files (i.e.
they are not the scaled values, if any scaling was specified in the
spec file).

######################################################################
#                           A test problem                           #
######################################################################

Accompanying this package are two test data files, testa.in and
testb.in, and a corresponding spec file, test.spec. This spec file
has been run to train the network to produce the weights file,
test.wt, and the error file, test.err. You can use these files as
templates for the format of the input and spec files. The input data
are spectral energy distributions for normal-type stars over the
spectral range 3000 to 10000 AA, sampled at 10 AA and at 20 AA
resolution. The three target variables are [M/H], logg and T_eff
respectively. The spec file is set up to "classify" the spectra in
the application data file. Run statnet on the specfile as it is and
you will obtain the testb.ot file. Plotting this you will see that
training was not optimal! These data are intended to give an idea of
the different file formats and to show how the program runs, rather
than to demonstrate the application of the program to a real problem.

######################################################################
#                       Tips on using statnet                        #
######################################################################

Setting alpha and beta
----------------------
The error which statnet minimises during training was given under the
discussion of the format of the .err files in the "File formats"
section. To set alpha and beta, proceed as follows. First set beta to
the appropriate values, taking into account the uncertainty in the
targets and the scaling used (see the rest of this section). Note
that what is really important is the *relative* sizes of the beta
terms. Then set alpha.
The "standard" interpretation of alpha is that it is equal to 1/sqrt(sigma), where sigma is the standard deviation of the Gaussian prior over the weights. However, this is not very useful as we don't know how big the weights should be. Weight decay only occurs once werr is not insignificant compared to lerr. In practice, alpha can be left at its default values unless it doesn't seem to be working. Then trial and error may be the best. Scaling and its effect on beta ------------------------------ I strongly recommend that you use the variance scaling option when training the network. Without it, you run the risk of the weight updates becoming unstable, and the final weights becoming extremely large. Remember that the value of beta is then in terms of the scaled variables. As the variance scaling option gives unit standard deviation, 1/SQRT(beta_k) can be roughly interpreted as a fractional, rather than absolute, uncertainty in the k^th variable. The default beta_k value of 6.0 corresponds to a standard deviation of 0.4. Thus if the data are variance scaled and roughly normally distributed, 95% of the data will lie in the range -2 to +2, so this standard deviation corresponds to about a 10% uncertainty. If the uncertainties in this output variable really are fractional rather than absolute, then you should probably also be using the logs of this variable if the value in the data file is not already logged (see discussion under spec file string "NET:number_of_outputs_to_log"). Beta and the input space ------------------------ As mentioned above, beta_{k} can be considered as 1/sigma^2, where sigma is the error in the k^th output. The total error which the network is minimizing when it trains is SUM(E_p), where E_p = 0.5 * SUM_k( beta_{k} * e_{k,p}^2 ) Thus beta_{k} is a weighting factor in the error term. An output with a relatively large beta_{k} will dominate the error minimization, i.e. the network will be trained to give good predictions for this output at the expense of the others. Conversely, an output with a relatively small beta_{k} will make a small contribution to the error, so the network won't be as concerned with giving good predictions for this output. Thus the size of beta_{k} depends on (1) how well you want the network to do for each output, and (2) how well it can do, i.e. how precise are the targets. There is a lot more which could be discussed on the alpha and beta terms, but most users will not be interested, so it's "beyond the scope of this file". ###################################################################### # Modifications since the previous release version # ###################################################################### current version = 2.00 1.01 Added ability to take logs of some of the output variables. I think it is okay, but you should check it in more detail sometime when you have more outputs (and thus more ranges). 1.02 Added ability to exclude patterns from training and/or application files (see notes under specfile option DAT:number_of_patterns_to_exclude ) 1.03 very minor: changed output format of scale values from decimal to exponential format. Inserted print option (for flag[14]>=4) to print mean and stdev as they are calculated. 2.00 Second hidden layer implemented