######################################################################
######################################################################
##                                                                  ##
##                                                                  ##
##                             statnet                              ##
##                                                                  ##
##           a feedforward interpolation neural network             ##
##                                                                  ##
##                                by                                ##
##                                                                  ##
##                        Coryn Bailer-Jones                        ##
##                                                                  ##
##                             07/12/98                             ##
##                                                                  ##
##                 email: calj at mpia-hd.mpg.de                    ##
##           www: http://www.mpia-hd.mpg.de/homes/calj/             ##
##                                                                  ##
##                                                                  ##
##        see the README file for disclaimer and warranty           ##
##        see the statnet_manual file for operational details       ##
##                                                                  ##
##     This file is copyright 1998,1999 by C.A.L. Bailer-Jones      ##
##                                                                  ##
##                                                                  ##
######################################################################
######################################################################

FILE:          statnet_manual
DESCRIPTION:   statnet operations manual
AUTHOR:        Coryn Bailer-Jones
LAST MOD DATE: 01.02.00

######################################################################
######################################################################
##                                                                  ##
##                    statnet operations manual                     ##
##                                                                  ##
######################################################################
######################################################################

This file provides the information required to use and understand the
statnet software. It assumes an understanding of the principles of
feedforward neural networks and committees.

Contents
--------
1. What statnet is
2. Other sources of information
3. The statnet files
4. How to run statnet
5. The specfile
6. File formats and screen output
7. A test problem
8. Tips on using statnet
9. Modifications since the previous release version

######################################################################
#                          what statnet is                           #
######################################################################

statnet is a feedforward neural network for implementing non-linear
static functions. It is trained by backpropagation. statnet uses the
following:
 o choice between one or two hidden layers
 o tanh input-hidden transfer function(s), linear hidden-output
   function
 o sum-of-squares error measure

statnet incorporates the following optional features:
 o feature selection (i.e. selection of subsets of input and output
   variables from the data files)
 o mean and variance scaling of input and output variables
 o conjugate gradient or gradient descent optimizers
 o ability to have undefined (missing) targets in the training data
   file

######################################################################
#                    Other sources of information                    #
######################################################################

Feedforward neural networks and the backpropagation training
algorithm have been described in great detail in a variety of sources
and this document does not explain their principles. For information
on the application of a similar (but more primitive) program to the
classification of stellar spectra, see "Automated classification of
stellar spectra - II. Two-dimensional classification with neural
networks and principal components analysis", by Bailer-Jones, Irwin &
von Hippel 1998, Monthly Notices of the Royal Astronomical Society,
298, 361-377 (http://www.mpia-hd.mpg.de/stars/calj/p1.html). This
paper also explains the basic principles of feedforward networks.

The source code for statnet is reasonably well documented if you want
to get into the nitty-gritty details.
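As a rough illustration of what the network computes (this is a
minimal sketch, not code from the statnet source; all names and the
weight layout used here are invented for illustration), a
single-hidden-layer forward pass with tanh hidden units and linear
output units looks like this:

  #include <math.h>

  /* One-hidden-layer forward pass: tanh hidden units, linear
     outputs. wXH holds H rows of X+1 weights (the last entry of
     each row taken here as the bias weight - an assumed layout),
     wHY holds Y rows of H+1 weights, and hlam is the Hlam scale
     factor discussed under "NET:data_scaling" below. */
  void forward(int X, int H, int Y, double hlam,
               const double *x, const double *wXH, const double *wHY,
               double *h, double *y)
  {
      int i, j, k;
      for (j = 0; j < H; j++) {
          double s = wXH[j*(X+1) + X];           /* bias */
          for (i = 0; i < X; i++)
              s += wXH[j*(X+1) + i] * x[i];
          h[j] = tanh(hlam * s);                 /* tanh transfer */
      }
      for (k = 0; k < Y; k++) {
          double s = wHY[k*(H+1) + H];           /* bias */
          for (j = 0; j < H; j++)
              s += wHY[k*(H+1) + j] * h[j];
          y[k] = s;                              /* linear output */
      }
  }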
The conjugate gradient optimizer "macopt" has its own source of
documentation, which can be obtained from
http://wol.ra.phy.cam.ac.uk/mackay/c/macopt.html

Further enquiries can be addressed to the author at the email address
at the head of this file. Related publications and other information
will be available from the statnet web page
http://www.mpia-hd.mpg.de/homes/calj/statnet.html
Also check this web page for more recent releases of the code or of
this manual. Please mail any bug reports/suggestions for improvements
to me at the above email address.

######################################################################
#                         The statnet files                          #
######################################################################

Makefile        the makefile
README          disclaimer and warranty. Please read this before you
                proceed.
ansi/           a directory which contains macopt (David MacKay's
                conjugate gradient optimizer) and its ancillary
                files. This is the standard macopt distribution and
                has not been modified in any way.
netsubs.c       collection of subroutines
netsubs.h       header file for netsubs
ran1.c          random number generator
statnet         the statnet executable (Sun Solaris)
statnet.c       main statnet program
statnet.h       header file for statnet
statnet_manual  this file
test.err        test error file (from running statnet on test.spec)
test.spec       the spec file for the test problem
test.wt         test weights file (from running statnet on test.spec)
testa.in        test input data file
testb.in        test input data file
procres.pl      a perl program for interrogating the output (.ot)
                files. Eternally developmental, i.e. it must be
                edited by hand to get it to do what you want.

######################################################################
#                         How to run statnet                         #
######################################################################

statnet is written in ANSI C and will work (i.e. has been tested)
under SunOS 5.5. The executable supplied with the release version is
a SunOS binary, compiled using gcc. To get it to run under any other
flavour of UNIX you'll need to compile it. From the statnet directory
(in which you find this file) do the following:

  cd ansi/
  make test_macII
  cd ../
  make statnet

The make will give a few warnings, but these can be ignored. You may
need to adjust the makefile for your particular operating system.
Note that in one place statnet uses a UNIX-specific command via the C
function "system" (although this only prints the date so it's hardly
crucial - remove the line from the source code if it causes
problems). Most memory allocation is done dynamically, but a few
things have fixed maximum sizes. You are therefore advised to check
in the statnet.h file that these arrays will be sufficiently large
for your purposes.

To run statnet, type

  statnet specfile

where specfile is the only command line argument which statnet reads.
Typing "statnet -v" will give the version number only.

######################################################################
#                            The specfile                            #
######################################################################

The specfile contains all of the relevant information on the
committee and network architecture, the data files and the training
method. specfiles should be given the ".spec" suffix. The specfile is
read by searching for strings, such as "train_network?_(yes/no)", so
these strings must not be changed or they will be ignored. The
relevant input for each string must be the next item on the same
line, e.g.

  train_network?_(yes/no)    yes

The items in parentheses "(yes/no)" indicate the choices available.
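For orientation, a minimal hypothetical specfile might look as
follows (the strings are those documented below, but the file names
and values are invented for illustration):

  # example specfile (illustrative values only)
  verbosity_level_(0/1/2/3/4)              2
  train_network?_(yes/no)                  yes
  apply_network?_(yes/no)                  yes
  DAT:training_data_file                   mytrain.in
  DAT:application_data_file                myapply.in
  DAT:number_of_inputs_in_data_file_(X)    10
  DAT:number_of_outputs_in_data_file_(Y)   2
  NET:number_of_networks_in_committee_(N)  3
  NET:size_of_first_hidden_layer_(H)       5
  NET:size_of_second_hidden_layer_(V)      0
  TRN:output_weight_file                   mymodel.wt
  TRN:error_file                           mymodel.err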
Note that any line in the specfile preceded by a "#" symbol will be
ignored.

Inappropriate inputs which disobey the required value will be flagged
as errors by statnet. The only exception is when real values are
specified instead of integers, in which case only the integer part of
the number will be used (ANSI C %f to %d conversion). The order of
the input strings is arbitrary, although there are a few
restrictions, most of which are obvious and are indicated below.

The various input strings in the specfile are described below, along
with the possible choices (given in round parentheses, unless they
already form part of the string) and the default value (in square
brackets). Defaults written in capitals refer to strings defined in
the statnet.h file. Note that some strings do not have default
values: their values must be specified or statnet will exit with an
error message. If a string has a default value then the string can be
omitted from the spec file (although I recommend that you always
define your parameters, even if you choose the defaults). A few of
the options have not yet been implemented or fully de-bugged: these
are flagged below.

Most of the input strings are prefixed by a three-letter code
indicating the part of the code to which they are relevant:
  DAT  training/application data
  NET  network architecture
  TRN  training of network
  GRD  gradient descent
  MAC  macopt (conjugate gradient optimizer)
  APP  application of network

The possible entries in the specfile are now listed and discussed.

verbosity_level_(0/1/2/3/4) [2]
- Amount of output from the program, ranging from 0 (nothing apart
  from two introduction lines and error messages) to 4 (lots of
  diagnostic stuff). Level 2 is appropriate for normal running.

train_network?_(yes/no) [no]
- do you want to learn the weights from a given set of data.

apply_network?_(yes/no) [no]
- do you want to apply the network to a set of data.

DAT:training_data_file (file name) [no default]
- The file containing the training data. See below for the file
  format. The file should be given the suffix ".in".

DAT:application_data_file (file name) [no default]
- The file containing the application data (i.e. that to which we
  want to apply the network). See below for the file format. The file
  should be given the suffix ".in".

DAT:number_of_inputs_in_data_file_(X) (integer >=0) [no default]
- total number of input variables in the data files (must be the same
  in both the training and application files)

DAT:number_of_outputs_in_data_file_(Y) (integer >=0) [no default]
- total number of output variables in the data files (must be the
  same in both the training and application files)

DAT:number_of_input_ranges (integer >=0) [0]
- statnet includes the option to use only a subset of the input
  variables in the network. Generally, you will want to select ranges
  of the input variables rather than having to list each one
  individually. Thus here you write the number of such ranges (R).
  For each range you specify a separate line prefixed with
  "DAT:INRANGE" and followed by the inclusive lower and upper bounds
  of the range. (The program then evaluates the required number of
  input nodes for the network.) These should follow on the next R
  lines (although any comment lines inserted will be ignored). For
  example, with R=3:
    DAT:INRANGE 1 10
    DAT:INRANGE 17 17
    DAT:INRANGE 32 34
  Note that the second line will take input 17 only. If R is fewer
  than the number of such lines, only the first R will be read. If R
  is more, the program will exit with an error.
  (The program will search the specfile until R lines are found
  starting with DAT:INRANGE, so you can use this facility to comment
  out ranges.) If R=0 then the entire range specified by X
  ("DAT:number_of_inputs_in_data_file_(X)") will be used. The ranges
  must be organised in increasing order and may not overlap (not all
  possible errors are trapped, so please ensure that this is done
  correctly).

DAT:number_of_output_ranges (integer >=0) [0]
- see the explanation under "DAT:number_of_input_ranges", but swap
  "DAT:INRANGE" with "DAT:OTRANGE".

DAT:number_of_patterns_to_exclude (integer >=0) [0]
- it is possible for statnet to ignore patterns from either the
  training or the application file. Here you write the total number
  of patterns, p, to exclude, and on the next p lines you write the
  IDs of the patterns you wish to exclude (see the format of the
  input file under "File formats and screen output" below). statnet
  reports the patterns which it excludes as it excludes them; thus if
  patterns are listed which do not occur in either the training or
  application data file, they will be ignored and no comment is made.
  The rest of an excluded pattern's line in the data file is then
  skipped (the line may be in any format, e.g. it may be a corrupt
  line: statnet just searches until a newline (ASCII code 10) is
  reached).

NET:number_of_networks_in_committee_(N) (integer >=0) [1]
- statnet implements committees. That is, it trains N identical
  networks separately using the same data, but with different initial
  random weights for each. When applying statnet, it applies each
  network to the data, and averages the results.

NET:size_of_first_hidden_layer_(H) (integer >=0) [no default]
- number of hidden nodes in the first hidden layer (excludes the bias
  node, which is automatically added by the program). Setting H=0 is
  allowed, but you end up with a network with no connections between
  input and output (for which I can see no application, but the
  option is there!).

NET:size_of_second_hidden_layer_(V) (integer >=0) [no default]
- number of hidden nodes in the second hidden layer (excludes the
  bias node, which is automatically added by the program). If this is
  set to zero (V=0), then statnet will use only a single hidden layer
  (with the number of nodes specified by H). statnet will not allow
  you to achieve this by setting H=0 and V>0. Note that if you set
  V=1, this is equivalent to using only a single hidden layer.
  However, with V=0 statnet uses dedicated one-hidden-layer code, so
  will probably run faster.

NET:data_scaling_(none/var/maxmin/netsize) [var]
- the inputs and outputs can, and should, be scaled. "var" separately
  scales each input and output variable to have zero mean and unit
  standard deviation (this is the recommended option). "maxmin" is
  not yet implemented. The "netsize" option ensures that the sum
  input to the hidden layers does not grow with the number of inputs.
  The input-hidden transfer function is

    H = tanh (Hlam*S)

  where S is the sum over the product of each input and its
  associated input-hidden weight. If the "netsize" option is used,
  Hlam is set to

    Hlam = 1/( sqrt((double)(Xsize+1)) )

  This is also included as part of the "var" scaling option.

  For the "var" scaling option: if all values for a given
  input/output are the same, or if there is only one defined value
  for an output, statnet will force the standard deviation to 1 to
  prevent a divide by zero. If there are no defined values for an
  output, statnet sets the mean to 0 and the standard deviation to 1.
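As a concrete sketch of what "var" scaling amounts to (illustrative
code, not the statnet implementation; whether statnet divides by n or
n-1 when forming the standard deviation is not specified here):

  #include <math.h>

  /* Scale the n defined values of one variable to zero mean and
     unit standard deviation. A zero standard deviation (e.g. all
     values identical) is forced to 1, as described above. */
  void var_scale(double *v, int n)
  {
      int p;
      double mean = 0.0, sd = 0.0;
      for (p = 0; p < n; p++) mean += v[p];
      mean /= n;
      for (p = 0; p < n; p++) sd += (v[p] - mean) * (v[p] - mean);
      sd = sqrt(sd / n);
      if (sd == 0.0) sd = 1.0;   /* prevent divide by zero */
      for (p = 0; p < n; p++) v[p] = (v[p] - mean) / sd;
  }

With "var" or "netsize" scaling, the hidden-layer scale factor is
then Hlam = 1/sqrt(Xsize+1), as quoted above.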
NET:number_of_outputs_to_log (integer >=0) [0]
- some output/target variables are more naturally expressed on a log
  scale, and statnet will take logs (base 10) of any of the output
  variables you choose. Here you write the number (R) of output
  variables you wish to log. On the next R lines you specify which
  output variables. E.g., if R=2 you write
    NET:TAKELOG 1
    NET:TAKELOG 3
  to take logs of the first and third variables. First and third
  refer to their position in the data file and not in the selected
  output variable range (see "DAT:number_of_output_ranges" above).
  statnet takes the logs before doing any scaling, and the variables
  are kept as logs throughout the program, including in all of the
  output files and the error measures. (This is unlike the data
  scaling, which is entirely internal to the program.) It is not
  possible to use statnet to take logs of the input variables. You
  should use logs whenever the uncertainties in an output variable
  are more naturally expressed as a fractional error (multiplicative
  rather than additive), in which case the beta parameters are also
  well suited to your problem. Recall that an error in a logged term
  (log x) of d is equal to a fractional error in the unlogged term
  (x) of 2.30*d. Currently no information regarding which terms have
  been logged is written to the weights file, so the user must keep
  track if s/he wishes to use the weights file at a later date.

NET:input_weight_file (file name) [no default]
- if statnet has already been trained and a weight file produced,
  this weight file can be read in using this option. This option is
  used to apply the network to data or to continue training from a
  given set of weights. Often you will train and apply in a single
  run, in which case this field does not need to be set. Weight files
  should be given the suffix ".wt". Note that the weights for all
  networks in a committee are written to a single file. See the
  section below for details of the file format.

TRN:output_weight_file (file name) [DEFWTNAME]
- The model weights calculated by the network(s) are written to this
  file. If the file already exists, statnet will warn you (although
  you may not have time to kill the program if training is quick).
  The default weight file name is only there in case you forget to
  specify it yourself. If you just apply the network with a set of
  weights which you read in (using "NET:input_weight_file") then this
  option will be ignored, i.e. the weights file will not be
  re-written. Weight files should be given the suffix ".wt". See the
  section below for details of the file format.

TRN:error_file (file name) [no default]
- name of the file in which to write error values at each training
  iteration. See the section below for details of the file format. If
  you use a committee of networks, this file is overwritten for each
  network, so upon completion it shows the training progress of only
  the last network to be trained.

TRN:form_of_weight_init_(uniform/gaussian) [uniform]
- initial weights for the network are drawn from a uniform or a
  Gaussian distribution. (The Gaussian option has not yet been
  implemented.)

TRN:initial_weight_range (real value) [WTRNGDEF]
- scale of the random distribution from which initial weights are
  drawn. If the "uniform" distribution has been chosen, it will range
  from -wtrng to +wtrng, where wtrng is the value specified here.

TRN:random_number_seed (integer value) [RANSEEDDEF]
- used to seed the selection of initial weights.
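Putting the last three strings together, the uniform initialization
described above can be sketched as follows (illustrative code only:
statnet actually uses the ran1 generator supplied with the package,
and the C library rand() merely stands in for it here):

  #include <stdlib.h>

  /* Draw each of the nw initial weights uniformly from the range
     [-wtrng, +wtrng], using the given random number seed. */
  void init_weights(double *w, int nw, double wtrng, unsigned seed)
  {
      int i;
      srand(seed);
      for (i = 0; i < nw; i++)
          w[i] = wtrng * (2.0 * rand() / (double)RAND_MAX - 1.0);
  }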
TRN:optimization_method_(grd/macopt) [macopt]
- the weights can be optimised using the gradient descent method
  (grd) or a conjugate gradient optimizer (macopt, written by David
  MacKay). Both are implemented, and work, but grd has not been as
  thoroughly tested.

TRN:update_method_(svu/rsu/batch) [batch]
- With gradient descent, the weights can be updated in one of three
  ways:
    svu   - single vector update. Weights are updated after the error
            gradient has been calculated for a single pattern. This
            constitutes a single iteration.
    rsu   - random sample update. A randomly selected fraction of the
            patterns in the training set is used to evaluate the
            error gradient in a single iteration, and this is used to
            update the weights. (Option not yet implemented.)
    batch - in each iteration, all of the patterns in the training
            data are used to evaluate the error gradient and update
            the weights.
  Only batch mode can be used with macopt. Any other option will be
  ignored.

TRN:weight_decay_(none/default/list) [none]
- weight decay can be used to regularize the training procedure.
  1/sqrt(alpha) can be thought of as the standard deviation of the
  Gaussian prior over the weights (which has zero mean). The alpha
  parameters of the weight decay can be set using the "list" option,
  or the default values can be used. Note that the current version of
  statnet cannot learn the optimum alpha values from the data. If the
  list option is used, the following four lines must be specified:

TRN:alpha_XH (real >=0) [alpha_XH_DEF]
TRN:alpha_bH (real >=0) [alpha_bH_DEF]
TRN:alpha_HV (real >=0) [alpha_HV_DEF]
TRN:alpha_VY (real >=0) [alpha_VY_DEF]
- These are the alpha parameters for the input to first hidden, input
  bias to first hidden, first hidden to second hidden, and second
  hidden to output weights respectively. alpha values can be set to
  zero, e.g. if you only want to apply weight decay to some sets of
  weights. If you are using a network with only one hidden layer,
  then alpha_VY specifies the alpha parameter for the hidden to
  output weights and alpha_HV is ignored. alpha_XH and alpha_bH have
  the same meaning as with two hidden layers.

TRN:use_beta_parameters?_(none/default/list) [none]
- beta is the coefficient of the error term for each output variable.
  beta sets the level of modelling precision which you want to
  achieve for each output variable. If this is limited by the noise
  in the data, 1/sqrt(beta) should be approximately equal to the
  standard deviation of the noise in the output variable. If this
  option is set to "default", all of the beta values are set to the
  default value, 6.0. If "list" is chosen, the user must specify the
  beta values for the Y output variables on the next Y lines. Thus if
  Y=2 the next two lines would be:

TRN:beta (real >=0) [BETADEF]
TRN:beta (real >=0) [BETADEF]
- for the first and second outputs respectively. If the number of
  beta values specified is fewer than Y, the remainder will be set to
  the last value of beta given. If the beta value for any output
  variable is set to zero, then that output variable will not
  contribute anything to the error function. You can think of this as
  saying that the noise on this variable is infinite, so you don't
  care what its value is. I don't know why you might want to do this,
  but the option is there. Note that if you use scaling, then beta is
  on the scale of the scaled variables, not the raw values in the
  data file (see the "Tips" section). A sketch of how beta enters the
  error term is given below.
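To make the role of beta concrete, the per-pattern error term
(defined formally under "Error files" in the next section) can be
sketched as follows (illustrative code, not statnet's; an undefined
"x" target is represented here by defined[k] = 0):

  /* E_p = 0.5 * SUM_k( beta[k] * (y[k]-t[k])^2 ), summed only over
     the outputs k which have a defined target. A beta[k] of zero
     removes output k from the error entirely, as described above. */
  double pattern_error(int Y, const double *y, const double *t,
                       const double *beta, const int *defined)
  {
      int k;
      double E = 0.0;
      for (k = 0; k < Y; k++)
          if (defined[k])
              E += beta[k] * (y[k] - t[k]) * (y[k] - t[k]);
      return 0.5 * E;
  }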
GRD:number_of_iterations (+ve integer) [ITSDEF]
- if using gradient descent, this is the total number of training
  iterations which will be performed.

GRD:learning_rate (+ve real) [ETADEF]
- if using gradient descent, this is the learning rate (eta). The
  current version does not use a momentum parameter.

MAC:convergence_tolerance_gradient (+ve real) [CONVTOLDEF]
- if using macopt, this is the gradient convergence tolerance.
  Training will stop once the gradient of the total error function is
  less than the modulus of this value. This value is crucial and is
  highly dependent on your specific problem. In particular it depends
  on:
  (1) the scale of the training data
  (2) the total number of target values (error calculations)
  (3) the beta terms
  (4) the alpha terms (if using weight decay)

MAC:maximum_number_of_iterations (+ve integer) [ITSDEF]
- if using macopt, this is the maximum number of training iterations
  which will be performed.

MAC:perform_maccheckgrad?_(yes/no) [no]
- if using macopt, you can check that the gradient statnet is
  evaluating is correct by using a routine in macopt which compares
  the analytic gradient with one calculated using first differences.
  This should only be necessary when debugging, but may be worth
  checking if statnet appears to be going wild.

MAC:maccheckgrad_tolerance (real >=0) [MACCHECKTOLDEF]
- the tolerance at which to check the gradient.

APP:write_individual_network_results?_(yes/no) [no]
- if yes, the classifications from each network (in addition to the
  committee classification) for each pattern are written to the
  output file.

######################################################################
#                  File formats and screen output                    #
######################################################################

Input files (user written):
  pattern input files:   ".in"

Output files (statnet written):
  error files:           ".err"
  weight files:          ".wt"
  pattern output files:  ".ot"

During the application phase, statnet evaluates a number of error
quantities for each output node using the defined true outputs in the
application data file. These are defined here and referred to
throughout this section. For each output node statnet evaluates:

  D       the number of patterns with a defined output
  t{p}    the true output for pattern p
  y{n,p}  the output from network n for pattern p
  c{p}    = average over all n of y{n,p} (the committee
            classification)
  e{p}    = c{p} - t{p}, the external error for pattern p
  R       = RMS over all p of e{p}
  A       = average over all p of |e{p}|
  d{n,p}  = y{n,p} - c{p}
  i{p}    the internal error for pattern p
          = RMS over all n of d{n,p}
  I       = average over all p of i{p}

If being used in application mode, statnet will write the following
summary statistics to the screen for each output node, k:

  Output  No_def_outputs  RMS_ext_error  Av_abs_ext_error  Av_int_error
    k           D               R               A               I

Note that k refers to the output node (i.e. the output variable as it
appears in the selected data range, and not in the data file; these
will generally differ if you have selected to use only certain output
variables).

If you were using a logged output, i.e. z = log_10 (y), and dz is the
reported error above, note that

  error in y = dy = 2.30*y*dz = 2.30*(10^z)*dz
  fractional error in y = dy/y = 2.30*dz

Pattern input files (.in)
-------------------------
The first three lines are comment lines. The fourth line consists of
three fields:
  1. number of outputs, Y
  2. number of inputs, X
  3. number of patterns in the file, Npats
Note that X and Y are also specified in the spec file.
The latter takes precedence when running statnet, but if they do not
agree, statnet will exit with an error message. The fifth line is a
comment line. The following Npats lines contain the Npats patterns.
Each line consists of the following columns:

  first:  (1 -> 1 inc.)        the pattern ID [string, max WORDSIZE]
  next Y: (2 -> 1+Y inc.)      the outputs [real]
  next X: (2+Y -> 1+Y+X inc.)  the inputs [real]

It is possible to have undefined outputs. Instead of a real number,
an "x" or "X" should be put in its place. These "values" are then not
used in calculating the error or its gradient. It is not possible to
have undefined inputs.

Error files (.err)
------------------
This is a dump of the network error function and the error surface
gradient as a function of iteration number. It is currently only
produced when using macopt for training. The file name is specified
by the "TRN:error_file" string in the training file. The file has 7
columns:
  1. training iteration number
  2. the likelihood error, lerr
  3. the fractional contribution of lerr to the total error,
     i.e. lerr/toterr
  4. the weight decay (regularization) error, werr
  5. the fractional contribution of werr to the total error,
     i.e. werr/toterr
  6. the total error, toterr = lerr + werr
  7. the gradient, g. g = sqrt(gg), where gg is the squared gradient
     written by macopt (and written to STDOUT when the verbosity
     level is >= 2).

Definitions of the terms:

  k       = label for an output node
  p       = label for a pattern (input vector)
  g       = label for a group of weights (each group of weights has
            its own alpha value, as described under
            "TRN:weight_decay_" above)
  w_{i,j} = weight between any two nodes i and j
  y_k     = output from the k^th node
  T_k     = target value for the k^th node
  e_{k,p} = y_k - T_k
  E_p     = 0.5 * SUM_k( beta_{k} * e_{k,p}^2 )

In batch mode:

  lerr   = SUM_{p}( E_p )
  werr   = 0.5 * SUM_g( alpha_{g} * SUM_{i,j}[ w_{i,j}^2 ] )
           (for all i and j in g)
  toterr = lerr + werr, the error minimised at each iteration
  gg     = sum of squares of the gradient of toterr with respect to
           each and every weight

Note that the errors scale with the total number of targets defined
in the training data. The errors are also in terms of the scaled
variables internal to the program. The gradient has similar
dependencies. The data in this file are really intended to give a
qualitative indication of how training proceeds, or for making
comparisons between different network models trained with identical
data sets.

Weight files (.wt)
------------------
The weight file is written by statnet after training, the file name
being specified by the specfile string "TRN:output_weight_file".
Weight files can also be read in by statnet using the string
"NET:input_weight_file". The weights for all N networks in a
committee are written to a single weights file. The weights file
includes comment lines (which should not be removed) which explain
its contents.
A typical weights file is:

  # statnet weights file - do not add or remove lines  #
  ######################################################
  # N (nets), X (input), H (first hidden), V (second hidden), Y (output):
  3 2 8 0 2
  # scaling type:
  var
  # Y (outputs) mean and stdev scaling factors:
   2.39502e-01  5.17294e-01
   1.74539e-02  4.11780e-01
  # X (inputs) mean and stdev scaling factors:
   4.68722e-01  3.60122e-01
   2.01239e+00  1.70534e+00
  # Lambda scale parameter for hidden layer:
  0.44721
  ####################### Network 1 #######################
  # wtXH (input-hidden weights):
  -0.10747   0.06063  -0.35821
  -0.10671  -0.51063  -0.03884
   0.16480   0.37594   0.59963
  -1.53936  -0.23609   0.59340
   0.40570   0.01062   1.63650
  -0.51318   0.51484   0.05172
   0.36614   0.45354  -1.11139
   0.78054   1.57896   1.45550
  # wtHY (hidden-output weights):
   0.78671  2.16788  0.01744 -1.98190 -2.39964 -0.14925  0.69139  2.17678 -1.34010
   1.80811  2.90444 -0.73914  1.00746  2.69314 -1.17900  1.81725 -0.83621 -0.63986
  ####################### Network 2 #######################
  # wtXH (input-hidden weights):
  -0.10747   0.06063  -0.35821
  -0.10671  -0.51063  -0.03884
   0.16480   0.37594   0.59963
  -1.53936  -0.23609   0.59340
   0.40570   0.01062   1.63650
  -0.51318   0.51484   0.05172
   0.36614   0.45354  -1.11139
   0.78054   1.57896   1.45550
  # wtHY (hidden-output weights):
   0.78671  2.16788  0.01744 -1.98190 -2.39964 -0.14925  0.69139  2.17678 -1.34010
   1.80811  2.90444 -0.73914  1.00746  2.69314 -1.17900  1.81725 -0.83621 -0.63986
  ####################### Network 3 #######################
  # wtXH (input-hidden weights):
  -0.10747   0.06063  -0.35821
  -0.10671  -0.51063  -0.03884
   0.16480   0.37594   0.59963
  -1.53936  -0.23609   0.59340
   0.40570   0.01062   1.63650
  -0.51318   0.51484   0.05172
   0.36614   0.45354  -1.11139
   0.78054   1.57896   1.45550
  # wtHY (hidden-output weights):
   0.78671  2.16788  0.01744 -1.98190 -2.39964 -0.14925  0.69139  2.17678 -1.34010
   1.80811  2.90444 -0.73914  1.00746  2.69314 -1.17900  1.81725 -0.83621 -0.63986

The first three lines are comments. The fourth contains five fields:
  1. number of networks in the committee, N
  2. number of inputs, X
  3. number of nodes in the first hidden layer, H
  4. number of nodes in the second hidden layer, V
  5. number of outputs, Y
The fifth line is a comment line. The sixth line specifies the data
scaling type. If this is "var", then the next lines specify the
scaling factors, as shown in the above example. If "var" or "netsize"
scaling is given, the Hlam parameter is also specified. Next follow
the weights for each network in turn. The weights themselves are
specified in two groups:
  1. wtXH (input-hidden weights)
  2. wtHY (hidden-output weights)

Output files (.ot)
------------------
This is the file of classifications produced by the committee of
networks. This file is produced whenever the string
"apply_network?_(yes/no)" is set to "yes" in the spec file. The name
of the file is generally the name of the application data file (that
specified by the string "DAT:application_data_file" in the spec file)
with ".ot" appended. The exception is when this file has the suffix
".in", in which case the ".in" is replaced with ".ot".
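The committee classification and internal error written to this file
follow the definitions given at the start of this section; as a
minimal illustrative sketch (not the statnet code):

  #include <math.h>

  /* For one output of one pattern: c{p} is the mean of the N
     individual network outputs y[n], and the internal error i{p}
     is the RMS deviation of those outputs about c{p} (0 if N=1). */
  void committee(int N, const double *y, double *c, double *ierr)
  {
      int n;
      double sum = 0.0, ss = 0.0;
      for (n = 0; n < N; n++) sum += y[n];
      *c = sum / N;
      for (n = 0; n < N; n++) ss += (y[n] - *c) * (y[n] - *c);
      *ierr = sqrt(ss / N);
  }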
The first 11 lines of a typical output file are as follows:

  # statnet classifications output file  #
  ########################################
  # input file = ctD1.lc.11.c.pca
  # weights file = temp.wt
  # N (nets), Y (outputs)
  3 1
  # Com     True    Diff    IntErr  Net#1   Net#2   Net#3
  0.2403  0.3070  -0.0667  0.0081  0.2449  0.2310  0.2450
  0.3437  0.3772  -0.0335  0.0026  0.3450  0.3407  0.3454
  0.4718  0.4649   0.0069  0.0004  0.4716  0.4722  0.4715
  0.4856  0.4912  -0.0056  0.0002  0.4855  0.4859  0.4854

The first five lines are comments, but list the input file used to
train the network and the resultant weights file. The next line gives
the number of networks in the committee (N) and the number of outputs
(Y). The seventh line is a comment line giving the details of the
columns on the following lines. There then follow P lines, where P is
the number of patterns (Npats) in the application data file. The ID
string from the application data file is not written to this output
file, although of course the patterns are in the same order in the
two files! For each output (Y) there are 4+N columns, which are:
  1. c{p}, the committee classification
  2. t{p}, the true classification (i.e. that listed in the 2nd
     column in the application data file)
  3. e{p}, the difference between these two
  4. i{p}, the internal error in the committee classification,
     calculated using the results from the different networks in the
     committee. If only one network was used in the committee (N=1),
     the value 0.0000 will appear.
The next N columns have the classifications from each individual
network: the committee classification is the average of these N
classifications, by definition. Note that these measurements are all
in the units specified in the training/application data files (i.e.
they are not the scaled values, if any scaling was specified in the
spec file).

######################################################################
#                           A test problem                           #
######################################################################

Accompanying this package are two test data files, testa.in and
testb.in, and a corresponding spec file, test.spec. This spec file
has been run to train the network to produce the weights file,
test.wt, and the error file, test.err. You can use these files as
templates for the format of the input and spec files. The input data
are spectral energy distributions for normal-type stars over the
spectral range 3000 to 10000 AA, sampled at 10 AA and at 20 AA
resolution. The three target variables are [M/H], logg and T_eff
respectively. The spec file is set up to "classify" the spectra in
the application data file. Run statnet on the specfile as it is and
you will obtain the testb.ot file. Plotting this you will see that
training was not optimal! These data are intended to give an idea of
the different file formats and to show how the program runs, rather
than to demonstrate the application of the program to a real problem.

######################################################################
#                       Tips on using statnet                        #
######################################################################

Setting alpha and beta
----------------------
The error which statnet minimises during training was given under the
discussion of the format of the .err files in the "File formats"
section. To set alpha and beta, proceed as follows. First set beta to
the appropriate values, taking into account the uncertainty in the
targets and the scaling used (see the rest of this section). Note
that what is really important is the *relative* sizes of the beta
terms. Then set alpha.
The "standard" interpretation of alpha is that it is equal to 1/sqrt(sigma), where sigma is the standard deviation of the Gaussian prior over the weights. However, this is not very useful as we don't know how big the weights should be. Weight decay only occurs once werr is not insignificant compared to lerr. In practice, alpha can be left at its default values unless it doesn't seem to be working. Then trial and error may be the best. Scaling and its effect on beta ------------------------------ I strongly recommend that you use the variance scaling option when training the network. Without it, you run the risk of the weight updates becoming unstable, and the final weights becoming extremely large. Remember that the value of beta is then in terms of the scaled variables. As the variance scaling option gives unit standard deviation, 1/SQRT(beta_k) can be roughly interpreted as a fractional, rather than absolute, uncertainty in the k^th variable. The default beta_k value of 6.0 corresponds to a standard deviation of 0.4. Thus if the data are variance scaled and roughly normally distributed, 95% of the data will lie in the range -2 to +2, so this standard deviation corresponds to about a 10% uncertainty. If the uncertainties in this output variable really are fractional rather than absolute, then you should probably also be using the logs of this variable if the value in the data file is not already logged (see discussion under spec file string "NET:number_of_outputs_to_log"). Beta and the input space ------------------------ As mentioned above, beta_{k} can be considered as 1/sigma^2, where sigma is the error in the k^th output. The total error which the network is minimizing when it trains is SUM(E_p), where E_p = 0.5 * SUM_k( beta_{k} * e_{k,p}^2 ) Thus beta_{k} is a weighting factor in the error term. An output with a relatively large beta_{k} will dominate the error minimization, i.e. the network will be trained to give good predictions for this output at the expense of the others. Conversely, an output with a relatively small beta_{k} will make a small contribution to the error, so the network won't be as concerned with giving good predictions for this output. Thus the size of beta_{k} depends on (1) how well you want the network to do for each output, and (2) how well it can do, i.e. how precise are the targets. There is a lot more which could be discussed on the alpha and beta terms, but most users will not be interested, so it's "beyond the scope of this file". ###################################################################### # Modifications since the previous release version # ###################################################################### current version = 2.00 1.01 Added ability to take logs of some of the output variables. I think it is okay, but you should check it in more detail sometime when you have more outputs (and thus more ranges). 1.02 Added ability to exclude patterns from training and/or application files (see notes under specfile option DAT:number_of_patterns_to_exclude ) 1.03 very minor: changed output format of scale values from decimal to exponential format. Inserted print option (for flag[14]>=4) to print mean and stdev as they are calculated. 2.00 Second hidden layer implemented