######################################################################
######################################################################
##                                                                  ##
##                              dynet                               ##
##                                                                  ##
##                   a recurrent neural network                     ##
##                 for modelling dynamical systems                  ##
##                                                                  ##
##                                by                                ##
##                                                                  ##
##                        Coryn Bailer-Jones                        ##
##                                                                  ##
##                             22/05/98                             ##
##                                                                  ##
##                  email: calj@mpia-hd.mpg.de                      ##
##              www: http://wol.ra.phy.cam.ac.uk/calj/              ##
##                                                                  ##
##         see the README file for disclaimer and warranty          ##
##        see the dynet_manual file for operational details         ##
##                                                                  ##
##        This file is copyright 1998 by C.A.L. Bailer-Jones        ##
##                                                                  ##
######################################################################
######################################################################

FILE:           dynet_manual
DESCRIPTION:    operations manual for dynet
AUTHOR:         Coryn Bailer-Jones
LAST MOD DATE:  26/06/98

######################################################################
######################################################################
##                                                                  ##
##                      dynet operations manual                     ##
##                                                                  ##
######################################################################
######################################################################

This file provides the information required to use the dynet
software. It assumes an understanding of the principles behind dynet.
This manual is relevant to version 1.19 of the software.

######################################################################
#                           What dynet is                            #
######################################################################

dynet is a recurrent neural network for modelling dynamical systems
by means of discrete-time measurements of the temporal patterns
produced by the dynamical system. Its learning routine is fully
recurrent, and can be viewed as performing temporal interpolation of
one or more temporal patterns.

######################################################################
#                    Other sources of information                    #
######################################################################

A paper, Bailer-Jones & MacKay 1998, which explains the model in
detail, has been submitted to a journal and will be available on my
web page once accepted. In the meantime, a short description is
available in the paper "Static and Dynamic Modelling of Materials
Forging", Bailer-Jones et al. 1998, available from
http://wol.ra.phy.cam.ac.uk/calj/ajiips98.html
I strongly recommend you read these papers.

The source code for dynet is reasonably well documented. The
conjugate gradient optimizer "macopt" has its own documentation,
which can be obtained from
http://wol.ra.phy.cam.ac.uk/mackay/c/macopt.html

Further enquiries can be addressed to the author at the email address
at the head of this file. Related publications and other information
are available from the dynet web page
http://wol.ra.phy.cam.ac.uk/calj/dynet.html

######################################################################
#                          The dynet files                           #
######################################################################

Makefile      the makefile

README        disclaimer and warranty. Please read this before you
              proceed.

ansi/         a directory which contains macopt (David MacKay's
              conjugate gradient optimizer) and its ancillary files.
              This is the standard macopt distribution and has not
              been modified in any way.
dynet         the dynet executable

dynet.c       main dynet program

dynet.h       header file

dynet_manual  this file

dynetsubs.c   collection of dynet subroutines (the division of
              subroutines between this file and dynet.c is not
              entirely clear any more, although I'm not sure it ever
              was)

ran1.c        random number generator

syn/          a directory containing the data files for a synthetic
              problem (see the section "The synthetic problem" below)

syn.26.spec   the specfile for the synthetic problem

######################################################################
#                          How to run dynet                          #
######################################################################

First unpack the tar file, dynet.tar. Type

  tar xvf dynet.tar

This will create a directory called "dynet" containing all of the
files listed above.

dynet is written in ANSI C and was developed initially on a SUN
platform and then continued under Linux. The executable included in
this release is an i386 binary, i.e. a Linux program. To get it to
run under any other flavour of UNIX you will need to compile it. To
do this all you need to do, in theory, is type

  make dynet

but I can't guarantee that this will work; you may well need to
adjust the Makefile. I may well port dynet to other UNIX platforms,
so it would be worth getting in touch if you have problems. The make
will give a few warnings, but these are not important and can be
ignored. Note that in one place dynet issues a UNIX-specific command
via the C function "system" (although this only prints the date, so
it's hardly crucial).

To run dynet, type

  dynet specfile

where specfile is the only command line argument which dynet reads.
Typing "dynet -v" will give the version number only. See the section
"The synthetic problem" for an example application.

######################################################################
#                            The specfile                            #
######################################################################

The specfile contains all of the relevant information on the temporal
pattern (tp) files, the network architecture and the training
details. specfiles should be given the ".spec" suffix.

The specfile is read by searching for strings, such as
"train_network?_(yes/no)"; these strings must therefore not be
changed, or they will be ignored. The relevant input for each string
must be the next item on the same line, e.g.

  train_network?_(yes/no) yes

The items in parentheses, "(yes/no)", indicate the choices available.
Note that any line in the specfile preceded by a "#" symbol will be
ignored. Inputs which disobey the required type or range will be
flagged as errors by dynet. The only exception is when real values
are specified instead of integers, in which case only the integer
part of the number will be used (ANSI C %f to %d conversion). The
order of the input strings is arbitrary, although there are a few
restrictions, most of which are obvious and are indicated below.

The various input strings in the specfile are described below, along
with the possible choices (given in round parentheses if not already
part of the relevant string) and the default value (in square
brackets). Note that some strings do not have default values: their
values must be specified or dynet will exit with an error message.
Note also that a few of the options have not yet been implemented or
fully de-bugged: these are flagged below.
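Before the individual strings are described, here is a minimal,
hypothetical specfile sketch to show the layout. The strings are
genuine (all are documented below), but the values, the file names
(example.wt, train.tpin1, etc.) and the particular selection of
strings are illustrative only; most strings with defaults are
omitted.

  # example.spec - illustrative specfile sketch
  verbosity_level_(0/1/2/3/4) 2
  train_network?_(yes/no) yes
  apply_network?_(yes/no) yes
  NET:number_of_state_variables_(V) 2
  NET:number_of_measured_state_variables_(Vm) 2
  NET:number_of_external_inputs_(X) 2
  NET:number_of_hidden_nodes_(H) 8
  NET:data_scaling_(none/var/maxmin/netsize) var
  TRN:output_weight_file example.wt
  TRN:number_of_temporal_pattern_files 1
  train.tpin1
  MAC:convergence_tolerance 0.001
  MAC:maximum_number_of_iterations 500
  APP:plot_file_name example.dat
  APP:number_of_temporal_pattern_files 1
  test.tpin1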
Most of the input strings are prefixed by a three letter code
indicating the part of the program to which they are relevant:

  NET  network architecture
  TRN  training of network
  GRD  gradient descent
  MAC  macopt (conjugate gradient optimizer)
  APP  application of network

Note that in earlier versions of this manual the state variables were
often referred to as "recurrent inputs". Although I now use the term
"state variable", there may still be some reference to "recurrent
inputs" in the program source code. The terms are synonymous.

The possible entries in the specfile are now listed and discussed.

verbosity_level_(0/1/2/3/4) [2]
 - Amount of output from the program, ranging from 0 (nothing apart
   from error messages) to 4 (lots of diagnostic stuff). Level 2 is
   appropriate for normal running.

train_network?_(yes/no) [no]
 - Do you want to learn the weights from a given set of data?

apply_network?_(yes/no) [no]
 - Do you want to apply the network to a set of data?

NET:number_of_state_variables_(V) (+ve integer) [no default]
 - Total number of state variables.

NET:number_of_measured_state_variables_(Vm) (+ve integer) [no default]
 - Number of measured state variables. Must have V >= Vm. If
   V-Vm > 0, then V-Vm is the number of "unmeasured" state variables
   (see BJM98).

NET:number_of_external_inputs_(X) (integer >= 0) [no default]
 - Number of external inputs (excludes the input-hidden bias, which
   is automatically included).

NET:number_of_hidden_nodes_(H) (integer >= 0) [no default]
 - Number of hidden nodes in the one and only hidden layer (excludes
   the hidden-output bias, which is automatically included). Note
   that you can set H=0, although I have no idea why you may want to
   do this.

NET:data_scaling_(none/var/maxmin/netsize) [var]
 - The external inputs and state variables can, and should, be
   scaled. "var" separately scales each external input and state
   variable to have zero mean and unit standard deviation (this is
   the recommended option). "maxmin" is not yet implemented. The
   "netsize" option ensures that the summed input to the hidden layer
   does not grow with the number of inputs. The input-hidden transfer
   function is

     H = tanh(Hlam*S)

   where S is the sum over the products of each input and its
   associated input-hidden weight. If the "netsize" option is used,
   Hlam is set to

     Hlam = 1/sqrt(Xsize+1+Vsize)

   This is also included as part of the "var" scaling option.

NET:input_weight_file (file name) [no default]
 - If dynet has already been trained and a weight file produced, that
   weight file can be read in using this option. This is used to
   continue training from a given set of weights, or if you just want
   to use these weights to evaluate state variable sequences for a
   given sequence of external inputs. Often you will train and apply
   in a single run, in which case this field does not need to be set.
   Weight files should be given the suffix ".wt". See the section
   below for details of the file format.

TRN:output_weight_file (file name) [dynet.wt]
 - The default weight file name is only there in case you forget to
   specify it yourself. If you just apply the network with a set of
   weights which you read in (using "NET:input_weight_file"), then
   this option will be ignored, i.e. the weight file will not be
   re-written. Weight files should be given the suffix ".wt". See the
   section below for details of the file format.

TRN:form_of_weight_init_(uniform/gaussian) [uniform]
 - Initial weights for the network are drawn from a uniform or a
   Gaussian distribution. (Actually, they can only be drawn from a
   uniform distribution, as the Gaussian option has not been
   implemented.)
TRN:initial_weight_range (real value) [0.1]
 - Scale of the random distribution from which the initial weights
   are drawn. If the "uniform" distribution has been chosen, it will
   range from -wtrng to +wtrng, where wtrng is the value specified
   here.

TRN:random_number_seed (integer value) [731]
 - Used to seed the selection of the initial weights. (A small
   initialisation sketch follows the TRN entries below.)

TRN:optimization_method_(grd/macopt) [macopt]
 - The weights can be optimized using the gradient descent method
   (grd) or a conjugate gradient optimizer (macopt, written by David
   MacKay). Both are implemented, but grd has not been fully tested.

TRN:update_method_(1/3/4) [4]
 - With gradient descent, the weights can be updated in one of three
   ways:
     1. after each epoch of each temporal pattern file (this is like
        on-line learning, or Real Time Recurrent Learning extended to
        multiple temporal patterns)
     3. after all epochs of each temporal pattern file
     4. after all epochs of all temporal pattern files (total batch)
   Although these options could also apply with macopt, only method
   no. 4 has been implemented for it.

TRN:weight_decay_(none/default/list) [none]
 - Weight decay can be used to regularize the training procedure.
   1/sqrt(alpha) can be thought of as the standard deviation of the
   Gaussian prior over the weights (with zero mean). The alpha
   parameters of the weight decay (see BJM98) can be set using the
   "list" option, or the default values can be used. Note that the
   current version of dynet cannot learn the optimum alpha values
   from the data. If the list option is used, the following four
   lines must be specified:
     TRN:alpha_VH (real >= 0)
     TRN:alpha_XH (real >= 0)
     TRN:alpha_bH (real >= 0)
     TRN:alpha_HY (real >= 0)
 - These are the alpha parameters for the state variable to hidden,
   external input to hidden, input bias to hidden, and hidden to
   output weights respectively. alpha values can be set to zero, e.g.
   if you only want to apply weight decay to some sets of weights.

TRN:use_beta_parameters?_(yes/no) [no]
 - beta is the coefficient of the error term for each state variable.
   1/sqrt(beta) can be considered as the standard deviation of the
   noise in the state variables. Note that if you use scaling, then
   beta is on the scale of the scaled variables, not the raw values
   in the temporal pattern files. If this option is set to "no", all
   of the beta values are set to the default value, BETADEF, which is
   1. Otherwise, the user must specify the beta values for the V
   state variables on the next V lines. Thus if V=2 the next two
   lines would be
     TRN:beta (real >= 0)
     TRN:beta (real >= 0)
   for the first and second state variables respectively. If the
   number of beta values specified is fewer than V, the remainder
   will be set to the last value of beta given (I don't recommend you
   use this, but it's useful if you're changing the number of
   unmeasured state variables and forget to alter the number of
   betas). If the beta value for any state variable is set to zero,
   then that state variable will not contribute anything to the error
   function. You can think of this as saying that the noise on this
   variable is infinite, so you don't care what its value is. I don't
   know why you may want to do this, but the option is there.

TRN:number_of_temporal_pattern_files (+ve integer)
 - If the value specified is P, then the next P lines must be the
   names of the P temporal pattern files to be used for training the
   network. The required format of these files is specified in the
   next section. The files should have the suffix ".tpin1".
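The initialisation controlled by the TRN strings above amounts to
something like the following hedged C sketch. dynet itself draws its
random numbers from the supplied ran1.c generator, not the C library
rand(); the function and variable names here are invented for
illustration only.

  #include <stdlib.h>

  /* Sketch only: fill nwt weights with values drawn uniformly from
   * [-wtrng, +wtrng], reproducibly seeded. dynet itself uses ran1.c
   * rather than the C library rand(). */
  void initweights(double *wt, int nwt, double wtrng, unsigned int seed)
  {
      int i;
      srand(seed);                               /* TRN:random_number_seed  */
      for (i = 0; i < nwt; i++) {
          double u = (double) rand() / RAND_MAX; /* u in [0,1]              */
          wt[i] = wtrng * (2.0*u - 1.0);         /* in [-wtrng, +wtrng]     */
      }
  }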
GRD:number_of_iterations (+ve integer)
 - If using gradient descent, this is the total number of training
   iterations which will be performed.

GRD:learning_rate (+ve real)
 - If using gradient descent, this is the learning rate (eta).

MAC:convergence_tolerance (+ve real)
 - If using macopt, this is the gradient convergence tolerance. In
   other words, once the *square* of the gradient is less than this
   value, training will halt.

MAC:maximum_number_of_iterations (+ve integer)
 - If using macopt, this is the maximum number of training iterations
   which will be performed.

MAC:perform_maccheckgrad?_(yes/no) [no]
 - If using macopt, you can check that the gradient dynet evaluates
   is correct by using a routine in macopt which compares the
   analytic gradient with one calculated using first differences.
   This should only be necessary when debugging, but may be worth
   checking if dynet appears to be going wild.

MAC:maccheckgrad_tolerance (real >= 0) [0.000001]
 - The tolerance at which to check the gradient.

APP:plot_file_name (file name)
 - When dynet is applied to a new set of data, the values of the
   state variables at the last epoch of each temporal pattern file
   are written to this file. See the section below for details of the
   file format. The file should be given the ".dat" extension.

APP:include_v(t=0)_in_plot_file?_(yes/no) [no]
 - Allows you to also have the initial v values written to the ".dat"
   file specified by "APP:plot_file_name".

APP:write_tper_files?_(yes/no) [no]
 - Select whether or not you want an error file for each temporal
   pattern. See the section below for details of the file contents.

APP:number_of_temporal_pattern_files (+ve integer)
 - If the value specified is P, then the next P lines must be the
   names of the P temporal pattern files to which dynet is applied to
   give temporal sequences. The required format of these files is
   specified in the next section. The files should have the suffix
   ".tpin1".

######################################################################
#                            File formats                            #
######################################################################

Input files (user written):
  tp input files:     ".tpin1" or ".tpin2"

Output files (dynet written):
  error files:        ".err"
  weight files:       ".wt"
  tp output files:    ".tpot"
  tp error files:     ".tper"
  final epoch files:  ".dat"

Temporal Pattern input files (.tpin1 .tpin2)
--------------------------------------------

The temporal pattern input files contain the time series (temporal
patterns) which you wish dynet to model. They should be given the
suffix ".tpin1" or ".tpin2" (see below for the distinction). An
example of such a file is as follows:

# dynet tpin file - do not add or remove lines #
################################################
# Vm (meas rec), X (ext input), epochs:
2 2 11
# Data (epoch/recurrent/external):
0    0.00   0.00000  0.00000  -0.74179   0.32059
1    0.49   x        x         0.01926   0.53881
2    0.34   x        x        -0.83042   0.50905
3    0.24   x        x        -0.94573  -0.51070
4    0.27   x        x        -0.68900  -0.14798
5    0.18   x        x         0.88249  -0.59774
6    0.03   x        x        -0.79563   0.91946
7    0.21   x        x         0.37771  -0.45139
8    0.14   x        x         0.32615  -0.88162
9    0.30   x        x         0.16759   0.60141
10   0.49   2.24016 -0.16262   0.85671   0.37204

The header must consist of five lines. The first three are comment
lines. The fourth line has three fields:
  1. number of measured state variables, Vm
  2. number of external inputs, X
  3. number of epochs, N
The fifth line is a comment line.

The next N lines are the data at the N epochs. The first epoch sets
the initial conditions. There are 2+Vm+X columns:
  1. The epoch label, t. This can be any number, e.g. consecutive
     integers to number the lines, or the total elapsed time.
  2. The time step between the current epoch and the previous epoch,
     dt. It follows from this definition that dt at the initial epoch
     is not used by dynet, but it is convenient to set it to zero for
     clarity.
  3. The next Vm columns are the values of the Vm measured state
     variables. When training dynet, these form the target values
     used to define the error which is to be minimized. (The
     exception is the initial values of v, i.e. v(t=0).) When
     applying dynet, we will typically only have v(t=0); if v is
     specified at additional epochs, these values will not be used by
     dynet in the application phase. Whenever a v is not specified,
     an "x" or "X" should be written. If any v at t=0 is not
     specified (i.e. an "x" is written), dynet will set it to the
     default value VINITDEF, which is zero. This is necessary as we
     must always have an initial condition for a dynamical system. I
     RECOMMEND THAT YOU ALWAYS SPECIFY THE INITIAL VALUES OF V. (The
     VINITDEF value will not be subject to any scaling you are using
     in dynet; that is, *within the network* the initial values will
     be set to VINITDEF. If scaling is used, these initial values
     will translate to other values in the output files. While I
     think the code handles all this properly, for your own sanity I
     still strongly recommend that you specify v(t=0), whether or not
     you use scaling. Your specified values will, of course, be
     subject to any scaling you use.) In training dynet we will often
     only specify v at the initial and final epochs. However, if v is
     specified at intermediate epochs, these values will also be used
     to define the minimization error, and thus help to learn the
     weights.
  4. The next X columns are the values of the external inputs. These
     must be defined at every epoch (a future development of dynet is
     intended in which this restriction will be lifted).

Note that the data at a single epoch is *not* an input--target pair.
The whole point of the dynamic model in dynet (which is a discrete
approximation of a first order differential equation) is that the x
and v values at time t-1 produce a v at time t. The target for x(t-1)
and v(t-1) is therefore v(t), i.e. the value on the next line. It
follows that the external inputs at the final epoch are not used by
dynet. However, they should not just be left as blanks: write in some
number, e.g. 0. In a future version I'll allow you to write the
conventional "x" or "X".

The file suffix ".tpin2" is used for files in which the complete
state variable sequence is specified. The suffix ".tpin1" is used
when only the initial and final state variables are specified.

You can, of course, use log values as the inputs and state variables.
However, linear (non-logged) values must be used for the dt steps.
This is on account of the first order Taylor expansion which
evaluates the next state variables on the basis of the output
(derivative) and the previous state variables.
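To make the time stepping concrete, here is a hedged C sketch of a
single forward step, using the transfer functions and bias
conventions described in this manual. The function name, array
layouts and weight orientations are my assumptions for illustration,
not dynet's actual internals.

  #include <math.h>

  /* Hedged sketch of one forward time step: from v(t-1) and x(t-1),
   * compute v(t) in place. Array layouts are assumptions:
   *   v[V]          state variables (updated in place)
   *   x[X]          external inputs
   *   wtVH[V][H]    state-hidden weights
   *   wtXH[X+1][H]  input-hidden weights; row X is the input bias
   *   wtHY[H+1][V]  hidden-output weights; row H is the hidden bias
   *                 (dynet uses the constant HBIAS; taken as 1 here) */
  void forwardstep(int V, int X, int H, double Hlam, double dt,
                   double v[], const double x[],
                   const double wtVH[][H], const double wtXH[][H],
                   const double wtHY[][V])
  {
      double h[H], y[V];
      int k, l, m;

      for (m = 0; m < H; m++) {              /* hidden layer          */
          double S = wtXH[X][m];             /* input bias term       */
          for (k = 0; k < V; k++) S += wtVH[k][m] * v[k];
          for (l = 0; l < X; l++) S += wtXH[l][m] * x[l];
          h[m] = tanh(Hlam * S);             /* H = tanh(Hlam*S)      */
      }
      for (k = 0; k < V; k++) {              /* linear outputs, i.e.
                                                derivative estimates  */
          y[k] = wtHY[H][k];                 /* hidden bias term      */
          for (m = 0; m < H; m++) y[k] += wtHY[m][k] * h[m];
          v[k] += dt * y[k];                 /* first order Taylor step */
      }
  }

Iterating this step from v(t=0) over all of the epochs of a pattern
produces the kind of sequence written to the .tpot files.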
Error files (.err)
------------------

This is a dump of the network error function and the error surface
gradient as a function of iteration number. It is currently only
produced when using macopt for training. The file name is specified
by the "TRN:error_file" string in the specfile. The file has 7
columns:
  1. training iteration number
  2. the likelihood error, lerr
  3. the fractional contribution of lerr to the total error, i.e.
     lerr/toterr
  4. the weight decay (regularization) error, werr
  5. the fractional contribution of werr to the total error, i.e.
     werr/toterr
  6. the total error, toterr = lerr + werr
  7. the gradient, g. g = sqrt(gg), where gg is the squared gradient
     written by macopt (and written to STDOUT when verbosity >= 2).

Note that the errors scale with the total number of targets defined
in the training data. The errors are also in terms of the scaled
variables internal to the program. The gradient has similar
dependencies. The data in this file are really intended as a
qualitative indication of how training proceeds, or for making
comparisons between different network models trained with identical
data sets.

Weight files (.wt)
------------------

The weight file is written by dynet after training, the file name
being specified by the specfile string "TRN:output_weight_file".
Weight files can also be read in by dynet using the string
"NET:input_weight_file". A typical weight file is:

# dynet weights file - do not add or remove lines #
###################################################
# V (tot state), Vm (meas state), X (ext input), H (hidden): (exc biases)
2 2 2 8
# scaling type:
var
# V (state variables) mean and stdev scaling factors:
 6.17275e-01  8.04589e-01
-1.55049e+00  3.46766e+00
# X (external input) mean and stdev scaling factors:
-6.07996e-01  6.13417e-01
 5.71975e-01  6.80077e-01
# Lambda scale parameter for hidden layer:
0.44721
# wtVH (state-hidden weights):
-0.56010 -1.29233  0.45621  0.66428  2.30999  0.48269 -0.15339 -1.21327
-1.02396  1.33431  1.21793  1.34536  2.67156 -1.26362  0.94542  0.34031
# wtXH (input-hidden weights):
-1.99026  1.60542  0.50664  1.57786  4.08565 -0.11660  2.82143 -0.85219
 0.06155  1.16499  2.59686 -0.45088 -1.31578 -2.89263  0.31608 -0.22411
 2.50053  1.29021  1.24943  0.87688 -2.49171 -2.27946  1.72011  1.44805
# wtHY (hidden-output weights):
 2.07704  2.22658  0.56307  0.22885  1.47584 -2.00067 -0.10726  0.80079
 1.28100  3.81347  1.06695  1.72487 -2.92203 -0.39190  2.28566  1.23744
 0.84620 -0.79857

The first three lines are comments. The fourth contains four fields:
  1. total number of state variables, V
  2. number of measured state variables, Vm
  3. number of external inputs, X
  4. number of hidden nodes, H
The fifth line is a comment line. The sixth line specifies the data
scaling type. If this is "var", then the next lines specify the
scaling factors, as shown in the above example. If "var" or "netsize"
scaling is given, the Hlam parameter is also specified. The weights
themselves are specified in three groups:
  1. wtVH (state variable-hidden weights)
  2. wtXH (input-hidden weights)
  3. wtHY (hidden-output weights)
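The sizes of the three weight groups follow from the header fields.
With the biases included automatically, as described earlier, wtVH
should contain V*H values, wtXH (X+1)*H values and wtHY (H+1)*V
values, which matches the example above (16, 24 and 18 values for
V=2, X=2, H=8). A trivial C sketch of this sanity check (names
invented for illustration):

  #include <stdio.h>

  /* Sketch: expected weight counts for a dynet .wt file, inferred
   * from the header fields V, X, H and the bias conventions above. */
  int main(void)
  {
      int V = 2, X = 2, H = 8;        /* from the example header    */
      int nVH = V * H;                /* state-hidden weights       */
      int nXH = (X + 1) * H;          /* input-hidden, incl. bias   */
      int nHY = (H + 1) * V;          /* hidden-output, incl. bias  */
      printf("wtVH: %d  wtXH: %d  wtHY: %d\n", nVH, nXH, nHY);
      return 0;                       /* prints 16, 24, 18          */
  }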
Temporal Pattern output files (.tpot)
-------------------------------------

For each temporal pattern input file to which dynet is applied, a
temporal pattern output file is produced. If the input file name is
TPFILE, the corresponding output file is called TPFILE.tpot. The
exception is when TPFILE has the suffix ".tpin1" or ".tpin2", in
which case this suffix is replaced with the ".tpot" suffix. When it
runs, dynet tells you exactly what the tpot files will be called,
e.g.

  syn.04.200.tpin1 -> syn.04.200.tpot
  myfile           -> myfile.tpot

A typical tpot file is:

# dynet temporal pattern output file #
######################################
# input file = syn.04.600.tpin1
# weights file = syn.04d.wt
# V (tot state), Vm (meas state), epochs:
2 2 11
# State variables (epoch/measured/unmeasured):
0   -0.00000 -0.00000
1    0.46486 -0.19967
2    0.36306 -0.32085
3    0.57170 -0.39786
4    0.98769 -0.36904
5    1.20681 -0.35651
6    1.17218 -0.36505
7    1.32371 -0.34871
8    1.27403 -0.36157
9    1.29278 -0.28753
10   2.10707 -0.46257

The first five lines are comment lines, which tell you what the
corresponding tpin and weight files are. The sixth line has three
fields:
  1. number of state variables, V
  2. number of measured state variables, Vm
  3. number of epochs, N
The seventh line is a comment line. The next N lines are the values
of the V state variables at the N epochs. The first epoch is the
initial conditions. There are V+1 columns: the first is the epoch
label, t, numbering from 0 to N-1 inclusive; the next Vm columns are
the Vm measured state variables; the next V-Vm columns are the
unmeasured state variables.

Temporal Pattern error files (.tper)
------------------------------------

These are very similar to the tpot files, but with the targets and
some additional error information added. The tper files are only
written if "APP:write_tper_files?_(yes/no)" is set to "yes" in the
specfile. See the documentation above on the format of the tpot
files. A typical tper file is:

# dynet temporal pattern error file #
#####################################
# input file = syn.04.600.tpin1
# weights file = syn.04d.wt
# V (tot state), Vm (meas state), epochs:
2 2 11
# Measured (state/target/state-target/|diff/target|):
0   -0.00000 -0.00000  0.00000 0.00  -0.00000 -0.00000  0.00000 0.00
1    0.46486 -------   0.42805 ----  -0.19967 -------  -0.28822 ----
2    0.36306 -------   0.32624 ----  -0.32085 -------  -0.40940 ----
3    0.57170 -------   0.53489 ----  -0.39786 -------  -0.48641 ----
4    0.98769 -------   0.95088 ----  -0.36904 -------  -0.45759 ----
5    1.20681 -------   1.17000 ----  -0.35651 -------  -0.44506 ----
6    1.17218 -------   1.13536 ----  -0.36505 -------  -0.45360 ----
7    1.32371 -------   1.28690 ----  -0.34871 -------  -0.43726 ----
8    1.27403 -------   1.23721 ----  -0.36157 -------  -0.45012 ----
9    1.29278 -------   1.25597 ----  -0.28753 -------  -0.37608 ----
10   2.10707  2.24016 -0.13309 0.06  -0.46257 -0.16262 -0.29995 1.84

The header (the first seven lines) is the same as in the
corresponding tpot file. The last N lines consist of 1+4*Vm columns.
The first column is the epoch label, t. There are then four columns
for each measured state variable:
  1. the state variable at that epoch (as given in the tpot file), v
  2. the corresponding target, t (as given in the tpin file); if no
     target was specified in the tpin file then "-------" will appear
  3. diff = v - t
  4. |diff/t|
If t=0, column 4 will show "Div0", to flag a divide by zero. If
t=diff=0, column 4 will show "0.00".
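The divide-by-zero convention in column 4 can be mimicked as follows.
This is an illustrative C sketch, not dynet's actual code, and the
function name is invented:

  #include <stdio.h>
  #include <math.h>

  /* Sketch: print the diff and |diff/t| columns with the "Div0" and
   * "0.00" conventions described above. */
  void printerrcols(double v, double t)
  {
      double diff = v - t;
      printf("%8.5f ", diff);
      if (t == 0.0)
          printf("%s", (diff == 0.0) ? "0.00" : "Div0");
      else
          printf("%.2f", fabs(diff / t));
      printf("\n");
  }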
If "APP:include_v(t=0)_in_plot_file?_(yes/no)" was set to "yes" in the specfile, then an additional Vm columns will be added on the right which give the initial conditions for each pattern. If initial conditions for any pattern were not specified then the default value, VINITDEF, will be written. ###################################################################### # The synthetic problem # ###################################################################### The directory syn/ contains files for a synthetic problem. It is the same problem as discussed in Bailer-Jones & MacKay 1998. It consists of two external inputs, x1 and x2, and two state variables, v1 and v2. The problem is: dv1/dt = x1 - 2*v1 + 8*v2 - x1*v1 dv2/dt = x2 - 5*v1 + v2 - x2*v2 The autonomous part of this dynamical system (that with the external inputs set to zero) is a decaying harmonic oscillator, with period 1.0 and e^-1 damping timescale 2.0. The files syn.26.000.tpin2 to syn.26.099.tpin2 are 100 instantiations of this dynamical systems (i.e. 100 temporal patterns). In all cases the x input sequences were generated from constrained random walks: x1 (x2) changes with a probability per unit time of 0.65 (0.999) by a random amount uniformly distributed between -0.5 and +0.5 (-1 and +1). The modulus of x is then taken to ensure a positive sequence. The initial v values were randomly selected from a uniform distribution between -1 and +1. The sequences were simulated numerically between t=0 and t=8 inclusive, and sampled with a constant epoch spacing of dt=0.1. Thus the files contain 81 lines. As explained above, the corresponding .tpin1 files (i.e. syn.26.000.tpin1 to syn.26.099.tpin1) are the same files but with the state variable data removed at all but the initial and final state variables removed. The files syn.26.200.tpin1 to syn.26.299.tpin1 are the same temporal patterns but with the v1 state variable removed completely. These can be used to test the performance of dynet with an unmeasured state variable. The specfile provided, syn.26a.spec, is set up to apply dynet to the syn.26.000.tpin1 to syn.26.099.tpin1 files. The weights file, syn/syn.26a.wt, is the result of having trained dynet on the syn.26.000.tpin1 to syn.26.049.tpin1 files: the screen dump from this run can be seen in the file syn/syn.26a.out; the error file is syn/syn.26a.err. Running dynet on the specfile as it stands will produce syn/syn.26a.dat and the .tpot and .tper files for the temporal patterns. I suggest that you play with this synthetic problem to get a feel for dynet and to get familiar with the specfile. ###################################################################### # The dynet program # ###################################################################### Most of what follows will not be required by the user, and has not been written with the user in mind. It is also far from comprehensive. Subroutines ----------- The subroutines within dynet are ordered in the following manner: Principal Control Routines dynettrain, dynetapply Forward Pass Routines dynetloop Gradient Descent Routines graddescent, updatewt Macopt-relevant Routines callmacopt, dymacint, dymacfn Gradient Evaluation Routines ederiv, cumederivs Initialisation Routines dynetinit, dynetloopinit, dysyswtinit Scaling Routines scalecalc, datascale, unscale Input/Output Routines specread, dataread, writeweights, evtpnewname Some subroutines, in particular memory allocation/deallocations, and the transfer and error functions, are in the dynetsubs.c file. 
######################################################################
#                         The dynet program                          #
######################################################################

Most of what follows will not be required by the user, and has not
been written with the user in mind. It is also far from
comprehensive.

Subroutines
-----------

The subroutines within dynet are ordered in the following manner:

  Principal Control Routines    dynettrain, dynetapply
  Forward Pass Routines         dynetloop
  Gradient Descent Routines     graddescent, updatewt
  Macopt-relevant Routines      callmacopt, dymacint, dymacfn
  Gradient Evaluation Routines  ederiv, cumederivs
  Initialisation Routines       dynetinit, dynetloopinit, dysyswtinit
  Scaling Routines              scalecalc, datascale, unscale
  Input/Output Routines         specread, dataread, writeweights,
                                evtpnewname

Some subroutines, in particular the memory allocation/deallocation
routines and the transfer and error functions, are in the dynetsubs.c
file.

scalecalc(): Evaluates the scaling for the data. Other options may be
implemented later, but the current procedure is to scale the data to
have zero mean and unit variance. Although dynet deals with different
patterns which may show behaviour over very different time scales, we
still expect a given input variable to be of the same type for all
patterns. In particular, we expect any input to be of the same scale
for all patterns. Therefore we only need one mean and one variance
parameter for any one input.

scalecalc(): This routine always calculates Hlam based on the size of
the network. Therefore if you choose var or maxmin data scaling, Hlam
will also be set.

datascale(): Scales the data using scaling factors calculated by
scalecalc() or read in from a file. This scales both the external
inputs and the state variables. Note that scaling the latter
automatically scales the outputs (Y). No matter what size the
delta_time terms are, the Y values can accommodate this and keep the
V values in scale. This is because of the linear output transfer
function, which gives the Y values an arbitrarily large dynamic
range. This has implications for weight decay, as we don't want
weight decay to penalize large hidden-output weights just because,
for example, the time_deltas are very small (thus requiring the Ys,
and hence the wtHY weights, to be large).

What I call a single pattern is a single temporal pattern, i.e. a
time sequence of external inputs and corresponding state variables. I
deal with one pattern at a time, i.e. I evaluate the derivatives at
all of the epochs of a given pattern before moving on to the next
pattern. When the weights are updated is decided in graddescent(),
where the three update methods are available. When using macopt, only
the batch update method is implemented.

dysysinit(): Initializes the dynamic system. Initialisation must
occur when a new pattern is presented to the network for training.
All of the _prev variables are initialized to zero. If they need to
be changed, then the default values are set in the header file.
However, this would surely only apply to v_prev and y_prev, as I
cannot see why the initial weight derivatives should be anything but
zero. The effects of the data scaling must be considered when
initialising the system.

Pointer arithmetic is done in callmacopt() and dymacint() to
accommodate the single-offset vectors used by macopt. This
potentially makes the code less robust, as it would screw up if the
types of the weight (wtvec) and error gradient (wt_grad) vectors were
ever changed (from the present double type). However, the change
required would simply be to define wtvec and wt_grad as floats rather
than doubles, which is the kind of change one should make anyway if
the types of a dependent subroutine change.

Bayesian Aspects
----------------

The network incorporates basic Bayesian features via the noise terms
(beta parameters) and weight decay terms (alpha parameters). There is
a separate noise term for each state variable. There are four classes
of weight decay terms:

  alpha[0]  state variable to hidden weights (VH)
  alpha[1]  external input to hidden weights (XH)
  alpha[2]  input bias to hidden weights
  alpha[3]  hidden to output weights (inc. bias) (HY)

Given the argument in the scaling discussion above (see datascale()),
you may not want to use the alpha[3] term: the scale of the delta(t)
terms will influence your choice of alpha[3].
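Putting the alpha and beta terms together, the quantity being
minimized appears to be toterr = lerr + werr, as written to the .err
files. The following hedged C sketch assumes the usual Gaussian
convention (factors of 1/2, with 1/sqrt(beta) and 1/sqrt(alpha) as
standard deviations); dynet's exact normalisation may differ, and the
flattened arrays and names here are illustrative only.

  /* Hedged sketch of the total error described above. Undefined
   * targets contribute nothing, as do state variables with beta = 0.
   * beta_of_tar[i] and alpha_of_wt[i] hold the beta of the state
   * variable behind target i and the alpha of the class of weight i. */
  double toterror(int ntar, const double *v, const double *tar,
                  const int *def, const double *beta_of_tar,
                  int nwt, const double *w, const double *alpha_of_wt)
  {
      double lerr = 0.0, werr = 0.0;
      int i;
      for (i = 0; i < ntar; i++)           /* likelihood (beta) term   */
          if (def[i]) {
              double d = v[i] - tar[i];
              lerr += 0.5 * beta_of_tar[i] * d * d;
          }
      for (i = 0; i < nwt; i++)            /* weight decay (alpha) term */
          werr += 0.5 * alpha_of_wt[i] * w[i] * w[i];
      return lerr + werr;                  /* toterr = lerr + werr     */
  }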
Both the alpha and beta terms are used in the ederiv() subroutine for
calculating the total error derivative. Thus the ed terms returned by
this subroutine contain both the likelihood term (beta) and the prior
term (alpha).

Note that the scale of the gradient which the network evaluates
depends on the scale of the following:
  1. the input data
  2. the weights
  3. the Bayesian parameters alpha and beta
Provided you use some kind of data scaling, the first two are taken
care of. However, the third is not. In particular, using values of
alpha and beta which differ significantly from unity will mean that
the macopt convergence tolerance will have to be changed. Assuming
beta >> alpha, increasing beta by a factor of x will require
MAC:convergence_tolerance to be increased by a factor of x too.

Variable names
--------------

Vsize, Xsize, Hsize and Ysize are the numbers of nodes in the state
variable layer, external input layer, hidden layer and output layer
respectively. These numbers do not include biases. The corresponding
vectors (v,x,h,y) start at 0. The input layer bias is x[Xsize] and is
introduced to the network by adding an extra constant input to the
input vectors. The hidden layer bias is h[Hsize] and is set to HBIAS.

Counting variables with dedicated uses:
  p    - pattern
  t    - epoch
  k    - state variable (V) node
  l    - external input (X) node
  m    - hidden node
  n    - output node
  i,j  - general node values in ederiv()

tar is generally a 3D array of type targets. The variable ordering is
tar[p][t][k] (pattern, epoch, node), with
  1 <= p <= Npats
  0 <= t < ntsteps[p]
  0 <= k < Vsize
tar is a structure with two fields. The first, .def, specifies
whether or not a target is specified. The second, .val, gives the
value. If tar[p][t][k].def is zero, then the target has not been
specified by the user, and you cannot expect tar[p][t][k].val to be
meaningful. If you use scaling, the values stored in the tar array
will be changed. If the initial tar values, tar[p][0][k], are not
defined, we use the default value VINITDEF to initialise the
sequence. Note, however, that this value is not written into
tar[p][0][k], nor is tar[p][t][k].def set to 1.
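In C, the target structure just described might be declared as
follows. Only the .def and .val fields are documented above; the type
name and layout are illustrative, and dynet's actual declaration may
differ.

  /* Sketch of the target structure described above. The array shape
   * tar[p][t][k] is as given in the text. */
  typedef struct {
      int    def;   /* 1 if the user specified this target, else 0 */
      double val;   /* target value (meaningless when def == 0)    */
  } targets;

  /* example access: if pattern p has a target for state variable k
   * at epoch t, accumulate its contribution to the error:
   *     if (tar[p][t][k].def) { ... tar[p][t][k].val ... }          */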