##### ctsmod software, copyright 2012 Coryn A.L. Bailer-Jones
##### http://www.mpia.de/homes/calj
##### This software is released under the Creative Commons 
##### Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) license.
##### See the file "license" for more details

ctsmod code versions

All versions except v00 work.
The date indicates when that version was archived. Results labelled as using version vNN
could have had some or all of the new features listed for that version.

v00	2011-09-05	NonOO skeleton code. Not even functioning
v01	2011-09-14	Using MCMC to sample prior
v02	2011-09-19	Direct prior sampling
			No shortcuts (other than shrinkage) in event likelihood integration
v03	2011-09-24	Likelihood integration shortcut for TSMod2=ProbGauss added
			No constant offset in TSMod1 (i.e. multiple inheritance to achieve this not yet added)
v04	2011-10-20	TSMod1=FuncCombUniformSinusoid added using multiple inheritance
			Transformation functions (for old MCMC sampling) commented out
v05	2011-11-07	TSMod1=FuncCombUniformSigmoid and FuncCombUniformSigmoidSinusoid added 
			  (amp parameters renamed)
			Ability to read some specific files added
			shrink generalized to allow shrinkage on just t or z or both
			bug fix: eval.prior.theta() now set prior to 1 for any parameter where its alphas are NULL
			bug fix: simulated data generated too many events if Nevent=1 [had not effect so far]
v06	2011-11-08	changed layout of calculation in TwoGauss without changing the mathematical result
			introduced new MeasMod TimeUniformSignalGauss. shrink.tzlim and shrink.tlim updated accordingly
			modified warning handling. Also now have option warnToFile write warnings to a file instead
v07	2011-11-12	Neatified shrinking by replacing shrink.tzlim with shrink.tlim and shrink.zlim
			Introduced calc.event.like.negtsd() for doing the 0D integration of the event likelihood
			Added calculation of evidence of the no-model (special case of the above)
			Modified calc.event.like() to accommodate this; introduced integopt list
			Accommodated Nevsamp=1 and trapped lower values
			Improved in-code documentation, modified slightly some parameter checks
v08	18-11-2011	Writing to connections inside foreach/dopar is not permitted, and has led to
			  corruption of many .like.dat files. So I
			created an outer for loop in which the foreach loop is nested, controlled using
			  Nouter and Nparallel. Nevsamp may be changed.
			Removed code introduced in v07 to permit Nevsamp=1 (now need >1, Nparallel too)
			flushFreq removed: flushing must be done after every foreach loop
			Corrected plot.1dpost() in utilities to (a) do correct normalization, and (b) 
			  plot the correct prior. (In v07 and earlier I was plotting the evCalc[,1], the
			  total prior, rather than just the prior for the individual parameter. Note
			  that plot.2dpost() still plots the total prior)
			Introduced new simulation class sim.asfile1()
			I realised on 18.11.2011 that FuncSinusoid (and FuncSigmoid) have a minimum signal 
			  value of zero, or equivalently a mean of amp/2. This is appropriate when the signal 
			  must be positive, but not when they have zero mean, which is what I have been assuming 
			  for the bdvar2 tests (which have been on light curves with a mean signal of zero: so all 
			  results so far are meaningless).
v09	25-11-2011	Introduced the TSMod1 functions FuncSinusoidZero and FuncSigmoidZero, which are like the
			  non-"Zero" functions, but now have a mean (over infinite time) of Zero. This addresses
			  the problem mentioned under v08.
			Transformation functions (for old MCMC sampling) removed (not used since v01)
v10	21-12-2011	Moved parts of calcEvidence.R into new functions in the file ctsmod_functions.R, specifically
			   calc.like()
			   dumpScreenDiagnostics()
			   writeFinalOutputs()
			Introduced virtual class Bypass as a neater way of doing the TSMod1 and TSMod2 bypassing
			Created class BypassTSMod2 as a way of bypassing TSMod2 and deprecated using 
			  ProbGaussian for this purpose.
		        Introduced TSMod2 functions ProbOUprocess and ProbIntOUprocess, 
			  as well TSMod1 function BypassTSMod1
			(Note, however, that these are the "naive/weak" way of modelling the OU process)
			Introduced sim.IntOUprocess()
			Introduced the 0D integration function calc.event.like.zerobothsd()
			Changed the class "FuncUniform" to have two alternative prior PDFs, Gamma and Gaussian.
			  Selection is made using new slot "priorform", which in turn is set by passing a string
			  when initializing with new(), for which an "initialize" method has been added.
v11.0	none		This is nominally the same as v11, but before I added the offset parameter to ProbOUprocess
v11	13-01-2012	ProbOUprocess    from v10 renamed ProbOUprocNaive
		  	ProbIntOUprocess from v10 renamed ProbIntOUprocNaive
			Implemented the improved way of modelling OU process using the new TSMod2 class
			  ProbOUprocess, and introduced the corresponding likelihood function calc.like.OUprocess().
			The previous function calc.like() renamed calc.like.isolated(), and decision to call this or
			  calc.like.OUprocess() made in a few function calc.like().
			Generic function eval.mod() given the "..." argument to accommodate eval.mod.ProbOUprocess 
			  method.
			tzlim testing in calcEvidence.R harmonized.
			Removed unused generic function set.mod() (probably introduced in v10)
			Changed sim.OUprocess() to optionally add signal noise
			Changed sim.const()
			Archived immediately prior to introducing a new top-level caller, and making calcEvidence.R
			  just one of the options to call
v12	22-01-2012	Introduced MCMC for posterior sampling, resulting in major restructuring.
			Old calcEvidence.R split into two files:
			  - run_ctsmod.R (to be sourced), which calls things, open/closes files and checks parameters
			  - setup_ctsmod.R (source by run_ctsmod.R), where the user defines the run parameters
			  - evidence-specific part, calc_evidence.R (called by run_ctsmod.R, one function)
			Added new file, sample_posterior.R, which contains several functions to call the MCMC,
			calculate the posterior PDF, and calculate the DIC:
			  sample.posterior(), metrop(), calc.post.mcmc(), make.covariance.matrix(), calc.dic()
			  (Note that calc.dic was actually wrong)
			metrop() is my own MCMC algorithm.
			The MCMC requires transforming and inverse transforming the TSModX@thetaX variables, which is
			  achieved by new methods trans.theta and invtrans.theta in ctsmod_functions.R for all
			  classes except the ProbOUprocNaive and ProbIntOUprocNaive (as I don't expect to use
			  them at the moment)
			ctsmod_functions.R split into three files
			  - ctsmod_models.R  containing the classes and methods for the time series and measurement models
			  - ctsmod_likefunctions.R  containing all the likelihood evaluations functions
			  - ctsmod_genfunctions.R  containing everything else
		        From the old ctsmod_function.R I renamed
			  dumpScreenDiagnostics() to dump.evidence.diagnostics(), and
			  writeFinalOutputs() to write.final.evidence.outputs(),
			  and moved both into calc_evidence.R. From the first of these function I removed the 
			  general time calculation parts into a new function est.time(), now in ctsmod_genfunctions.R
			In utilities.R I added a new function calc.xic to calculate the AIC and BIC using the
			  results of the evidence calculation (i.e. max likelihood found within evCalc) - this
			  is not very accurate, however
v13	23-03-2012	Changed calc.post.mcmc() to return logPrior and logLike in addition to parameter samples.
			calc.dic() was therefore simplified (and error in v12 corrected) to not recalculate likelihoods.
			Introduced k-fold CV functions in kfoldCV.R containing
			  kfold.cv(), define.partitions()
			This required that calc.like() take an extra variable, selInd, indicating for which rows 
			  of obsdata I calculate the likelihood. obsdata[selInd,] is passed to calc.like.isolated(), 
			  but I must still pass all the data to calc.like.OUprocess() as this needs all data to 
			  calculate an OU process; but the likelihood is returned only for the events in selInd.
			  It also requires me to use %dorng% in doRNG{} rather than %dopar% in doMC{}.
			calc.post.stats() in utilities.R modified to be more general to be able to work with either
			  evCalc or postSamp. calc.post.stats.ev() added for the former. calc.post.stats.kfolcv()
			  added for dealing with kfoldCV
			  (very simple changes required, but also made slightly simplier/clearer in how it deals
			   with hfac and modevidence)
			 plot.postSamp1d() in utilities.R modified to permit passing of either postSamp or kfoldCV.
			   It was further modified to allow prior samples to be passed too and these overplotted.
			 oplot.OUprocess.data() in utilities.R modified to take selInd as additional argument.
			 logEvNomod renamed to logLikeNomod and calculated independently of evidence.
			 Introduced hfac intro evidence calculation, removed write.final.evidence.outputs()
			   as its functionality now simplified and in run_ctsmod()
			 Introduced automatic setting of hfac into functions in utilities.R in the case that user
			   sets hfac=NA (which is also the default).
			 Introduced invtrans.multi() into sample_posterior.R, as a way of avoiding what turned out
			   to be a very low loop (for large Npostsamp), for doing the inverse transformation
			   of postSamp.
			 Introduced print.elapsed.time() into ctsmod_genfunctions.R to replace final duration printing
			   and use this now in several places.
			 This archive includes a version of sample_posterior.R called sample_posterior_withnotes.R which
			   has some extra notes in it concerning the changes made
v14	25-04-2012	 In calc.event.like() in ctsmod_likefunctions.R, the condition for calling calc.event.like.negtsd
			   was changed to include TSMod2=BypassTSMod2 and not to include TSMod3=ProbUniform.
			   The function itself was also changed to calculate the total sd (sdTot) correctly and 
			   differently for the two valid choices of TSMod2, i.e. ProbGaussian and BypassTSMod2
			   This version was used for the results in the ctsmod_aa paper (A&A 2012)
v15	04-09-2012	 ProbWienerprocess introduced. This is very similar to ProbOUprocess.
			 To accommodate this calc.like.OUprocess() in ctsmod_likefunctions was renamed 
			   calc.like.FullyStochasticprocess(), as it can be used exactly as is for both.
			 The name of the setup file (default "setup_ctsmod.R") can now be specified by the user or
			   provided on the command line. Otherwise the default is used.
			 Added the following error trap into metrop() in sample.posterior.R
  			   if(is.nan(logMR)) {stop("...")}
			   which can occur if the "prior" (as determined by alpha) for a fixed variable is zero.
			   This can be/should be avoided by not setting alpha for fixed variables, or if you do, then
			   setting the fixed value to be compatible with the prior (i.e. to give non-zero).
			 In sample.posterior.R changed one argument in the sample.prior.theta() call (which is used to check
			   parameters). What was "theta1Fixed" is now "assign.partial(TSMod1@theta1, theta1Fixed)", and likewise
			   for TSMod2,3. This is now the same as we have in calc_evidence.R
			 Added TSMod1 FuncCombUniformSigmoidZeroSinusoidZero
			 Added option to send email when job finishes
			 Added code in run_ctsmod.R and setup_ctsmod.R to specify recontructing (predicting) time series at 
			   specified times and signal values. Code to do this is in TSreconstruct.R. It works, but is very slow,
			   so has been commented out.
			 Trapped having Ncores negative or non-finite (as Inf caused a crash)
			 Error corrected in calc.event.like() in ctsmod_likefunctions.R where we had "TwoGauss" rather than
			   "TwoGaussian" as a class name
			 Error corrected in calc_evidence.R where I assigned thetaAdj:
			   Old:  thetaAdj <- as.numeric( c(TSMod1@theta1, TSMod2@theta2, TSMod3@theta3)[thetaFlag] )
			   New:  thetaAdj <- as.numeric( c(TSMod1@theta1, TSMod2@theta2, TSMod3@theta3)[names(which(thetaFlag))] )
			   names(which(thetaFlag)) selects the adjustable theta *in the order of thetaFlag*. In the old version it
			   just came out in the order which were defined by the sample.prior.theta method.
			   thetaAdj was used to fill evCalc, but the column names were set to those TRUE in thetaFlag.
			   So if thetaFlag didn't happen to be in the same order as that used in sample.prior.theta, the columns
			   were incorrectly labelled. This does not affect the evidence calculation, but it may mean the
			   posterior/prior plots were incorrectly labelled for some parameters.
			 Error corrected in four methods of FuncCombUniformSigmoidZeroSinusoidZero in ctsmod_models.R.
			   The four methods eval.prior.theta, sample.prior.theta, trans.theta, invtranstheta were using methods
			   which did not exist, things such as eval.prior.theta.FuncSigmoidZero(). We should instead be using 
			   the "non-Zero" versions, e.g. eval.prior.theta.FuncSigmoid(). For these four functions the methods
			   are identical (which is why they are not defined - it's just eval.mod() which is different).
			   With the class FuncSigmoidZero, this inherited these methods from FuncSigmoid, so they did not
			   have to be defined. But for FuncCombUniformSigmoidZeroSinusoidZero this is not possible, because
			   it uses multiple inheritance, so we have to specify the combinations of the methods.
v16	17-09-2012	 Corrected error in sample_posterior.R, in which we had integout instead of integopt in a couple of places.
	first		   integopt was nonetheless being correctly passed into calc.like() - presumably via the default scoping
	public		   protocol.
	release		 Introduced a new level of parallel processing in run_ctsmod.R, which will run the kfold-CV in parallel
			   with the full data posterior sampling, if both are selected. This uses the multicore package, 
			   functions mcparallel() and collect()
			 Introduced parallel tempering Metropolis sampler, pt.metrop(), into sample_posterior.R. This is selected
			   if the new setup parameter Nchain>1, in which case Nchains are used, with a swapping internval
			   specified by swapInt, both defined in the setup file (setup_ctsmod.R).
			 In coordination with introduction of pt.metrop(), sample_posterior() now returns a two-element list:
			   the matrix postSamp (which was previously returned directly) and the 3D array postSampAll, which 
			   contains the samples for all chains.
			 Added code to run_ctsmod.R to calculate the thermodynamic integration estimation of the evidence using
			   postSampAll (if it exists)
			 Introduced a trap into metrop() in sample_posterior.R to avoid possibility of proposals with -Inf
			   logPrior or logLike being accepted.
			 calc.post.mcmc() in sample_posterior.R now gives a warning if logPrior is not finite. This could occur
			   if a prior was set up to give exactly zero, which is possible, but not desirable, as it may well
			   indicate  an unreasonable value of alpha for a fixed parameter or an incompatibility between 
			   the prior definition and the posterior sampling.
			 Removed (commented out) the during-processing writing of the evidence to an output file, from
			   run_ctsmod.R and calc_evidence.R
			 Removed from ctsmod_genfunctions.R the special function for reading the particular paleontology
			   excel file, and the call to that in read.obs.dat(). Now all files need to be converted to standard
			   ASCII format in advance. Library gdata() therefore no longer required.
v17	10-10-2012	 sumres() added to analyse.R to ease collation of results
			 Added a speed-up to metrop() and pt.metrop() in sample_posterior.R: if sampleCov is diagonal
			   then generate random numbers using rnorm (Ntheta independent Gaussians) rather than rmvnorm().
			   Is about 8% faster.
			 Modified read.obs.dat() in ctsmod_genfunctions.R to now read either a three or four columns file.
			   If it's three column, s.sd is assumed missing and is added as a column of zeros. Added function
			   variable obsdataHeader which indicates whether or not we should skip the first (non-commented)
			   line. That is, we no longer treat the first line of the data file as the correct header.
			   This variable has been added to setup_ctsmod.R (default is TRUE).
			   Added another function variable versbose to supress function printing if it is FALSE (default
			   is TRUE).
			 Added pass variable plotResid to oplot.OUprocess.data() and modified code to plot residuals
			   rather than data and model, if this flag is TRUE (default is FALSE)
			 Introduced the linear model:
			   Added new TSMod1 class FuncLinear to ctsmod_models.R
			   Added sim.linear() function into ctsmod_genfunctions.R (and recognition of it to run_ctsmod.R)
			   Added relevant prior setting lines to setup_ctsmod.R
			   Changed error message in calc.post.mcmc() in sample_posterior.R (as zero prior for the initial
			   parameter value of slope is now possible)
			 In utilities.R: (1) modified plot.postSamp1d() to apply xlim also applied to priorSamp before density 
			   estimation. (This was necessary to get samples from a Cauchy to plot properly: a Cauchy has such
			   heavy trails, that a few far out points dominate the density estimation, giving an erroneous
			   uniform distribution.) (2) modified oplot.model.data() to acommodate FuncLinear
v18	24-01-2013	 Added sim.quasisinusoid() and sim.asfuncquasisinusoid() to ctsmod_genfunctions.R, run_ctsmod.R
			   and setup_ctsmod.R
			 Added new TSMod1 class FuncQuasiSinusoid to ctsmod_models.R as well as its derivatives
			   FuncQuasiSinusoidZero, FuncCombUniformQuasiSinusoid, FuncCombUniformQuasiSinusoidZero
			 Modified oplot.model.data() in utilities.R to be able to overplot FuncQuasiSinusoid models
			 Trapped undefined function for obssel in run_ctsmod.R
			 Updated ctsmod_tn.tex to reflect these changes, and added a new subsection on error reporting
v19	20-03-2013	 Added emcee() as a new sampler. This is in the new mcmc.R file, to which also metrop() and pt.metrop()
			   have been moved here from sample.posterior(). The three MCMC samplers are independent of ctsmod.
			 Changed sample.posterior() in sample_posterior.R to inverse transform postSampAll (before just postSamp 
			   was transformed). Also, postSampAll is set to NULL rather than NA if it is not filled (which is only
			   the case for the Metropolis algorithm)
			 Modified kfold.cv() in kfoldCV.R to be compatible with emcee. Specifically, I had to replace Npostsamp
			   with Nsamp, where Nsamp=ncol(postSamp)
			 For the model FuncQuasiSinusoid (defined in ctsmod_models.R) I trapped the error that a value of fpX
			   very close to of equal to +1 or -1 gets tranformed to +Inf and -Inf respectively in the transformation
			   trans.theta.FuncQuasiSinusoid() used to project into the space for the MCMC. This would not immediately give
			   an error, but when inversed transformed by invtrans.theta.FuncQuasiSinusoid() +Inf gets transformed to
			   NaN, and this of course causes an error later (specifically splint() in eval.mod.FuncQuasiSinusoid() fails).
			   I now limit the value of fpX prior to the first transformation to be between 1-translim and 1+translim,
			   where translim=1e-10 (hard-wired into function). Likewise, in invtrans.theta.FuncQuasiSinusoid() I now
			   prevent the transformed version of fpX dropping below the corresponding limit (-23.719), as the inverse
			   transformation would (for much lower values) give a NaN.
			 Changed the transformation of phase in FuncSinusoid (and thus its derivatives) to be circular, which is also
			   what it is in FuncQuasiSinusoid
			 Modified calc.post.mcmc() to now avoid evaluating the functions (TSMod) if logPrior is not finite. 
			   logLike is then set to -Inf (which is fine, as the MCMC sample is anyway rejected if logPrior=-Inf).
			   This was done because emcee sometimes resulted in very large values of parameters which then produced
			   numerical errors in the function evaluations.
			 calc.dic() no longer called by run_ctsmod.R (and therefore not used). I'm not using it, and it might
			   cause the error just mentioned (which wouldn't be trapped by that fix, as calc.dic() does not calculate 
			   the prior).
			 Corrected calc.post.samp() in utilities.R. Up until now it was incorrectly using the posterior probabilities
			   to form a weighted mean and sd of the parameters. But as these have already been sampled from the posterior,
			   no weighting was necessary. No weighting is now used. Function modified to use importance sampling to 
			   calculate statistics correctly for samples drawn from the prior when new is.ev parameter is set to true.
			   For the same reason the function calc.post.samp.ev() has been deleted.
			   This led to hfac no longer being necessary in calc.post.stats() or find.peak.sol.kfoldcv()
			 Updated a number of functions in utilities.R to accept the Nrej and Nthinblock parameters. The former
			   permits a post-run burnin period to be applied (i.e. removed this number of initial samples from the 
			   analysis). The latter accommodates blocking of the thinning, specifically designed to apply thinning
			   in the case that postSamp was produced by emcee (so thinning is applied to each walker separately).
			 Added recalc.like() to utilities.R
			 Updated ctsmod_tn.tex to reflect these changes, added summary of functions in utilities.R,
			   and reorganization of text
v20	08-04-2013	 Corrected emcee() in mcmc.R to do asynchronous (rather than synchronous) updating of the walkers.
			 v19 sometimes crashed with emcee because the transformed phase parameter grew to very large values
			   after thousands of iterations and so the cosine function in, e.g. FuncSinusoid, gave an error. 
			   I fixed this by removing the circular boundary condition on phase, and strictly imposing the
			   evaluation of the prior on the phase, such that proposals from MCMC which lie outside the range 0-1
			   give logPrior = -Inf and so are always rejected, and thus the function never evaluated.
			   This modification was also made to the function FuncQuasiSinusoid() (also in ctsmod_models.R).
			   However, very large values of the transformed values for any parameter could still cause emcee to
			   crash, because then any one of the proposed parameters, thetaProp (which is a vector),
			   could evaluate to NaN (e.g. -1e+320 - -1e+320 = NaN). I have therefore added into emcee() a test for 
			   any thetaProp being non-finite (which includes NaN), in which case the proposal is rejected. 
			   Note that this is an intrinsic problem of using emcee with transformed parameters, in which very 
			   large (positive or negative) of the transformed parameters are not rejected by a prior.
			 Added options("warn"=1) to run_ctsmod.R to force warnings to be written as they occur.
			 Removed the now redundant if(warnToFile) {flush(outWarn)} in calc.evidence() and kfold.cv(), and
			   also removed the "immediate." from some warnings
			 Modified the warning message in calc.post.mcmc() in sample_posterior.R to show the parameter
			   estimates which have caused logPrior=-Inf, but this is not commented out, because it occurs
			   very often now that the MCMC can legitimately generate samples of the phase parameter outside
			   the range 0-1 (and we can also get samples-to-reject like this with FuncLinear).
			 Modified plot.postSamp1d() in utilities.R to also calculate the 1D mode for each parameter,
			   and to calculate the mean and sd of these across partitions for K-fold CV results.
			 Removed commented out, obsolete code related to printing evidence calculations to a file
v21	22-04-2013	 A poor initialization of parameters could produce -Inf or +Inf for the function evaluations
			   in any of the MCMC algorithms. This is now trapped by throwing an error in all MCMC 
			   algorithms. Note that the Gaussian sampling using sampleSD in emcee of the phase parameter 
			   can still produce values of phase outside the range 0-1, which would give func=-Inf at 
			   initialization and so trigger this error. sampleSD must be kept small when used in this way.
			 Additional error trapping added to calc.post.mcmc() in sample_posterior.R to ensure that
			   a stop() is forced if logPrior or logLike are NULL, NA, NaN, or +Inf.
			 Package multicore replaced with parallel, and therefore parallel() and collect() with 
			   mcparallel() and mccollect()
			 Nominally works with R-3.0.0
			 This version is for preservation prior to making changes to how I do parallel processing.
v22	13-06-2013	 Uses R-3.0.0
			 Replaced use of foreach and doRNG with mcparallel from package parallel in kfoldcv(). 
			   This was done nominally in order to catch possible errors in the code related to very 
			   large values of the parameters encountered with emcee. However, this has not fixed the 
			   mysterious occasional NULL return bug from sample.posterior(), described in section D.10 
			   of the ctsmod technical note. See that section for a description of the behaviour.
			 Replaced package multicore with package parallel in run_ctsmod.R (used to run 1-fold
			   and K-fold in parallel).
			 Modified plot.emcee.walkers() in utilities.R to plot walker samples for user-selected partition
			   in kfoldCV results
			 Modified plot.postSamp1d() in utilities.R to perform the density estimation taking into
			   account periodic boundary conditions for parameters selected by pbound.
			 Modified definition of p in find.peak.sol() in utilities.R to be parameter number, rather than
			   column number in samp (p_new = p_old + 2)
			 Corrected bug in find.peak.sol.kfoldcv() which meant kfoldCV parition was not being passed to
			   find.peak.sol() (this would have invisibly passed postSamp, if defined, or crashed, if not).
v23	24-10-2015	 R-3.1.2
			 Added the necessary Jacobian for the parameter transformations to the Metropolis, metrop(),
			   parallel tempering, pt.metrop(), and EMCEE, emcee(), algorithms. Added the function
			   calc.jacobian.mcmc() to sample_posterior.R to compute the total Jacobian (a scalar) using
			   the appropriate methods for the function classes (which were added to ctsmod_models.R).
			 Extended error trapping in algorithms in mcmc() to trap NA and NaN (and not just +/- Inf)
			   in initial function values (shouldn't be necessary, but just in case)