##### ctsmod software, copyright 2012 Coryn A.L. Bailer-Jones ##### http://www.mpia.de/homes/calj ##### This software is released under the Creative Commons ##### Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) license. ##### See the file "license" for more details ctsmod code versions All versions except v00 work. The date indicates when that version was archived. Results labelled as using version vNN could have had some or all of the new features listed for that version. v00 2011-09-05 NonOO skeleton code. Not even functioning v01 2011-09-14 Using MCMC to sample prior v02 2011-09-19 Direct prior sampling No shortcuts (other than shrinkage) in event likelihood integration v03 2011-09-24 Likelihood integration shortcut for TSMod2=ProbGauss added No constant offset in TSMod1 (i.e. multiple inheritance to achieve this not yet added) v04 2011-10-20 TSMod1=FuncCombUniformSinusoid added using multiple inheritance Transformation functions (for old MCMC sampling) commented out v05 2011-11-07 TSMod1=FuncCombUniformSigmoid and FuncCombUniformSigmoidSinusoid added (amp parameters renamed) Ability to read some specific files added shrink generalized to allow shrinkage on just t or z or both bug fix: eval.prior.theta() now set prior to 1 for any parameter where its alphas are NULL bug fix: simulated data generated too many events if Nevent=1 [had not effect so far] v06 2011-11-08 changed layout of calculation in TwoGauss without changing the mathematical result introduced new MeasMod TimeUniformSignalGauss. shrink.tzlim and shrink.tlim updated accordingly modified warning handling. Also now have option warnToFile write warnings to a file instead v07 2011-11-12 Neatified shrinking by replacing shrink.tzlim with shrink.tlim and shrink.zlim Introduced calc.event.like.negtsd() for doing the 0D integration of the event likelihood Added calculation of evidence of the no-model (special case of the above) Modified calc.event.like() to accommodate this; introduced integopt list Accommodated Nevsamp=1 and trapped lower values Improved in-code documentation, modified slightly some parameter checks v08 18-11-2011 Writing to connections inside foreach/dopar is not permitted, and has led to corruption of many .like.dat files. So I created an outer for loop in which the foreach loop is nested, controlled using Nouter and Nparallel. Nevsamp may be changed. Removed code introduced in v07 to permit Nevsamp=1 (now need >1, Nparallel too) flushFreq removed: flushing must be done after every foreach loop Corrected plot.1dpost() in utilities to (a) do correct normalization, and (b) plot the correct prior. (In v07 and earlier I was plotting the evCalc[,1], the total prior, rather than just the prior for the individual parameter. Note that plot.2dpost() still plots the total prior) Introduced new simulation class sim.asfile1() I realised on 18.11.2011 that FuncSinusoid (and FuncSigmoid) have a minimum signal value of zero, or equivalently a mean of amp/2. This is appropriate when the signal must be positive, but not when they have zero mean, which is what I have been assuming for the bdvar2 tests (which have been on light curves with a mean signal of zero: so all results so far are meaningless). v09 25-11-2011 Introduced the TSMod1 functions FuncSinusoidZero and FuncSigmoidZero, which are like the non-"Zero" functions, but now have a mean (over infinite time) of Zero. This addresses the problem mentioned under v08. Transformation functions (for old MCMC sampling) removed (not used since v01) v10 21-12-2011 Moved parts of calcEvidence.R into new functions in the file ctsmod_functions.R, specifically calc.like() dumpScreenDiagnostics() writeFinalOutputs() Introduced virtual class Bypass as a neater way of doing the TSMod1 and TSMod2 bypassing Created class BypassTSMod2 as a way of bypassing TSMod2 and deprecated using ProbGaussian for this purpose. Introduced TSMod2 functions ProbOUprocess and ProbIntOUprocess, as well TSMod1 function BypassTSMod1 (Note, however, that these are the "naive/weak" way of modelling the OU process) Introduced sim.IntOUprocess() Introduced the 0D integration function calc.event.like.zerobothsd() Changed the class "FuncUniform" to have two alternative prior PDFs, Gamma and Gaussian. Selection is made using new slot "priorform", which in turn is set by passing a string when initializing with new(), for which an "initialize" method has been added. v11.0 none This is nominally the same as v11, but before I added the offset parameter to ProbOUprocess v11 13-01-2012 ProbOUprocess from v10 renamed ProbOUprocNaive ProbIntOUprocess from v10 renamed ProbIntOUprocNaive Implemented the improved way of modelling OU process using the new TSMod2 class ProbOUprocess, and introduced the corresponding likelihood function calc.like.OUprocess(). The previous function calc.like() renamed calc.like.isolated(), and decision to call this or calc.like.OUprocess() made in a few function calc.like(). Generic function eval.mod() given the "..." argument to accommodate eval.mod.ProbOUprocess method. tzlim testing in calcEvidence.R harmonized. Removed unused generic function set.mod() (probably introduced in v10) Changed sim.OUprocess() to optionally add signal noise Changed sim.const() Archived immediately prior to introducing a new top-level caller, and making calcEvidence.R just one of the options to call v12 22-01-2012 Introduced MCMC for posterior sampling, resulting in major restructuring. Old calcEvidence.R split into two files: - run_ctsmod.R (to be sourced), which calls things, open/closes files and checks parameters - setup_ctsmod.R (source by run_ctsmod.R), where the user defines the run parameters - evidence-specific part, calc_evidence.R (called by run_ctsmod.R, one function) Added new file, sample_posterior.R, which contains several functions to call the MCMC, calculate the posterior PDF, and calculate the DIC: sample.posterior(), metrop(), calc.post.mcmc(), make.covariance.matrix(), calc.dic() (Note that calc.dic was actually wrong) metrop() is my own MCMC algorithm. The MCMC requires transforming and inverse transforming the TSModX@thetaX variables, which is achieved by new methods trans.theta and invtrans.theta in ctsmod_functions.R for all classes except the ProbOUprocNaive and ProbIntOUprocNaive (as I don't expect to use them at the moment) ctsmod_functions.R split into three files - ctsmod_models.R containing the classes and methods for the time series and measurement models - ctsmod_likefunctions.R containing all the likelihood evaluations functions - ctsmod_genfunctions.R containing everything else From the old ctsmod_function.R I renamed dumpScreenDiagnostics() to dump.evidence.diagnostics(), and writeFinalOutputs() to write.final.evidence.outputs(), and moved both into calc_evidence.R. From the first of these function I removed the general time calculation parts into a new function est.time(), now in ctsmod_genfunctions.R In utilities.R I added a new function calc.xic to calculate the AIC and BIC using the results of the evidence calculation (i.e. max likelihood found within evCalc) - this is not very accurate, however v13 23-03-2012 Changed calc.post.mcmc() to return logPrior and logLike in addition to parameter samples. calc.dic() was therefore simplified (and error in v12 corrected) to not recalculate likelihoods. Introduced k-fold CV functions in kfoldCV.R containing kfold.cv(), define.partitions() This required that calc.like() take an extra variable, selInd, indicating for which rows of obsdata I calculate the likelihood. obsdata[selInd,] is passed to calc.like.isolated(), but I must still pass all the data to calc.like.OUprocess() as this needs all data to calculate an OU process; but the likelihood is returned only for the events in selInd. It also requires me to use %dorng% in doRNG{} rather than %dopar% in doMC{}. calc.post.stats() in utilities.R modified to be more general to be able to work with either evCalc or postSamp. calc.post.stats.ev() added for the former. calc.post.stats.kfolcv() added for dealing with kfoldCV (very simple changes required, but also made slightly simplier/clearer in how it deals with hfac and modevidence) plot.postSamp1d() in utilities.R modified to permit passing of either postSamp or kfoldCV. It was further modified to allow prior samples to be passed too and these overplotted. oplot.OUprocess.data() in utilities.R modified to take selInd as additional argument. logEvNomod renamed to logLikeNomod and calculated independently of evidence. Introduced hfac intro evidence calculation, removed write.final.evidence.outputs() as its functionality now simplified and in run_ctsmod() Introduced automatic setting of hfac into functions in utilities.R in the case that user sets hfac=NA (which is also the default). Introduced invtrans.multi() into sample_posterior.R, as a way of avoiding what turned out to be a very low loop (for large Npostsamp), for doing the inverse transformation of postSamp. Introduced print.elapsed.time() into ctsmod_genfunctions.R to replace final duration printing and use this now in several places. This archive includes a version of sample_posterior.R called sample_posterior_withnotes.R which has some extra notes in it concerning the changes made v14 25-04-2012 In calc.event.like() in ctsmod_likefunctions.R, the condition for calling calc.event.like.negtsd was changed to include TSMod2=BypassTSMod2 and not to include TSMod3=ProbUniform. The function itself was also changed to calculate the total sd (sdTot) correctly and differently for the two valid choices of TSMod2, i.e. ProbGaussian and BypassTSMod2 This version was used for the results in the ctsmod_aa paper (A&A 2012) v15 04-09-2012 ProbWienerprocess introduced. This is very similar to ProbOUprocess. To accommodate this calc.like.OUprocess() in ctsmod_likefunctions was renamed calc.like.FullyStochasticprocess(), as it can be used exactly as is for both. The name of the setup file (default "setup_ctsmod.R") can now be specified by the user or provided on the command line. Otherwise the default is used. Added the following error trap into metrop() in sample.posterior.R if(is.nan(logMR)) {stop("...")} which can occur if the "prior" (as determined by alpha) for a fixed variable is zero. This can be/should be avoided by not setting alpha for fixed variables, or if you do, then setting the fixed value to be compatible with the prior (i.e. to give non-zero). In sample.posterior.R changed one argument in the sample.prior.theta() call (which is used to check parameters). What was "theta1Fixed" is now "assign.partial(TSMod1@theta1, theta1Fixed)", and likewise for TSMod2,3. This is now the same as we have in calc_evidence.R Added TSMod1 FuncCombUniformSigmoidZeroSinusoidZero Added option to send email when job finishes Added code in run_ctsmod.R and setup_ctsmod.R to specify recontructing (predicting) time series at specified times and signal values. Code to do this is in TSreconstruct.R. It works, but is very slow, so has been commented out. Trapped having Ncores negative or non-finite (as Inf caused a crash) Error corrected in calc.event.like() in ctsmod_likefunctions.R where we had "TwoGauss" rather than "TwoGaussian" as a class name Error corrected in calc_evidence.R where I assigned thetaAdj: Old: thetaAdj <- as.numeric( c(TSMod1@theta1, TSMod2@theta2, TSMod3@theta3)[thetaFlag] ) New: thetaAdj <- as.numeric( c(TSMod1@theta1, TSMod2@theta2, TSMod3@theta3)[names(which(thetaFlag))] ) names(which(thetaFlag)) selects the adjustable theta *in the order of thetaFlag*. In the old version it just came out in the order which were defined by the sample.prior.theta method. thetaAdj was used to fill evCalc, but the column names were set to those TRUE in thetaFlag. So if thetaFlag didn't happen to be in the same order as that used in sample.prior.theta, the columns were incorrectly labelled. This does not affect the evidence calculation, but it may mean the posterior/prior plots were incorrectly labelled for some parameters. Error corrected in four methods of FuncCombUniformSigmoidZeroSinusoidZero in ctsmod_models.R. The four methods eval.prior.theta, sample.prior.theta, trans.theta, invtranstheta were using methods which did not exist, things such as eval.prior.theta.FuncSigmoidZero(). We should instead be using the "non-Zero" versions, e.g. eval.prior.theta.FuncSigmoid(). For these four functions the methods are identical (which is why they are not defined - it's just eval.mod() which is different). With the class FuncSigmoidZero, this inherited these methods from FuncSigmoid, so they did not have to be defined. But for FuncCombUniformSigmoidZeroSinusoidZero this is not possible, because it uses multiple inheritance, so we have to specify the combinations of the methods. v16 17-09-2012 Corrected error in sample_posterior.R, in which we had integout instead of integopt in a couple of places. first integopt was nonetheless being correctly passed into calc.like() - presumably via the default scoping public protocol. release Introduced a new level of parallel processing in run_ctsmod.R, which will run the kfold-CV in parallel with the full data posterior sampling, if both are selected. This uses the multicore package, functions mcparallel() and collect() Introduced parallel tempering Metropolis sampler, pt.metrop(), into sample_posterior.R. This is selected if the new setup parameter Nchain>1, in which case Nchains are used, with a swapping internval specified by swapInt, both defined in the setup file (setup_ctsmod.R). In coordination with introduction of pt.metrop(), sample_posterior() now returns a two-element list: the matrix postSamp (which was previously returned directly) and the 3D array postSampAll, which contains the samples for all chains. Added code to run_ctsmod.R to calculate the thermodynamic integration estimation of the evidence using postSampAll (if it exists) Introduced a trap into metrop() in sample_posterior.R to avoid possibility of proposals with -Inf logPrior or logLike being accepted. calc.post.mcmc() in sample_posterior.R now gives a warning if logPrior is not finite. This could occur if a prior was set up to give exactly zero, which is possible, but not desirable, as it may well indicate an unreasonable value of alpha for a fixed parameter or an incompatibility between the prior definition and the posterior sampling. Removed (commented out) the during-processing writing of the evidence to an output file, from run_ctsmod.R and calc_evidence.R Removed from ctsmod_genfunctions.R the special function for reading the particular paleontology excel file, and the call to that in read.obs.dat(). Now all files need to be converted to standard ASCII format in advance. Library gdata() therefore no longer required. v17 10-10-2012 sumres() added to analyse.R to ease collation of results Added a speed-up to metrop() and pt.metrop() in sample_posterior.R: if sampleCov is diagonal then generate random numbers using rnorm (Ntheta independent Gaussians) rather than rmvnorm(). Is about 8% faster. Modified read.obs.dat() in ctsmod_genfunctions.R to now read either a three or four columns file. If it's three column, s.sd is assumed missing and is added as a column of zeros. Added function variable obsdataHeader which indicates whether or not we should skip the first (non-commented) line. That is, we no longer treat the first line of the data file as the correct header. This variable has been added to setup_ctsmod.R (default is TRUE). Added another function variable versbose to supress function printing if it is FALSE (default is TRUE). Added pass variable plotResid to oplot.OUprocess.data() and modified code to plot residuals rather than data and model, if this flag is TRUE (default is FALSE) Introduced the linear model: Added new TSMod1 class FuncLinear to ctsmod_models.R Added sim.linear() function into ctsmod_genfunctions.R (and recognition of it to run_ctsmod.R) Added relevant prior setting lines to setup_ctsmod.R Changed error message in calc.post.mcmc() in sample_posterior.R (as zero prior for the initial parameter value of slope is now possible) In utilities.R: (1) modified plot.postSamp1d() to apply xlim also applied to priorSamp before density estimation. (This was necessary to get samples from a Cauchy to plot properly: a Cauchy has such heavy trails, that a few far out points dominate the density estimation, giving an erroneous uniform distribution.) (2) modified oplot.model.data() to acommodate FuncLinear v18 24-01-2013 Added sim.quasisinusoid() and sim.asfuncquasisinusoid() to ctsmod_genfunctions.R, run_ctsmod.R and setup_ctsmod.R Added new TSMod1 class FuncQuasiSinusoid to ctsmod_models.R as well as its derivatives FuncQuasiSinusoidZero, FuncCombUniformQuasiSinusoid, FuncCombUniformQuasiSinusoidZero Modified oplot.model.data() in utilities.R to be able to overplot FuncQuasiSinusoid models Trapped undefined function for obssel in run_ctsmod.R Updated ctsmod_tn.tex to reflect these changes, and added a new subsection on error reporting v19 20-03-2013 Added emcee() as a new sampler. This is in the new mcmc.R file, to which also metrop() and pt.metrop() have been moved here from sample.posterior(). The three MCMC samplers are independent of ctsmod. Changed sample.posterior() in sample_posterior.R to inverse transform postSampAll (before just postSamp was transformed). Also, postSampAll is set to NULL rather than NA if it is not filled (which is only the case for the Metropolis algorithm) Modified kfold.cv() in kfoldCV.R to be compatible with emcee. Specifically, I had to replace Npostsamp with Nsamp, where Nsamp=ncol(postSamp) For the model FuncQuasiSinusoid (defined in ctsmod_models.R) I trapped the error that a value of fpX very close to of equal to +1 or -1 gets tranformed to +Inf and -Inf respectively in the transformation trans.theta.FuncQuasiSinusoid() used to project into the space for the MCMC. This would not immediately give an error, but when inversed transformed by invtrans.theta.FuncQuasiSinusoid() +Inf gets transformed to NaN, and this of course causes an error later (specifically splint() in eval.mod.FuncQuasiSinusoid() fails). I now limit the value of fpX prior to the first transformation to be between 1-translim and 1+translim, where translim=1e-10 (hard-wired into function). Likewise, in invtrans.theta.FuncQuasiSinusoid() I now prevent the transformed version of fpX dropping below the corresponding limit (-23.719), as the inverse transformation would (for much lower values) give a NaN. Changed the transformation of phase in FuncSinusoid (and thus its derivatives) to be circular, which is also what it is in FuncQuasiSinusoid Modified calc.post.mcmc() to now avoid evaluating the functions (TSMod) if logPrior is not finite. logLike is then set to -Inf (which is fine, as the MCMC sample is anyway rejected if logPrior=-Inf). This was done because emcee sometimes resulted in very large values of parameters which then produced numerical errors in the function evaluations. calc.dic() no longer called by run_ctsmod.R (and therefore not used). I'm not using it, and it might cause the error just mentioned (which wouldn't be trapped by that fix, as calc.dic() does not calculate the prior). Corrected calc.post.samp() in utilities.R. Up until now it was incorrectly using the posterior probabilities to form a weighted mean and sd of the parameters. But as these have already been sampled from the posterior, no weighting was necessary. No weighting is now used. Function modified to use importance sampling to calculate statistics correctly for samples drawn from the prior when new is.ev parameter is set to true. For the same reason the function calc.post.samp.ev() has been deleted. This led to hfac no longer being necessary in calc.post.stats() or find.peak.sol.kfoldcv() Updated a number of functions in utilities.R to accept the Nrej and Nthinblock parameters. The former permits a post-run burnin period to be applied (i.e. removed this number of initial samples from the analysis). The latter accommodates blocking of the thinning, specifically designed to apply thinning in the case that postSamp was produced by emcee (so thinning is applied to each walker separately). Added recalc.like() to utilities.R Updated ctsmod_tn.tex to reflect these changes, added summary of functions in utilities.R, and reorganization of text v20 08-04-2013 Corrected emcee() in mcmc.R to do asynchronous (rather than synchronous) updating of the walkers. v19 sometimes crashed with emcee because the transformed phase parameter grew to very large values after thousands of iterations and so the cosine function in, e.g. FuncSinusoid, gave an error. I fixed this by removing the circular boundary condition on phase, and strictly imposing the evaluation of the prior on the phase, such that proposals from MCMC which lie outside the range 0-1 give logPrior = -Inf and so are always rejected, and thus the function never evaluated. This modification was also made to the function FuncQuasiSinusoid() (also in ctsmod_models.R). However, very large values of the transformed values for any parameter could still cause emcee to crash, because then any one of the proposed parameters, thetaProp (which is a vector), could evaluate to NaN (e.g. -1e+320 - -1e+320 = NaN). I have therefore added into emcee() a test for any thetaProp being non-finite (which includes NaN), in which case the proposal is rejected. Note that this is an intrinsic problem of using emcee with transformed parameters, in which very large (positive or negative) of the transformed parameters are not rejected by a prior. Added options("warn"=1) to run_ctsmod.R to force warnings to be written as they occur. Removed the now redundant if(warnToFile) {flush(outWarn)} in calc.evidence() and kfold.cv(), and also removed the "immediate." from some warnings Modified the warning message in calc.post.mcmc() in sample_posterior.R to show the parameter estimates which have caused logPrior=-Inf, but this is not commented out, because it occurs very often now that the MCMC can legitimately generate samples of the phase parameter outside the range 0-1 (and we can also get samples-to-reject like this with FuncLinear). Modified plot.postSamp1d() in utilities.R to also calculate the 1D mode for each parameter, and to calculate the mean and sd of these across partitions for K-fold CV results. Removed commented out, obsolete code related to printing evidence calculations to a file v21 22-04-2013 A poor initialization of parameters could produce -Inf or +Inf for the function evaluations in any of the MCMC algorithms. This is now trapped by throwing an error in all MCMC algorithms. Note that the Gaussian sampling using sampleSD in emcee of the phase parameter can still produce values of phase outside the range 0-1, which would give func=-Inf at initialization and so trigger this error. sampleSD must be kept small when used in this way. Additional error trapping added to calc.post.mcmc() in sample_posterior.R to ensure that a stop() is forced if logPrior or logLike are NULL, NA, NaN, or +Inf. Package multicore replaced with parallel, and therefore parallel() and collect() with mcparallel() and mccollect() Nominally works with R-3.0.0 This version is for preservation prior to making changes to how I do parallel processing. v22 13-06-2013 Uses R-3.0.0 Replaced use of foreach and doRNG with mcparallel from package parallel in kfoldcv(). This was done nominally in order to catch possible errors in the code related to very large values of the parameters encountered with emcee. However, this has not fixed the mysterious occasional NULL return bug from sample.posterior(), described in section D.10 of the ctsmod technical note. See that section for a description of the behaviour. Replaced package multicore with package parallel in run_ctsmod.R (used to run 1-fold and K-fold in parallel). Modified plot.emcee.walkers() in utilities.R to plot walker samples for user-selected partition in kfoldCV results Modified plot.postSamp1d() in utilities.R to perform the density estimation taking into account periodic boundary conditions for parameters selected by pbound. Modified definition of p in find.peak.sol() in utilities.R to be parameter number, rather than column number in samp (p_new = p_old + 2) Corrected bug in find.peak.sol.kfoldcv() which meant kfoldCV parition was not being passed to find.peak.sol() (this would have invisibly passed postSamp, if defined, or crashed, if not). v23 24-10-2015 R-3.1.2 Added the necessary Jacobian for the parameter transformations to the Metropolis, metrop(), parallel tempering, pt.metrop(), and EMCEE, emcee(), algorithms. Added the function calc.jacobian.mcmc() to sample_posterior.R to compute the total Jacobian (a scalar) using the appropriate methods for the function classes (which were added to ctsmod_models.R). Extended error trapping in algorithms in mcmc() to trap NA and NaN (and not just +/- Inf) in initial function values (shouldn't be necessary, but just in case)