Inference may be defined as the process of drawing conclusions based on evidence and reasoning. It lies at the heart of the scientific method, for it covers the principles and methods by which we use data to learn about observable phenomena. This invariably takes place via models. Much of science is model-based, meaning that we construct a model of some phenomenon and use it to make predictions of the data we expect to observe under certain conditions. By comparing predictions with the actual data, we can determine how well the model explains the data and hence the phenomenon. This may lead us to reject entirely some models, to improve (and then reassess) others, and perhaps finally to declare one as the "best" model (so far). Models are constructed using accepted theoretical principles, prior knowledge and expert judgement. Inference is the process by which we compare the models to the data. This normally involves casting the model mathematically and using the principles of probability to quantify the quality of match.

By way of illustration, let us consider -- somewhat anachronistically -- models for the motion of the planets in the solar system as seen from the Earth's surface. Two prominent alternatives are (1) the geocentric model, in which the Earth is considered fixed and the Sun, Moon and planets orbit around it, and (2) the heliocentric model, in which the Sun is considered fixed and the Earth and planets orbit around it (the Moon continues to orbit the Earth). Ignoring the specific details of the various flavours of these general models (Aristotelean, Ptolemaic, Copernican, Keplerian etc.), there is actually a form of equivalence of the two models, because one can be transformed into the other via a non-inertial coordinate transformation. But when defined in their natural coordinates, these two models look very different.

Inference can typically be divided into two parts: model fitting and model comparison. In one flavour of the heliocentric model, the planets move on regular circular orbits with the Sun at the centre. Several parameters describe the motion of each planet (radius, period, inclination, phase). *Model fitting* is the process by which the values of these parameters are determined from a set of observational data. As all data are noisy to a greater or lesser degree, this involves uncertainty, and this is best quantified using probability. That is, uncertainty in the data corresponds to uncertainty in the model parameters. The most general method of inference is to determine the posterior probability density function (PDF), P(θ | D, M), where D denotes the data and θ the parameters of model M. This is generally a multidimensional PDF and cannot usually be evaluated analytically. The arsenal of inference includes numerous numerical tools for evaluating it, Markov Chain Monte Carlo (MCMC) methods being one of the most popular. This is time-consuming, however, and approximate techniques for determining summary information (such as the mean and covariance) are sometimes employed. Depending on both the quantity and quality of the data, as well as the suitability of the model, P(θ | D, M) may be more or less peaked around a narrow range of parameters, indicating a well-determined, low-uncertainty solution for that model.
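As a sketch of how such a posterior might be explored numerically, consider the Metropolis algorithm -- the simplest MCMC method -- applied to a hypothetical one-parameter toy problem. Everything here (the data, the noise level, the proposal scale) is an illustrative assumption, not drawn from any real analysis:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy problem: the data D are noisy measurements of a single
# parameter theta (say, a constant signal) with known Gaussian noise sigma.
sigma = 1.0
D = rng.normal(loc=3.0, scale=sigma, size=20)

def log_posterior(theta):
    # log P(theta | D, M) up to an additive constant: a Gaussian likelihood
    # times a broad uniform prior (the prior contributes nothing inside its range)
    return -0.5 * np.sum((D - theta) ** 2) / sigma**2

# Metropolis: propose a random step and accept it with probability
# min(1, posterior ratio); the chain of states then samples the posterior.
theta, samples = 0.0, []
for _ in range(20000):
    proposal = theta + rng.normal(scale=0.5)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)

posterior_mean = np.mean(samples[5000:])  # discard burn-in
print(f"posterior mean of theta: {posterior_mean:.2f}")
```

With a flat prior the posterior mean should approach the sample mean of the data; the spread of the chain gives the parameter uncertainty that the text describes.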

We will usually want to know how good a model is, either in a general sense or at some specific fitted values of the parameters. This is actually an ill-posed question if we only consider one model, because we then have no alternative which could explain the data better, so any data must be ascribed to our one-and-only model. Fundamentally, therefore, we always compare models and try to identify the "best" one (according to some criterion). At minimum we consider a "background" model as an implicit alternative. Taking the example of detecting an emission line in a spectrum where we have just one model for the location and shape of the line, an implicit alternative model is no line, e.g. just a constant spectrum. But often we will have other alternative models, e.g. with multiple lines, or lines with different shapes.
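By way of illustration (using a made-up spectrum, with the line centre and width assumed known for simplicity), the comparison of a one-line model against a constant "background" model can be sketched as a comparison of best-fit residuals:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical spectrum: a constant continuum plus a Gaussian emission line
# at a known wavelength, with Gaussian noise added.
wav = np.linspace(0.0, 10.0, 200)
line_profile = np.exp(-0.5 * ((wav - 4.0) / 0.3) ** 2)
flux = 1.0 + 2.0 * line_profile + rng.normal(scale=0.2, size=wav.size)

# Model 1 ("background"): constant spectrum only.
# Model 2: constant plus the line profile. Both are linear in their
# parameters, so ordinary least squares gives the best-fit residuals.
A_const = np.ones((wav.size, 1))
A_line = np.column_stack([np.ones(wav.size), line_profile])

ss_const = np.linalg.lstsq(A_const, flux, rcond=None)[1][0]
ss_line = np.linalg.lstsq(A_line, flux, rcond=None)[1][0]

print(f"sum of squared residuals, no line:   {ss_const:.1f}")
print(f"sum of squared residuals, with line: {ss_line:.1f}")
```

Only by having the no-line alternative in hand can we say that the line model explains the data *better*; a full treatment would also ask whether the improvement is large enough to be more than a noise fluctuation.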

We rarely believe that our models are perfect, so it actually makes no sense to ask whether a particular model is the "correct" one. This is reinforced by the fact that data are noisy -- they have a random component which cannot be predicted by a model -- so we expect some deviation between data and predictions.
Therefore, we can only quantify the *relative* quality of models (including a possible background model). We cannot establish the absolute quality of a model.
Incidentally, model comparison is often performed poorly in the literature, in part through the use of over-simplified, straw-man background models, the weakness of which helps to artificially promote almost any other model.

This brings us to the second pillar of inference, *model comparison*, the goal of which is to identify which of the models under consideration best explains the data. Returning to the example of planetary orbits, consider that we have a set of observations of planetary positions (two-dimensional sky coordinates) at known dates. A good geocentric model can actually predict such observations very well; better, in fact, than a heliocentric model with circular orbits. More generally, because we can geometrically transform the predictions of a heliocentric model into those of a geocentric model, both models could be equally good at predicting these data. By making a geocentric model more complex (by adding more epicycles) we can make it fit the data better and better. (Think of fitting a curve to ten points in a two-dimensional space: Unless the points are collinear, a cubic curve can always be adjusted to fit better -- in terms of sum of square residuals -- than a straight line.) If we have additional reasons to prefer a geocentric model (such as lack of stellar parallaxes, Aristotle or biblical interpretations), then the geocentric model appears to be favoured.
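The parenthetical remark about curve fitting is easy to verify with made-up data. Because a cubic can always set its two extra coefficients to zero and recover the straight line, its best least-squares fit is never worse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten non-collinear points in a plane (hypothetical data:
# a straight-line trend with noise added).
x = np.arange(10.0)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=10)

# Least-squares fits of a straight line (degree 1) and a cubic (degree 3).
line = np.polyval(np.polyfit(x, y, 1), x)
cubic = np.polyval(np.polyfit(x, y, 3), x)

ss_line = np.sum((y - line) ** 2)
ss_cubic = np.sum((y - cubic) ** 2)

# The cubic nests the straight line, so ss_cubic <= ss_line always;
# with noisy (non-collinear) points the inequality is strict.
print(f"sum of squared residuals: line {ss_line:.3f}, cubic {ss_cubic:.3f}")
```

The cubic fits better even though the data were generated from a straight line -- which is precisely why goodness of fit alone cannot decide between models.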

Yet there is something important missing from this chain of reasoning. We know that increasingly complex models can be made to fit any data set, but we consider such models to be increasingly contrived.
Thus in addition to predictive power, consideration of the *plausibility* of models must also be a fundamental part of inference. Plausibility is often (but not always) equated with parsimony, in which case we adopt what is often called Occam's razor: we should prefer a simple solution when a more complicated one is not necessary. We should therefore apply some kind of "complexity control" to our models. In terms of the historical development of theories of planetary motion, the shift in preference from a geocentric to a heliocentric model was not just due to the improved data. It was also due to an increased willingness to question the assumption that the Earth could not move, as well as the choice to give more weight to the fact that the Sun's motion is suspiciously synchronized with the motion of the planets in the geocentric model. The former was a consequence of an intellectual revolution reaching far beyond astronomy; the latter was essentially a plausibility argument. Both demonstrate the unavoidable importance of prior information to the process of inference: that information which goes beyond just the data we are explicitly using in the modelling.

The preceding description of inference is the Bayesian one. This is the only self-consistent, logical approach to inference based on probability. A probabilistic approach is essential, because dealing with observational data means dealing with uncertainty: Data are noisy (we cannot measure planetary positions exactly) and our samples are incomplete (we cannot measure all points in an orbit). Probability is arguably the most powerful means of dealing with uncertainty. Some scientists take issue with priors, but to take issue is not to deny their existence. It just highlights the difficulty we face in practice of encapsulating prior information in terms of probabilities, which is a scientific problem to be tackled, not to be shied away from.

Bayesian inference has undergone a significant renaissance in astronomy in the past twenty years. This is partly due to the increase in available computing power, for one frequently has to perform high-dimensional numerical integration. Once methods for doing this became tractable, astronomers became aware of the need for self-consistent, logical data analysis, as opposed to quick, simple -- but often wrong -- statistical tests or recipes. Doing inference properly is a vitally important area in all of science, but in particular in astronomy where, lacking the ability to perform experiments or obtain in situ data, we are limited to remote observations. Vast amounts of money, time and effort are invested into building powerful instruments, so a commensurate effort should be made to ensure that sensible things are done with the data. Unfortunately this is not always the case, and many publications in the literature draw incorrect conclusions because of flawed inference.

This is not always due to ignorance or even lack of effort. The principles of inference may be straightforward, but the practice is considerably more complex: What models should I consider? How do I set up and parametrize these models? Which data do I take into account? What is the appropriate noise model? How do I define my priors and test the sensitivity to these? How do we measure model complexity or plausibility? How do I explore a high-dimensional posterior PDF on an acceptable timescale? What new data should I acquire in order to help distinguish between models I have tested? How do I use the results of the analysis to improve the models or propose new ones? These are questions which can only be answered in the context of specific problems, and are the focus of applied research in inference.

Coryn Bailer-Jones

March 2012