Discovery of the Higgs boson? What p-values mean

On 4 July 2012, CERN announced that they had discovered, at a significance of 5-sigma, a new particle with a mass consistent with that predicted for the Higgs boson. Leaving aside the issue of whether this means this particle is the Higgs boson predicted by the standard model, or some other Higgs boson, or even some other boson, what does "a significance of 5-sigma" mean? It is a p-value for the null hypothesis that there is no new particle. That is, under some (complex) model which does not include this new particle, we have a prediction - with noise - of the data one would expect to get. The p-value is the probability of observing the data which were observed or something more extreme, assuming this null hypothesis to be true. A significance of 5-sigma refers to a Gaussian distribution: it is the probability of finding a value 5 standard deviations or more away from the mean (which is the null hypothesis prediction). Using a two-sided test (i.e. the value could be above or below the mean), this corresponds to a probability of 5.7e-7, or 1 in 1.7 million. (This does not necessarily imply that the data used in this test actually have a Gaussian distribution: the experimenters might have worked out the p-value and then converted it into a Gaussian equivalent.)
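The conversion from a number of sigma to a tail probability can be checked with a few lines of Python. This is only a sketch of the Gaussian-equivalent arithmetic described above, not of the analysis CERN actually performed; the function names are my own. The one-sided version is included for comparison, since it gives exactly half the two-sided probability.

```python
import math

def two_sided_p_value(n_sigma):
    # Probability that a standard Gaussian variable lies n_sigma or more
    # away from the mean, in either direction: P(|Z| >= n_sigma).
    return math.erfc(n_sigma / math.sqrt(2.0))

def one_sided_p_value(n_sigma):
    # Probability of lying n_sigma or more above the mean: P(Z >= n_sigma).
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

p_two = two_sided_p_value(5.0)
p_one = one_sided_p_value(5.0)
print(f"two-sided: p = {p_two:.2e}, about 1 in {1.0 / p_two:,.0f}")
print(f"one-sided: p = {p_one:.2e}, about 1 in {1.0 / p_one:,.0f}")
```

For 5 sigma the two-sided value reproduces the 5.7e-7 quoted above.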

The most important thing to realise about this statement is that this probability is not the probability that the null hypothesis is true. It is just the probability of getting the data observed or something more extreme, assuming the null hypothesis (no new particle) is true. If we take this result as a suggestion that the null hypothesis is not true, then to say anything more we would have to define an alternative hypothesis and find out the probability of the data under that. Only if the data are more probable under this alternative than under the null can we say that we have evidence in favour of this alternative over the null. (Of course, this tells us nothing about other possible alternatives.)

Note the curious dependence of p-values on "more extreme data". The definition of the p-value depends on this nebulous concept. (It is forced to use it because any specific set of real-valued data has an infinitesimally small probability.) First, we should ask ourselves why a test depends on data we did not observe. Second, if the data have been heavily processed to come up with a p-value (and that is often the case), it might be hard to know what "more extreme" data actually are.

A low p-value is at best an indication that there may be a better explanation for the data than the null hypothesis. But to know this for sure, you need to define and test the alternative hypothesis. Orthodox hypothesis testing does not do this, because it doesn't actually test the alternative. (One could run lots of tests with different definitions of the null and the alternative, but it is surely better to treat all hypotheses equally.)

To improve on this we could do Bayesian model comparison, for example using the Bayes factor (the ratio of the evidences for pairs of models). This doesn't depend on "more extreme data" and it explicitly compares various alternative models on an equal footing. (There are alternative Bayesian-consistent methods too, such as cross validation.) In the case of the Higgs boson, I would have thought this is possible, as the standard model makes predictions of the Higgs properties.
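To make the idea of a Bayes factor concrete, here is a toy sketch in Python. It is emphatically not a model of the Higgs analysis. I assume a single measurement x with unit Gaussian noise, a null model in which the true signal is exactly zero, and an alternative model in which the true signal has a zero-mean Gaussian prior (the prior variance of 25 is an arbitrary choice for illustration). Under these assumptions the evidence for each model is itself a Gaussian in x, so the ratio can be written down directly.

```python
import math

def gaussian_pdf(x, mean, var):
    # Density of a Gaussian with the given mean and variance, evaluated at x.
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bayes_factor(x, noise_var=1.0, prior_var=25.0):
    # Evidence for the alternative: the signal is marginalised over its
    # zero-mean Gaussian prior, so x has variance noise_var + prior_var.
    evidence_alt = gaussian_pdf(x, 0.0, noise_var + prior_var)
    # Evidence for the null: the signal is fixed at zero, so x is pure noise.
    evidence_null = gaussian_pdf(x, 0.0, noise_var)
    # Ratio of evidences (alternative over null); > 1 favours the alternative.
    return evidence_alt / evidence_null

for x in (0.0, 2.0, 5.0):
    print(f"x = {x}: Bayes factor (alternative/null) = {bayes_factor(x):.3g}")
```

Only the observed x enters the calculation, not any "more extreme" values. The answer does depend on the prior adopted for the alternative, which is the price of actually specifying that alternative: at x = 0 the ratio falls below one, because the broader alternative model spreads its probability over values that were not observed.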

I am not doubting the significance or importance of the CERN claim. But we do need to understand what that claim is, and we need to realise that there are other ways to analyse it.

Coryn Bailer-Jones
July 2012