Statistics Playground |
Menu Start Research Publications Statistics playground Teaching CV Contact |
Here, I give an overview over my activities concerning statistics and methodology. Stochastic processes For an overview of my interests in stochastic processes see here. Total regression For an overview of fitting data when both axes x and y have measurement errors check my page on total regression here. Stable least-squares fitting For an overview of my interests in stable least-squares fitting see here. Figure 1. Estimating a Schechter luminosity function from just 100 galaxies! Shown is the likelihood manifold of the two model parameters of a Schechter function. The simulation used measurement errors of 0.3 magnitudes and is averaged over 100 noise realisations, i.e., error contours contain both estimation uncertainty and sampling/shot noise. The dashed orange lines indicate true values.Galaxy luminosity functions without binningThat project is very similar to the IMF project. In fact, the methodological problem is virtually identical. Only the astrophysical background is different. However, in this case, the method is already in application and will soon produce first astrophysical results. Figure 1 shows an example of estimating the Schechter luminosity function of only 100 galaxies. With binning, this would be a hopeless enterprise. Without binning, we obtain an unbiased estimate. Comparing population proportions, e.g., are galaxy mergers more frequent in active than in inactive galaxies? That is actually a fairly simple methodological problem that can be solved accurately, i.e., without any approximations. The trick is to interpret the merger events in active and inactive galaxies as Bernoulli trials. The comparison then works by convolving the beta distributions of both samples with each other. This simple methodological solution has already been applied in two publications and has spawned two further proposals for observation time. Currently, I am writing up an arxiv manuscript explaining the foundations of the method. It will also contain a complete set of code - ready for use. Figure 2. Three different types of one-dimensional 68% confidence intervals for an unfriendly likelihood function.Error estimationWhile reading through the astronomical literature, I encountered lots of adventurous methods to estimate errors of fit parameters. Therefore, I wrote a short arxiv manuscript how to do it properly, which is meant to help PhD students and others interested in this subject. Figure 2 shows a simple example. Monte-Carlo methods After computers became sufficiently powerful in the 1990s, MC methods experienced a hype. However, there is lots of mythology associated with these methods, too. In the course of my research, I worked into this subject from the perspective of parameter/error estimation. Here are some of my conclusions:
In the case of linear models and Gaussian data noise, the likelihood is a pure Gaussian with a single well-defined maximum. As soon as models become nonlinear or data noise non-Gaussian, that changes. Nevertheless, we would like to find the global optimum, instead of a local one. MCMC methods cannot do the trick, since they are started at a single location and are usually not run long enough to completely explore the likelihood manifold. I therefore worked a little bit with nested sampling algorithms. They have a better chance to identify the global optimum. However, their time complexity scales exponentially with number of model parameters - like a brute-force grid - because the number of required "livepoints" scales exponentially. Another approach to identify global optima are genetic algorithms, which mimic biological evolution via processes equivalent to mutation, reproduction and selection according to fitness. However, so far I had no time to play around with these methods. |