Practical Course, SS2009
Faculty of Physics and Astronomy, University of Heidelberg


Statistical Methods (UKSta)

Block Course, 13-24 July 2009, Mon-Fri 09:00-13:00 (first day starts at 10:00)
CIP-Pool (Computer room) in the Kirchhoff Institute for Physics (Neuenheimer Feld)

Lecturer: Dr. Coryn Bailer-Jones (Email: calj AT mpia.de)
Assistant: Markus Schmalzl

Overview
Prerequisites
Course formalities and registration
Syllabus
Books
R pages
R tutorials


Overview

This is an introductory statistics course for students in physics. It is a computer-based course, in which you will learn not only the principles and methods of statistical analysis, but will also put these into practice using a range of real-world data sets. The aim is to provide a basic understanding of data analysis using statistics and to provide tuition in using standard tools. Derivations will be avoided and mathematical theory will be kept to a minimum, although maths will be used to show the origin of the statistical methods, the connections between them, and their relation to probability. The course will be held in English, but discussions can be carried out in German.

Prerequisites

No knowledge of statistics beyond Abitur/high-school level is assumed. The course will make exclusive use of the R software package. Prior experience with this is not necessary, but you are advised to experiment with it in preparation if at all possible. (It is free software and can easily be installed on Mac, Linux and Windows computers.) The R web page gives links to manuals, tutorials, FAQs etc.

Course formalities and registration

This is a new course in the Heidelberg Physics Bachelor programme which will typically be taken in the third semester or later. You must be a matriculated member of Heidelberg University to participate. The course counts for 3 LP (Leistungspunkte) and has an estimated workload of 90 hours, of which 50 hours are to be done as homework. (With 4 hours contact time per day, you should expect to spend around 3 hours per day in the afternoons on the homework, plus a bit more for the extended assignment at the end of the course.) For more details see the Physics BSc handbook (Modulhandbuch). (Note that the syllabus below supersedes that in the Modulhandbuch.)

There is no examination, but to obtain the LPs you must attend the whole course and complete and submit the homework.

Due to the limited number of computer consoles, the number of participants is limited to 50 (two people sharing each console). So to participate in the course you must register in advance by email. Please send me the following details:

Surname, Forename, Email address, Matriculation number, Semester you are in, 1-2 line summary of your post-school experience of statistics (e.g. courses taken)

Places will be allocated on a first-come first-served basis and the registration is binding. (If you have to cancel for reasons outside of your control, please inform me as early as possible so that I can inform someone on the waiting list.) I will contact you during the semester concerning your participation.

Syllabus

Each day will comprise three parts: (1) presentation and discussion of the homework from the previous day, plus discussion of any issues; (2) a lecture on a new topic; (3) computer-based exercises on the new topic. The homework will be a mixture of computer-based and paper-based work. Each of the ten days concerns a different topic. The script for each lecture can be downloaded by clicking on the lecture title. The full syllabus is listed in the first lecture. The homework exercises will be provided separately.

These lecture notes are no longer available. They are superseded by those used in the 2013 version of this course.

  1. Introduction
  2. Probability and distributions
  3. Hypothesis testing
  4. Estimation and errors
  5. Regression
  6. Hypothesis testing II
  7. Regression II
  8. Binomial and Poisson processes
  9. Maximum likelihood and density estimation   (slides)
  10. Bayesian inference

Books

I recommend that you get hold of an introductory statistics text to use during this course. There are many around, varying in their scope, level, emphasis and quality. The course does not follow a single book, but I provide a summary of a somewhat random sample. The course focuses on the use of statistics in the physical sciences, so many methods used in the social sciences, even basic ones, will not be covered: you may want to take this into account when buying a book. There are several texts which examine specifically the use of R in statistics, which is useful, although these tend to be a bit too recipe-oriented to give a proper level of understanding. Some of these are briefly reviewed on the R web site.

Most of the books listed below can be inspected on amazon.de and several are in the University Library.

Barlow, Statistics
This is a well-written introduction with some useful mathematical background, simple derivations and good descriptions. It is written for physics students, so it even has a chapter titled "Errors". I can recommend it if you want to go beyond just having recipes (which you should), in particular as it contains derivations which Crawley, Everitt & Hothorn and Dalgaard omit. Like most introductory statistics textbooks, it takes a very orthodox or frequentist approach (probability only appears in chapter 7!), which can make the different topics seem like a set of disconnected techniques.

Crawley, Statistics. An Introduction using R
This text emphasises statistics for biological and to some extent physical (but not social) sciences. It has a reasonable balance between explaining the methods and demonstrating them in R. While there are examples, there is more of an emphasis on principles and the basic maths than there is in Everitt & Hothorn or Dalgaard, for example. That said, the maths is very basic and many methods are not properly explained (the course will go beyond this level). However, it is visually appealing and has the advantage of being relatively cheap. Like most statistics books, it presents statistics in the traditional way (look at the Table of Contents).

Dalgaard, Introductory Statistics with R
An introduction to both R and statistics. The mathematical treatment is limited and it takes a somewhat "recipes"-like approach. As the title suggests, R takes a central role. Includes exercises and answers.

Everitt and Hothorn, A Handbook of Statistical Analyses using R
R takes a very central place, with lots of examples and data sets (and perhaps a few too many screen dumps). As the title suggests, this is a guide to using R for statistics rather than a book from which you can learn statistics. Moreover, it covers several topics which are not typical for an introductory statistics course (and which we won't cover). It is as R-centric as Crawley and Dalgaard but a bit more advanced.

Maindonald and Braun, Data Analysis and Graphics using R
This is essentially a handbook for using R for statistical data analysis rather than a book from which to learn statistics. It is similar in approach and coverage to the classic book of Venables & Ripley (see below), in that it also covers what one would call machine learning methods (e.g. trees, discriminant analysis), but at a slightly lower level. It contains very little mathematics. At 26cm x 18cm x 3.5cm, it won't fit in your pocket.

Sivia, Data Analysis. A Bayesian Tutorial
The first edition was an excellent introduction to data analysis from the Bayesian perspective. (A new second edition adds three more chapters.) I highly recommend it if you really want to understand what statistics is and how it relates to probability theory, rather than just learn a bunch of frequentist recipes. (Don't look in here for p-values and Neyman-Pearson hypothesis testing ;-) ) However, it goes beyond the scope of the course and not much of it will be directly covered. It does not cover R or other packages.

Sachs, Angewandte Statistik. Methodensammlung mit R
A very detailed and mathematical introduction to statistics. It contains a lot more than you'll need for the course but the level of mathematics is not as high (or as offputting) as first appearances might suggest. R is used to illustrate the statistics (rather than the other way around, as is the case in some other books). With problems and solutions. Available online via the University Library (you can download the whole book as PDF). (Judging from the Foreword, the near absence of any reference to Bayesian statistics, and the lack of any reference to Richard Cox or Harold Jeffreys, this is an unashamedly frequentist approach to statistics.)

Toutenburg and Heumann, Deskriptive Statistik and Induktive Statistik
This pair of books - in German - gives a detailed introduction to statistics and R from a somewhat mathematical perspective. It goes into more theory and depth than you'll need for this course. Lots of examples and solutions.

Venables and Ripley, Modern Applied Statistics with S (MASS)
R is essentially an open-source implementation of the "S" language, so for the purposes of this course the two can be treated as the same. This book provides a very good introduction to R and its use for both basic and advanced data analysis. However, it assumes the reader is already reasonably familiar with the techniques, so this is not a book which can be used alone to learn basic statistics. It goes well beyond the course, covering also topics such as GLIMs, neural networks and spatial statistics. The accompanying R package "MASS" contains many functions which will be used in the course.

Verzani, Using R for introductory statistics
Quite R-oriented and rather (too) basic. It's essentially an R guide rather than a statistics text. Available online via the University Library as an e-book.

Some more Bayesian books are mentioned in the script for lecture 10.

R pages

R tutorials


Coryn Bailer-Jones, calj at mpia.de
Last updated July 2009