Lecturer: Dr. Coryn Bailer-Jones (Email: calj AT mpia.de)
Assistant: Ronald Läsker
This time the course will be held on ten half days (Fridays, 09:00-13:00) during the summer semester, starting on Friday 20 April (the exact dates are listed below). Prior registration is necessary.
Overview
Prerequisites
Course formalities and registration
Syllabus
Textbooks
(Semi-)popular books
Podcasts
R stuff
This is an introductory statistics course for students in physics. It is a computer-based course, in which you will learn not only the principles and methods of statistical analysis, but will also put these into practice using a range of real-world data sets. The aim is to provide a basic understanding of data analysis using statistics and to provide tuition in using standard tools. Derivations will be avoided and mathematical theory will be kept to a minimum, although maths will be used to show the origin of the statistical methods, the connections between them, and their relation to probability. The course will be held in English, but discussions can be held in German.
No knowledge of statistics beyond Abitur/high-school level is assumed. The course will make exclusive use of the R software package. Prior experience with this is not necessary, but you are strongly advised to get familiar with it before the start of the course. (It is freeware and can easily be installed on Mac, Linux and Windows computers.) The R web page gives links to manuals, tutorials, FAQs etc.
This course is most appropriate for students entering their third semester (or later). You must be a matriculated member of Heidelberg University to participate. The course counts for 3 LP (Leistungspunkte) and has an estimated workload of 90 hours, of which 50 hours are to be done as homework. For more details see the Physics BSc handbook (Modulhandbuch). (Note that the syllabus below supersedes that in the Modulhandbuch.)
There is no examination. To obtain the LPs you must attend the whole course and complete and submit all of the homework to a sufficiently high standard.
Due to the limited number of computer consoles, the number of participants is limited to 25 (one person per console). In order to participate in the course you must register in advance by email. Please send me the following details:
Surname, Forename, Email address, Matriculation number, Semester you are in, 1-2 line summary of your post-school experience of statistics (e.g. courses taken)
Places will be allocated on a first-come first-served basis and the registration is binding. (If you have to cancel for reasons outside of your control, please inform me as early as possible so that I can inform someone on the waiting list.) I will not acknowledge registrations, so assume that you have a place if you do not hear from me.
Each day will comprise three parts: (1) presentation and discussion of the homework from the previous day, plus discussion of any issues; (2) a lecture on a new topic; (3) computer-based exercises on the new topic. The homework will be a mixture of computer-based and paper-based work. Each of the ten days concerns a different topic. After each lecture, the script/notes will be available. The homework exercises will be provided separately.
These lecture notes are no longer available. They are superseded by those used in the 2013 version of this course.
The course will take place on the following dates
I recommend that you get hold of an introductory statistics text to use during this course. There are many around, varying in their scope, level, emphasis and quality. The course does not follow a single book, but I provide a summary of a somewhat random sample. The course focuses on the use of statistics in the physical sciences, so many methods used in the social sciences, even basic ones, will not be covered: you may want to take this into account when buying a book. There are several texts which specifically examine the use of R in statistics, which is useful, although these tend to be a bit too recipe-oriented to give a proper level of understanding. Some of these are briefly reviewed on the R web site.
Most of the books listed below can be inspected on amazon.de and several are in the University Library.
Barlow, Statistics
A classic. This is a well-written introduction with some useful mathematical
background and simple derivations and good descriptions. It is written
for physics students, so it even has a chapter titled "Errors". I can
recommend it if you want to go beyond just having recipes (which you
should), in particular as it contains derivations which Crawley,
Everitt & Hothorn and Dalgaard omit. Like most introductory
statistics textbooks, it takes a very orthodox or frequentist
approach (probability only appears in chapter 7!), which can make the
different topics seem like a set of disconnected techniques. The book also demonstrates a lack of understanding of Bayesian statistics.
Crawley, Statistics. An Introduction using R
This text emphasises statistics for biological and to some extent
physical (but not social) sciences. It has a reasonable balance
between explaining the methods and demonstrating them in R. While
there are examples, there is more of an emphasis on principles and the
basic maths than there is in Everitt & Hothorn or Dalgaard, for
example. Indeed, the maths is very basic and many methods are not
properly explained (the course will go beyond this level). However, it
is visually appealing and has the advantage of being relatively cheap.
Like most statistics books, it presents statistics in the traditional
way (look at the Table of Contents).
Dalgaard, Introductory Statistics with R
An introduction to both R and statistics. The mathematical treatment is limited
and it takes a somewhat "recipe"-like approach. As the title suggests, R takes a central role.
Includes exercises and answers.
Everitt and Hothorn, A Handbook of Statistical Analyses using R
R takes a very central place, with lots of examples, data sets
(and perhaps a few too many screen dumps). As the title suggests, this
is a guide to using R for statistics rather than a book from which you
can learn statistics. Moreover, it covers several topics which are not
typical for an introductory statistics course (and which we won't
cover). It is as R-centric as Crawley and Dalgaard but a bit more
advanced.
Gregory, Bayesian logical data analysis for the physical sciences
A good introduction to both the principles and practical application of Bayesian methods. One of very few books giving a broad introduction and guide for physical scientists (there are lots more such books for social scientists and specific analytic models). He uses Mathematica to illustrate the methods. If you only look at one book on Bayesian methods, look at this one.
Jaynes, Probability theory
E.T. Jaynes was one of the main proponents of Bayesian inference. This is a rather unconventional book describing numerous elements of Bayesian probability theory and inference, ranging from the basics through practical examples to fundamental philosophical discussions. It is even polemical in places, and is probably not appropriate for a first exposure to Bayesian inference. But it contains some very thought-provoking discussions.
MacKay, Information theory, inference and learning algorithms
Not a traditional statistics book, and not a first book from which to learn Bayesian inference, but a great book for learning about inference both in principle and in practice. He has a great didactic style, and this book contains some very illuminating examples. Also look here for a good introduction to MCMC. MacKay and CUP have done us a great service by making the book available online.
Maindonald and Braun, Data Analysis and Graphics using R
This is essentially a handbook for using R for statistical data
analysis rather than a book from which to learn statistics. It is
similar in approach and coverage to the classic book of Venables &
Ripley (see below), in that it also covers what one would call machine
learning methods (e.g. trees, discriminant analysis), but at a
slightly lower level. It contains very little mathematics. At 26cm x
18cm x 3.5cm, it won't fit in your pocket.
Sivia, Data Analysis. A Bayesian Tutorial
The first edition
was an excellent introduction to data analysis in the Bayesian
perspective. (A new second edition adds three more chapters.) I
recommend it if you really want to understand what statistics is and how
it relates to probability theory, rather than just learn a bunch of
frequentist recipes. That is, don't look in here for p-values and
Neyman-Pearson hypothesis testing. It includes numerous examples
which are analytically solvable, but covers less on the numerical
solutions. It goes well beyond the scope of the course. It does not
cover R or other packages.
Sachs, Angewandte Statistik. Methodensammlung mit R
A very
detailed and mathematical introduction to statistics. It contains a
lot more than you'll need for the course but the level of mathematics
is not as high (or as offputting) as first appearances might
suggest. R is used to illustrate the statistics (rather than the other
way around, as is the case in some other books). With problems and
solutions. Available online via the University Library (you can
download the whole book as PDF). I've not used this book, but judging from (a) the Foreword, (b) the virtual absence of
any reference to Bayesian statistics or
Richard Cox or Harold Jeffreys, this is an unashamedly frequentist
approach to statistics. You have been warned!
Toutenburg and Heumann, Deskriptive Statistik and Induktive
Statistik
This pair of books - in German - gives a detailed
introduction to statistics and R from a somewhat mathematical
perspective. It goes into more theory and depth than you'll need for
this course. Lots of examples and solutions. I've not used it myself.
Venables and Ripley, Modern Applied Statistics with S (MASS)
"S" is essentially just another name for R. This book provides a very
good introduction to R and its use for both basic and advanced data
analysis. However, it assumes the reader is already reasonably familiar
with the techniques, so this is not a book which can be used alone to
learn basic statistics. It goes well beyond the course, covering also
topics such as GLIMs, neural networks and spatial statistics. The
accompanying R package "MASS" contains many functions which will be
used in the course.
Verzani, Using R for introductory statistics
Quite
R-oriented and rather (too) basic. It's essentially an R guide rather
than a statistics text. Available online via the University Library
as an e-book.
Evans, Dylan, Risk Intelligence
A study of how we (should) use simple probability theory in everyday life to help us assess risks and make decisions. Evans' thesis is that many people, regardless of intelligence, have poor risk intelligence, i.e. are not very good at assessing probability, risk, expected gains and losses. This is a very readable and insightful book.
Gigerenzer, Gerd, Reckoning with Risk
A look at how uncertainty and probability are represented and, more often, misrepresented in everyday life: in the media, in law, and especially in medicine. He guides you through interpreting probabilistic information and shows how you can use it correctly to make informed decisions. He has some very interesting examples.
Kahneman, Daniel, Thinking, fast and slow
A collection of very interesting insights - and results of experiments and surveys - into how we think about probability and statistics. He looks at how people actually assess information and make decisions. One of the main theses is that our intuitive brain is rather poor (in particular, biased) at probabilistic assessments. Very readable, and much of it is convincing.
More or Less
This is an excellent BBC radio programme - also available as a podcast - on statistical issues in the media. To quote the BBC web site:
"Tim Harford investigates numbers in the news. Numbers are used in every area of public debate. But are they always reliable? Tim and the More or Less team try to make sense of the statistics which surround us."
Strongly recommended. Some of the stories are also available in written form at the More or Less website.