Lecture Course, SS2007
Faculty of Physics and Astronomy, University of Heidelberg


Introduction to Machine Learning, Pattern Recognition and Statistical Data Modelling

Tuesdays, 15:15 to 17:00 (s.t.) (starting 17 April 2007)
Seminar room of the Astronomisches Rechen-Institut, Mönchhofstr. 12-14

Lecturer

Dr. Coryn Bailer-Jones (Email: calj AT mpia.de)

Contents

Overview and objective
Office hours (Sprechstunden)
Lecture schedule (and downloads)
Links and online tutorials
Bibliography


Overview and objective

How does automated character recognition work? What is an artificial neural network? How can we find out which lifestyle factors influence the chance of having a heart attack? How does a genetic algorithm optimize a mathematical model? Can we predict stock market prices?

Learning from data is essential to every area of science. It has applications in many walks of life, including number plate recognition, automated manufacturing, astrophysical survey projects, weather and climate prediction, and the diagnosis of diseases. Over the years, numerous algorithms and techniques have been invented (or reinvented) to address these kinds of problems, and they go under a wide variety of names such as "machine learning", "pattern recognition", "statistical learning", "statistical data modelling" and so forth. The objective of this course is to provide a broad overview of the statistical and mathematical methods used for analysing data, inferring underlying behaviour, understanding phenomena and making predictions.

We shall learn the fundamental principles of modelling, see how these are implemented in various techniques, and examine the similarities, differences, advantages and disadvantages of the various methods. While basic mathematical concepts will be covered, formal or abstract definitions and derivations will be avoided. Instead, the emphasis will be on the practical use of the techniques, illustrated with numerous example applications. The course will make use of the (freely available) statistical software package R, and some instruction in its use will be provided. Participants are encouraged to install this package and to use it to try out several of the machine learning methods covered in the course.

Techniques which will be covered include (provisional list):

as well as general concepts such as

This is an introductory course, so no prior knowledge of or experience with machine learning methods is required. The basic prerequisites are first-year mathematics, in particular calculus, linear algebra and statistics. The lectures will be in English. The course is suitable for intermediate or advanced undergraduates, graduates and postdocs, or anybody interested in learning about machine learning methods and how to use them. By the end of the course, participants should have the knowledge, confidence and tools to apply machine learning methods to their own data sets.


Office hours (Sprechstunden)

Course topics and related issues can be discussed either individually or in groups, in English or in German. To make an appointment, please send me an email indicating the issues you would like to discuss.


Lecture schedule

PDF and ODP files of the viewgraphs, as well as copies of the R scripts used, will be provided after each lecture.
Note that these do not constitute a full set of lecture notes (that's what the books and the lectures themselves are for!).

Course syllabus (PDF)

Date      Topic                                     Viewgraphs   R scripts   Notes
17 April  Introduction and basic concepts           [ODP] [PDF]  R scripts
24 April  Data exploration                          [ODP] [PDF]  R scripts
1 May     No lecture (public holiday)
8 May     Linear methods (part 1)                   [ODP] [PDF]  R scripts
15 May    Linear methods (part 2)                   [ODP] [PDF]  R scripts   Challenger data
22 May    No lecture
29 May    Basis expansions                          [ODP] [PDF]  R scripts
5 June    No lecture
12 June   Additive models and kernel functions      [ODP] [PDF]  R scripts
19 June   Neural networks, search and optimization  [ODP] [PDF]
26 June   More nonlinear stuff                      [ODP] [PDF]  R scripts
3 July    Support vector machines                   [ODP] [PDF]  R scripts
10 July   Model selection and combination           [ODP] [PDF]  R scripts
17 July   Unsupervised learning and clustering      [ODP] [PDF]  R scripts
24 July   The final lecture                         [ODP] [PDF]

Coryn Bailer-Jones, calj at mpia.de
Last updated July 2007