Lecture Course, SS2007
Faculty of Physics and Astronomy, University of Heidelberg


Introduction to Machine Learning, Pattern Recognition and Statistical Data Modelling

Texts

The principal texts for the course are as follows:

The Elements of Statistical Learning, Hastie et al., 2001, Springer-Verlag, ISBN 0-387-95284-5
Modern Applied Statistics with S, Venables & Ripley, 2002 (4th ed.), Springer-Verlag, ISBN 0-387-95457-0 (abbreviated as MASS in the R scripts)
Neural Networks for Pattern Recognition, Bishop, Oxford Univ. Press, 1995, ISBN 0-19-853849-9

I will follow the treatment of Hastie et al. for many topics in the course. It's not cheap (it's a Springer hardback selling new for about Euro 67), but it's pretty good on many topics. Venables and Ripley provides a good introduction to R in general and covers the use of R for statistics and data modelling in particular. (S is the language on which R is based. R is the free implementation; S-PLUS is the commercial one.) Bishop covers several general topics on prediction with high-dimensional data in addition to neural networks. There are many other books on the market covering machine learning and pattern recognition, but be careful, as many are not that good.

Articles

Lecture 1. Introduction

Analysis of medium resolution spectra by automated methods: application to M55 and omega Centauri.
P.G. Willemsen, M. Hilker, A. Kayser, C.A.L. Bailer-Jones
Astronomy & Astrophysics, 436, 379-390 (2005)
[ADS] [PDF version]
Example application of machine learning to stellar parameter estimation (neural networks and bootstrapping)

Towards a library of synthetic galaxy spectra and preliminary results of the classification and parametrization of unresolved galaxies from Gaia.
P. Tsalmantza, M. Kontizas, C.A.L. Bailer-Jones, B. Rocca-Volmerange, R. Korakitis, E. Kontizas, E. Livanou, A. Dapergolas, I. Bellas-Velidis, A. Vallenari, M. Fioc
Monthly Notices of the Royal Astronomical Society, submitted (2007)

Lecture 2. Data Exploration

Automated Classification of Stellar Spectra. II: Two-Dimensional Classification with Neural Networks and Principal Components Analysis.
C.A.L. Bailer-Jones, M. Irwin, T. von Hippel
Monthly Notices of the Royal Astronomical Society, 298, 361 (1998)
[abstract] [online publication] [PDF version]

Lecture 3. Linear Methods (part 1)

An introduction to variable and feature selection
I. Guyon, A. Elisseeff
Journal of Machine Learning Research, 3, 1157 (2003)
[PDF version]

Lecture 4. Linear Methods (part 2)

Risk analysis of the Space Shuttle: Pre-Challenger prediction of failure
S.R. Dalal, E.B. Fowlkes, B. Hoadley
Journal of the American Statistical Association, 84, 945-957 (1989)
[JSTOR]

Problems in extrapolation illustrated with Space Shuttle O-ring data
M. Lavine
Journal of the American Statistical Association, 86, 919-921 (1991)
[JSTOR]

Lecture 7. Neural networks, search and optimization

Modelling data: Analogies in neural networks, simulated annealing and genetic algorithms
D.M. Bailer-Jones, C.A.L. Bailer-Jones
in Model-based reasoning: Scientific discovery, technological innovation, values
L. Magnani et al. (eds.), Kluwer/Plenum, pp. 147-165 (2002)
[abstract] [PDF version]

Evolutionary design of photometric systems and its application to Gaia.
C.A.L. Bailer-Jones
Astronomy & Astrophysics, 419, 385-403 (2004)
[abstract] [PDF version]

Lecture 8. More nonlinear stuff

Mclust mixture modelling package

Lecture 9. Support Vector Machines

LIBSVM. A C++ library with an interface from R. Links to documentation are here too.
Various tutorials
A recommended tutorial on SVMs
Probability estimates for multi-class classification by pairwise coupling, Wu, Lin and Weng (2003) (NIPS paper). See this link for the full paper.

Lecture 10. Model combination and selection

A cartoon guide to AIC and BIC
AdaBoost
A short introduction to boosting

Lecture 12. The last lecture

David MacKay's web site, with links to his papers on Bayesian methods (in general and in neural networks, especially this one).

Coryn Bailer-Jones, calj at mpia.de
Last updated July 2007