My research projects
Stellar-parameter estimation from Gaia's low-resolution spectro-photometry

Background
The primary goal of ESA's Gaia satellite is to provide position and parallax measurements of unprecedented accuracy for 1 billion stars in the Milky Way. To enhance astrophysical interpretation, however, Gaia also takes low-resolution spectro-photometry (120 pixels over the whole optical wavelength range) for all sources. An example of such a low-resolution BP/RP spectrum is shown in Fig. 1.

Figure 1. Example BP spectrum (cyan) and RP spectrum (orange) of a dwarf star with an effective temperature of 8000 K.

What is the problem?
I am responsible for the software package "GSP-Phot", which is part of the Gaia data reduction pipeline that is described here. GSP-Phot fits Gaia's low-resolution BP/RP spectra with synthetic stellar model atmospheres in order to estimate fundamental stellar parameters (effective temperature, surface gravity, metallicity, line-of-sight extinction).
Fitting 1 billion spectra in finite time, even low-resolution ones, is an enormous challenge and requires careful methodology and programming. It also comes with a constraint imposed on us by our data processing centre: GSP-Phot must not consume more than 2 GFLOP per source (ca. 1 s per source on a standard computer).
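To put the budget quoted above into perspective, here is a back-of-envelope sketch. The source count, the FLOP budget and the 1 s per source are taken from the text; the assumption of perfect parallel scaling over a given number of cores is an idealisation made only for this illustration.

```python
# Back-of-envelope cost of the per-source budget quoted in the text.
N_SOURCES = 1_000_000_000          # 1 billion sources (from the text)
FLOP_PER_SOURCE = 2e9              # 2 GFLOP per source (from the text)
SECONDS_PER_SOURCE = 1.0           # ca. 1 s per source (from the text)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def total_flop(n_sources=N_SOURCES, flop_per_source=FLOP_PER_SOURCE):
    """Total floating-point work for the whole catalogue."""
    return n_sources * flop_per_source


def wall_clock_years(n_cores, n_sources=N_SOURCES,
                     seconds_per_source=SECONDS_PER_SOURCE):
    """Wall-clock years, assuming perfect scaling over n_cores (idealised)."""
    return n_sources * seconds_per_source / n_cores / SECONDS_PER_YEAR


print(f"total work : {total_flop():.1e} FLOP")
print(f"1 core     : {wall_clock_years(1):.1f} years")
print(f"1000 cores : {wall_clock_years(1000) * 365.25:.1f} days")
```

Even at the full budget, a single core would need roughly 32 years, and 1000 ideally scaling cores would still need about 11.6 days, which is why there is no room for a long MCMC chain per source.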

GSP-Phot algorithm design
Scientifically, we need to publish not only stellar parameter estimates but also their uncertainties. These, however, are often non-trivial, exhibiting asymmetries and nonlinear correlations, so we estimate them with an MCMC algorithm. Unfortunately, we do not have sufficient computation time to let the MCMC converge for 1 billion sources. Therefore, the MCMC requires a very good initial guess. To this end, we employ a machine-learning algorithm called extremely randomised trees, which directly provides parameter estimates for an input BP/RP spectrum. This first estimate is then further refined by a gradient-descent algorithm that maximises the likelihood of the BP/RP spectrum given the stellar parameters. The resulting best-fit parameters are then used to initialise the MCMC. GSP-Phot therefore consists of three algorithms: extremely randomised trees, gradient descent, MCMC.
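The three-stage idea can be illustrated with a small toy sketch. This is not the GSP-Phot code: the two-parameter forward model `model_spectrum`, the training grid, the noise level and all function names are hypothetical. To keep the example dependent on NumPy only, stage 1 uses a nearest-training-spectrum lookup as a stand-in for extremely randomised trees (a regressor such as scikit-learn's `ExtraTreesRegressor` could take that role). Flat priors and independent Gaussian pixel noise are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PIX = 120                          # pixels per BP/RP spectrum (from the text)
x = np.linspace(0.0, 1.0, N_PIX)     # hypothetical pixel coordinate


def model_spectrum(theta):
    """Hypothetical toy forward model with two parameters (amplitude, slope)."""
    amp, slope = theta
    return amp * np.exp(-slope * x)


def neg_log_like(theta, flux, sigma):
    """Gaussian negative log-likelihood up to a constant, independent pixels."""
    r = (flux - model_spectrum(theta)) / sigma
    return 0.5 * np.sum(r * r)


def initial_guess(flux, train_theta, train_flux):
    """Stage 1 stand-in (NOT extremely randomised trees): nearest training spectrum."""
    i = np.argmin(np.sum((train_flux - flux) ** 2, axis=1))
    return train_theta[i].copy()


def num_grad(f, theta, eps=1e-6):
    """Central-difference gradient of f at theta."""
    g = np.zeros_like(theta)
    for k in range(theta.size):
        d = np.zeros_like(theta)
        d[k] = eps
        g[k] = (f(theta + d) - f(theta - d)) / (2.0 * eps)
    return g


def gradient_descent(f, theta, n_iter=200):
    """Stage 2: descend the negative log-likelihood with a backtracking step."""
    for _ in range(n_iter):
        g = num_grad(f, theta)
        f_cur, gg = f(theta), g @ g
        step = 1.0 / (1.0 + np.sqrt(gg))   # first trial moves less than 1 unit
        # Armijo condition: halve the step until the decrease is sufficient.
        while step > 1e-12 and not (f(theta - step * g) <= f_cur - 0.5 * step * gg):
            step *= 0.5
        if step <= 1e-12:
            break                           # no further progress possible
        theta = theta - step * g
    return theta


def metropolis(f, theta, n_steps, prop_sigma):
    """Stage 3: short random-walk Metropolis chain started at the best fit."""
    chain = np.empty((n_steps, theta.size))
    cur, cur_f = theta.copy(), f(theta)
    for i in range(n_steps):
        prop = cur + prop_sigma * rng.standard_normal(theta.size)
        prop_f = f(prop)
        if np.log(rng.random()) < cur_f - prop_f:   # accept with min(1, L'/L)
            cur, cur_f = prop, prop_f
        chain[i] = cur
    return chain


# Simulate one noisy spectrum (all values hypothetical).
true_theta = np.array([2.1, 1.4])
sigma = 0.02
flux = model_spectrum(true_theta) + sigma * rng.standard_normal(N_PIX)

# Coarse training grid for the stage-1 stand-in.
train_theta = np.array([[a, s] for a in np.linspace(1.0, 3.0, 9)
                        for s in np.linspace(0.5, 2.5, 9)])
train_flux = np.array([model_spectrum(t) for t in train_theta])


def nll(t):
    return neg_log_like(t, flux, sigma)


theta0 = initial_guess(flux, train_theta, train_flux)       # stage 1
theta_fit = gradient_descent(nll, theta0)                   # stage 2
chain = metropolis(nll, theta_fit, 2000, np.array([0.003, 0.005]))  # stage 3
lo, med, hi = np.percentile(chain, [16, 50, 84], axis=0)
print("initial guess:", theta0)
print("best fit     :", theta_fit)
print("16/50/84 %   :", lo, med, hi)
```

Because the chain starts at the best fit, no burn-in towards the mode is needed in this sketch; the samples are used only to map the shape of the posterior around it, including any asymmetry between the lower and upper percentiles.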

Results on simulated data
Despite the limitations that processing 1 billion spectra imposes on the analysis, the results obtained by GSP-Phot are highly competitive, e.g., with high-resolution spectroscopic surveys.
The internal errors of GSP-Phot at 15th magnitude (there will be several tens of millions of stars at 15th magnitude or brighter in the final Gaia catalogue) are shown in Fig. 2.

Figure 2. Internal errors of GSP-Phot for FGKM dwarfs at apparent magnitude G=15.

Given these small internal errors, the external errors of GSP-Phot will be clearly dominated by the mismatch of synthetic and real spectra.
A careful calibration is required in order to preserve the excellent performance of GSP-Phot. Such a calibration is limited by the availability of suitable standard stars. Tests of our calibration method have shown that given a few hundred standard stars with well-known parameters covering all relevant parts of the parameter space, an almost perfect calibration is possible such that GSP-Phot external errors can be as small as the internal errors.

Results on real data
The results on simulated data demonstrate that the GSP-Phot algorithm works and produces good results. Real data, however, bring their own complications:

First and foremost, while in simulations the instrument is perfectly known, the real Gaia onboard spectrograph requires an instrument model to be estimated from the comparison between high-quality ground-based stellar SEDs and the observed BP/RP spectra for the same stars. Such an estimate is provided by Gaia DPAC Coordination Unit 5. While the resulting instrument model is very good, it is not perfect.
Real BP/RP spectra have noise properties very different from those anticipated in the simulations. The main difference from our simulations is that there are strong long-range correlations between far-apart pixels in the BP/RP spectrum.
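Why pixel correlations matter for a fit can be seen in a small sketch that compares the chi-squared of the same residual with and without the off-diagonal covariance terms. The equicorrelated covariance used here (every pixel pair shares the same correlation `rho`) is a hypothetical toy model chosen for its simplicity, not the measured BP/RP noise covariance, and the function names are illustrative.

```python
import numpy as np


def equicorrelated_cov(n_pix, sigma, rho):
    """Hypothetical noise model: every pixel pair shares correlation rho."""
    return sigma**2 * ((1.0 - rho) * np.eye(n_pix) + rho * np.ones((n_pix, n_pix)))


def chi2_full(residual, cov):
    """chi^2 = r^T C^-1 r, evaluated through a Cholesky factor of C."""
    chol = np.linalg.cholesky(cov)         # C = L L^T with L lower triangular
    z = np.linalg.solve(chol, residual)    # z = L^-1 r
    return float(z @ z)


def chi2_diagonal(residual, cov):
    """chi^2 when the off-diagonal covariances are ignored."""
    return float(np.sum(residual**2 / np.diag(cov)))


n_pix = 120                                # pixels per spectrum (from the text)
residual = np.full(n_pix, 1.0)             # coherent 1-sigma offset in every pixel
cov = equicorrelated_cov(n_pix, sigma=1.0, rho=0.5)

print("diagonal only  :", chi2_diagonal(residual, cov))
print("full covariance:", chi2_full(residual, cov))
```

In this toy case the diagonal-only chi-squared is 120, whereas the full covariance gives about 2: a coherent offset across all pixels looks highly significant if correlations are ignored, but is entirely ordinary once they are taken into account.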

All in all, the results on real data are worse than on simulated data, as is to be expected. For FGKM stars, a comparison to literature values results in RMS differences of 450 K for temperature, 0.6 dex for metallicity, and 0.5 dex for surface gravity. Of course, the literature values include errors of their own, which account for part of these differences. A detailed validation will be published together with the results in Gaia DR3 (first half of 2022).
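For reference, an RMS difference of this kind, and the way literature errors inflate it, can be sketched as follows. If the two sets of errors are independent, they add in quadrature, so the literature contribution can be subtracted in quadrature. The 150 K literature error used below is purely hypothetical and serves only to show the arithmetic; the function names are illustrative.

```python
import math


def rms_difference(estimates, references):
    """Root-mean-square difference between two equally long sequences."""
    diffs = [e - r for e, r in zip(estimates, references)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def remaining_scatter(rms_total, rms_reference):
    """Scatter left after removing the reference errors in quadrature.

    Assumes the two error sources are independent.
    """
    return math.sqrt(max(rms_total**2 - rms_reference**2, 0.0))


# 450 K is the RMS difference quoted in the text; 150 K is a hypothetical
# literature error chosen only for illustration.
print(f"{remaining_scatter(450.0, 150.0):.0f} K")
```

With these illustrative numbers, about 424 K of the 450 K would remain, so a literature error would have to be comparable to the RMS difference itself before it accounted for most of it.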