
On the Automatic Analysis of Stellar Spectra
A thesis submitted for the degree of Doctor of Philosophy

by

Christopher Winter, B.Eng.

Armagh Observatory
Armagh, Northern Ireland
&
Faculty of Science and Agriculture
Department of Pure and Applied Physics
The Queen's University of Belfast
Belfast, Northern Ireland

March 2006

"Quia non erit impossibile apud Deum omne verbum"

To Stacey

"Qui invenit mulierem invenit bonum et hauriet iucunditatem a Domino"

Acknowledgements
I would like to acknowledge and thank my supervisor, C.S. Jeffery, for his sound advice and direction over the course of this project, and the staff and students of the Armagh Observatory for their helpful support and assistance. I am very grateful to J.S. Drilling, E.M. Green, and A. Ahmad, all of whom supplied spectroscopic data that was used in this project. In addition, my thanks go to C.A.L. Bailer-Jones for the use of his neural network code, STATNET.

This work was carried out as part of the CosmoGrid project, funded under the Programme for Research in Third Level Institutions (PRTLI) administered by the Irish Higher Education Authority under the National Development Plan and with partial support from the European Regional Development Fund.

This work also uses data from the Sloan Digital Sky Survey (SDSS) data archive. Funding for the creation and distribution of the SDSS Archive has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Aeronautics and Space Administration, the National Science Foundation, the U.S. Department of Energy, the Japanese Monbukagakusho, and the Max Planck Society. The SDSS Web site is http://www.sdss.org/. The SDSS is managed by the Astrophysical Research Consortium (ARC) for the Participating Institutions. The Participating Institutions are The University of Chicago, Fermilab, the Institute for Advanced Study, the Japan Participation Group, The Johns Hopkins University, the Korean Scientist Group, Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.

Chris Winter March, 2006


Abstract
This project investigates the problem of automatically searching for and analysing astronomical spectra from large data sets. The three core problems of (1) spectral classification, (2) physical parameterisation, and (3) searching are examined, and a generalisable set of tools is established based on the techniques of artificial neural networks (ANNs), χ² minimisation, and principal components analysis (PCA). These tools are then applied to the archives of the Sloan Digital Sky Survey (SDSS) to automatically search for and analyse the spectra of hot subdwarf stars.

Spectral classification is tackled by the versatile statistical machine learning method of ANNs. An ANN is trained to classify hot subdwarf spectra onto the classification system defined by Drilling et al. (2006), obtaining global errors (rms) of 2 subtypes for spectral type, 1 subclass for luminosity class, and 4 subclasses for the helium class. These errors are in line with accuracies achieved by human classifiers.

Physical parameters are obtained by fitting observations to grids of theoretical models using a χ² minimisation procedure. A new methodology has been developed for managing and indexing large grids of theoretical models in the χ² minimisation code, SFIT. Concepts from the field of computational geometry are used to remove several limitations from this code, and pave the way for its use in a distributed parallel computing environment.

Searching for the spectra of a particular type of object in large, unknown data sets is accomplished using the multivariate statistical technique, PCA. The mechanics of this tool are outlined, and its use demonstrated by searching for hot subdwarf spectra in the SDSS. This solution provides a means to reduce unknown data sets to quantities suitable for visual inspection.

282 spectra of hot subdwarf candidates are obtained from the SDSS and analysed. The results evidence several unexplained phenomena of extended horizontal branch stars, namely: 1) the existence of the second horizontal branch gap of Newell (1973); 2) two sdB nHe-Teff sequences; and 3) a clustering of hot, helium-rich stars at Teff ~ 44,000 K, log g = 5.7. These findings pose important questions for stellar evolution theory in the realms of the extended horizontal branch.


Contents
Acknowledgements
Abstract
List of Tables
List of Figures
1 Introduction
  1.1 Astronomical Data Mining
  1.2 Large Data Sets And Their Sources
  1.3 Astronomical Spectra
    1.3.1 Types Of Objects And Their Spectra
    1.3.2 Automatic Methods of Analysis
  1.4 Hot Subdwarf Stars
    1.4.1 Spectroscopy
    1.4.2 Stellar Evolution
    1.4.3 Why Study Them?
    1.4.4 Why Search For Them In The SDSS?
  1.5 Summary
2 Classification - Artificial Neural Networks
  2.1 Classifying Hot Subdwarfs
    2.1.1 The Training Sample
    2.1.2 Methodology
    2.1.3 Results
  2.2 Physical Parameters
    2.2.1 Methodology
    2.2.2 Results
  2.3 Summary
3 Parameterisation - χ² Fitting
  3.1 Analysing Stellar Spectra
  3.2 SFIT
    3.2.1 Limitations of SFIT
    3.2.2 Proposal to Remove SFIT's Limitations
  3.3 Tetrahedralisation: Interpolation and Indexing
    3.3.1 Simplex Interpolation
    3.3.2 Grid Index - Delaunay Triangulation
    3.3.3 Navigating the Index - Point Location
  3.4 Testing the Modifications
  3.5 Summary
4 Filtering - Principal Components Analysis
  4.1 Constructing A PCA-Based Filter
    4.1.1 Mathematics of PCA
    4.1.2 Building A Hot Subdwarf Filter
  4.2 Searching the SDSS for Hot Subdwarfs
  4.3 Summary
5 Application I - SDSS Hot Subdwarfs
  5.1 Search Criteria And Data Sets
  5.2 PCA Filtering
  5.3 Analysis
  5.4 Results
    5.4.1 Parameterisation
    5.4.2 Classification
    5.4.3 Radial Velocities
  5.5 Sources of Error
  5.6 Analysis of PCA Filter Efficiency
  5.7 Summary
6 Application II - Other Data Sets
  6.1 2MASS-Selected Sample
  6.2 SDSS sdB-He Stars of Harris et al. (2003)
  6.3 Ahmad & Jeffery (2003) He-sdBs
  6.4 Summary
7 Conclusions And Future Work
Bibliography
Appendices
A Results for 192 Drilling et al. (2006) Hot Subdwarfs
B Results for 282 SDSS DR3 Hot Subdwarf Candidates
C Results for 83 2MASS-Selected Hot Subdwarf Candidates
D The Armagh Observatory Cluster
  D.1 Hardware Configuration
  D.2 Software Configuration
  D.3 MPICH 1.2.4 RPM Spec File
E LTE-CODES
  E.1 Directory Layout
  E.2 Build System Organisation
  E.3 Installation Instructions

List of Tables
2.1 Results of the leave-one-out procedure as applied to a committee of five 901:10:3 ANNs, for 150, 300, 500, 700 and 1000 training iterations.
2.2 As Table 2.1, but for the committee of five 901:5:5:3 ANNs.
2.3 Results of parameterising the 60 calibration stars.
2.4 A comparison between ANNs and χ² minimisation for parameterising the 133 unparameterised stars.
3.1 Details of the model grid used in the comparison.
3.2 Initial parameters used for the Amoeba and Levenberg-Marquardt optimisation routines. The step sizes used for Amoeba are also given.
3.3 Results of BD+10 2179 analysis with the unmodified version of SFIT.
3.4 Results of BD+10 2179 analysis with the modified version of SFIT.
3.5 The model grid used to obtain physical parameters of the set of test models.
3.6 RMS comparison of parameterisation results from each interpolation method with the original parameters of each model. Also given is the RMS difference between the methods, and a comparison between the results in the region of parameter space for which both schemes seem to give their best results (see Figures 3.6 and 3.7).
5.1 Summary of data quantities obtained from the SDSS DR3.
5.2 The model grid used to obtain physical parameters from the SDSS hot subdwarf candidates.
6.1 Parameters of the two calibration stars as obtained by χ²-fitting to NLTE (Green et al., 2006) and LTE (Armagh) model atmospheres. Formal errors are given in parentheses.
6.2 Classification results for the sdB-He stars of Harris et al. (2003).
6.3 Classification results for the Ahmad & Jeffery (2003) He-sdBs.
A.1 Parameterisation Results for 192 Drilling et al. (2006) Hot Subdwarfs
B.1 Results for 282 SDSS Hot Subdwarf Candidates
C.1 Results for 83 2MASS-Selected Hot Subdwarf Candidates

List of Figures
1.1 A stellar spectrum (top), and a galaxy spectrum (bottom). (Taken from the SDSS)
1.2 Example of a quasar (top) and carbon star (bottom) spectrum. (Taken from the SDSS)
1.3 The emission spectrum of the Orion nebula (M42).
1.4 Examples from each hot subdwarf spectrographic subgroup. Classifications listed are those from Drilling et al. (2006).
1.5 Schematic temperature-luminosity diagrams showing: a) the positions of stars belonging to the main stellar groups; b) the normal sequence of stellar evolution experienced by a star of a few solar masses; c) possible evolution of an sdB star in a binary system. (Diagram courtesy of C.S. Jeffery).
2.1 The training sample shows clustering in certain regions of the classification space. For clarity, points have been offset by small random shifts in both coordinates.
2.2 Results of the leave-one-out procedure for both ANN architectures at the near-optimal training time of 300 iterations for the 901:10:3 architecture (left column), and 500 iterations for the 901:5:5:3 architecture (right column). Also plotted is the best-fit linear least squares line.
2.3 Parameterisations of the 60 calibration stars. Results from each method have been combined onto each plot. ANN results are indicated by blue crosses, and χ² minimiser results by red pluses.
2.4 Parameterisations of the 133 unparameterised stars using the ANNs and χ² minimiser. Also shown is the best-fit linear least squares line.
3.1 Example of a k-D tree in two dimensions. On the left is the representation of how the k-D tree on the right splits up the x, y plane. (Adapted from Moore 1991.)
3.2 A 1-simplex is a line segment. A 2-simplex is a triangle. A 3-simplex is a tetrahedron.
3.3 In two dimensions, the Delaunay triangulation guarantees that no other points lie in the circumcircle of any simplex.
3.4 The line segment, L, is constructed using the centroid of the starting tetrahedron, T, and the interpolation point, p. The tetrahedra visited on the walk-through are coloured grey.
3.5 Parameterisation results from the linear interpolation in tables method. Clearly visible are anomalous results arising from a suspected defect in the method's implementation.
3.6 Parameterisation results from the linear interpolation in tables method. Axes have been restricted to give a view of the grid boundaries described in Table 3.5.
3.7 Parameterisation results from the simplex-based interpolation scheme. In contrast with Figures 3.5 and 3.6, the simplex-based scheme clearly restricts the optimisers to the grid boundaries.
4.1 Principal component analysis. u1 is the first principal component and the axis onto which the projected positions of the data have their maximum sum. u2 is the second principal component, and u1 · u2 = 0.
4.2 Mean spectrum of the Drilling et al. (2006) sample.
4.3 First five PCs of the Drilling et al. (2006) sample.
4.4 Second five PCs of the Drilling et al. (2006) sample.
4.5 Cumulative variance of the first ten PCs of the Drilling et al. (2006) sample.
4.6 Illustration of projecting hot subdwarf spectra onto the first four PCs of the Drilling et al. (2006) standards.
4.7 Histogram of reconstruction errors from the SDSS data sample.
4.8 Spectra in first three reconstruction error histogram bins (R 3.0).
4.9 Spectra in first three reconstruction error histogram bins (R 3.0).
4.10 Sample of spectra from the eighth error bin (R 3.0).
4.11 Sample of spectra from the fourteenth error bin (R 4.5).
4.12 Sample of high S/N DA white dwarfs from the 22nd - 24th error bins (R 6.4 - 7.1).
4.13 Sample of spectra from the fifty-third error bin (R > 15.0).
5.1 Histogram of reconstruction errors for the colour-colour selected SDSS sample.
5.2 Parameterisation results of the 282 SDSS hot subdwarf candidates. The helium main sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted.
5.3 Four example fits from the 282 SDSS hot subdwarfs. The classification and physical parameters (Teff (K), log g, log(nHe/nH)) obtained for each star are printed in the lower corners of each plot.
5.4 The results of applying a kernel density estimate analysis to the data from Figure 5.2. The low-density region at Teff ~ 22,500 K is prominent, along with another possible low-density region at Teff ~ 41,000 K.
5.5 Classification results of the 282 SDSS hot subdwarf candidates. Points have been given small random offsets in each axis for clarity.
5.6 A comparison of the ANN classifications of the 282 SDSS hot subdwarf candidates (left-most plots) with all the stars classified by Drilling et al. (2006) (right-most plots). Points have been given small random offsets in each axis for clarity.
5.7 A calibration of the ANN classifications onto the Drilling et al. (2006) system using the 282 SDSS hot subdwarf candidates.
5.8 The distribution of SDSS-derived redshifts of the 282 hot subdwarf candidates.
5.9 Examples of white dwarf and BHB contaminants. A - BHB star with deep Balmer lines. B - DA white dwarf with strong, broad Balmer lines due to high surface gravity. C - DB white dwarf. D - Uncertain (some evidence of weak carbon absorption, so possibly a DQ white dwarf).
5.10 This gray-shaded region of the log g-Teff plane represents an area of good probability that the stars within it are subdwarfs.
5.11 TP rates (red) and FP rates (blue) of the PCA filter as a function of the reconstruction error threshold, R. The green curve is the difference between the TP and FP rates.
5.12 A closer examination of the TP and FP rates. The peak in the green TP-FP curve occurs at R 7.0 and signifies the optimum value for R in the SDSS sample.
6.1 SFIT physical parameters for 2MASS-selected sample. The helium main sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted.
6.2 ANN classification for 2MASS-selected sample. Points have been given small random offsets in each axis for clarity.
6.3 The stars assigned late-A and early-F spectral types by the neural network.
6.4 Comparison of ANN classifications with those of Drilling et al. (2006) for the 17 He-sdBs of Ahmad & Jeffery (2003). Points have been given small random offsets in each axis for clarity. Also plotted is the best fit least squares regression line with error bars showing the RMS of the residuals.
7.1 Schematic diagram showing how the work of this thesis fits in with the wider system envisaged by Jeffery (2003).

Chapter 1

Introduction
The spectroscopy of light from astronomical objects is one of the most important methods for understanding the physics at work in the universe. Many fundamental parameters of those objects can be determined by analysing their spectra, including temperature, chemical composition, motion, and other clues about their origin and evolution. Advances in information technology over the past 35 years, and their subsequent influence on observational methods, have allowed spectroscopic studies of unprecedented numbers of objects to be carried out over a short period of time. Modern astronomy is now about dealing with very large quantities of data, and the problems associated with its management and analysis.

This project develops a collection of tools to assist astronomers in data mining large sets of astronomical spectra. The tools are general in nature, and can be used to search for and automatically study the spectra of potentially any type of astronomical object. Together, the tools form a semi-automatic pipeline allowing a fast progression from large quantities of unknown spectra to useful scientific results.

In the past, studies of automatic methods of spectral analysis have mainly centred around the problem of object classification. This makes sense from the point of view of a survey mission because it is desirable to know what types of objects have been observed, with particular interest being paid to those objects not falling into any known category. However, the individual astronomer, studying a particular type of object, is not always interested in large-scale classification. He needs a way to search exclusively for samples in a data set which are most like his object of interest. Once located, those samples are likely to exist in large enough numbers to require further automatic assistance in their analysis.

The techniques needed to help solve this problem already exist in the field, but they have not yet been brought together and adapted to form any sort of useful, coherent system. As such, scientific insights contained in large data sets remain mostly untapped. The work in this project represents what seems to be the first attempt at rectifying this issue. Three major algorithms are employed to construct a general data mining tool set.

1. Principal Components Analysis is applied in a supervised classification role to create a filter that can help search for a specific type of object in an unknown data set.
2. Artificial Neural Networks have been shown to be a robust and versatile tool for many tasks in astronomy. They are used here to provide spectral classifications.
3. χ² minimisation is used to derive physical parameters for spectra by fitting them to grids of theoretical models.
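The third item can be illustrated with a minimal sketch of a χ² comparison between an observed spectrum and a small grid of models. This is only an illustration under stated assumptions: the function names `chi_squared` and `best_grid_model`, the toy grid, and all flux values are hypothetical, and the sketch simply picks the nearest grid node, whereas a fitting code such as SFIT interpolates between models and drives the search with an optimiser.

```python
def chi_squared(observed, model, sigma):
    """Sum of squared, noise-weighted residuals between two spectra."""
    return sum(((o - m) / s) ** 2 for o, m, s in zip(observed, model, sigma))


def best_grid_model(observed, sigma, grid):
    """Return (parameters, chi2) of the grid model closest to the observation.

    `grid` maps a parameter tuple, here (Teff, log g), to a model spectrum
    sampled on the same wavelength points as `observed`.
    """
    scored = ((chi_squared(observed, flux, sigma), params)
              for params, flux in grid.items())
    chi2, params = min(scored)
    return params, chi2


# Toy three-pixel "spectra"; every number here is invented for illustration.
toy_grid = {
    (30000, 5.5): [1.00, 0.80, 1.00],
    (35000, 5.5): [1.00, 0.70, 1.00],
    (40000, 6.0): [1.00, 0.60, 1.00],
}
observed = [1.01, 0.69, 0.99]
sigma = [0.02, 0.02, 0.02]

params, chi2 = best_grid_model(observed, sigma, toy_grid)
print(params, round(chi2, 3))
```

In this toy case the middle model is selected because its central pixel lies closest to the observed line depth. With a real grid the same statistic would be evaluated at interpolated parameter values rather than only at the nodes.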

Additional minor tools to facilitate data processing, management, and visualisation are also prototyped.

Furthermore, a new and original methodology has been developed to extend the functionality of the χ² minimisation code, SFIT, used at the Armagh Observatory. The code is modified using concepts from the field of computational geometry to allow the use of arbitrarily large, three-dimensional grids of theoretical models. This removes several severe limitations from the program, and prepares it for further modification to permit its use in a distributed computational environment.

The specific outcome of this project is a set of general tools which can be used to study the spectra of any astronomical object, and a "real-world" demonstration of these tools through their application to search for and analyse the spectra of hot subdwarf stars from the archives of the Sloan Digital Sky Survey. The results evidence several unexplained phenomena of extended horizontal branch stars that pose important questions for the theory of stellar evolution.

The work undertaken in this project is a step towards the larger computational framework of Jeffery (2003) which outlines a wider system incorporating the management of atomic data, dynamic generation and storage of grids of theoretical models, parameter space visualisation, and automated analysis. The use of distributed computational resources, such as the Grid, is also envisaged.
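As a rough illustration of the kind of computational geometry involved, the sketch below interpolates a quantity inside a single tetrahedron of a three-dimensional parameter grid using barycentric weights. It is a generic textbook construction written for this purpose, not the SFIT implementation; the function names and the unit tetrahedron used as input are assumptions made for the example.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three row lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def barycentric_weights(vertices, p):
    """Barycentric weights of point p with respect to a tetrahedron.

    `vertices` holds four (x, y, z) points. The weights sum to 1 and are all
    non-negative exactly when p lies inside (or on) the tetrahedron.
    """
    v0, v1, v2, v3 = vertices
    # Edge vectors from v0 to the other three vertices.
    cols = [[v[k] - v0[k] for k in range(3)] for v in (v1, v2, v3)]
    rhs = [p[k] - v0[k] for k in range(3)]
    # Matrix whose columns are the edge vectors.
    matrix = [[cols[j][k] for j in range(3)] for k in range(3)]
    volume = det3(matrix)
    if volume == 0:
        raise ValueError("degenerate tetrahedron")
    weights = []
    for j in range(3):
        # Cramer's rule: replace column j by the right-hand side.
        replaced = [[rhs[k] if c == j else matrix[k][c] for c in range(3)]
                    for k in range(3)]
        weights.append(det3(replaced) / volume)
    return [1.0 - sum(weights)] + weights


def interpolate(vertices, values, p):
    """Linearly interpolate per-vertex values at point p."""
    return sum(w * v for w, v in zip(barycentric_weights(vertices, p), values))


# Hypothetical example: four grid nodes and one model value at each node.
tetrahedron = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
node_values = [10.0, 20.0, 30.0, 40.0]
print(interpolate(tetrahedron, node_values, (0.25, 0.25, 0.25)))
```

The sign of the weights doubles as an inside test: a negative weight means the point lies outside the tetrahedron, so a different cell of the grid has to be located before interpolating.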

1.1 Astronomical Data Mining

The term "data mining" refers to the use of a broad set of techniques and algorithms for extracting useful patterns and models from very large data sets. Typically, the goal is to discover either something hitherto unknown about a phenomenon that only becomes apparent when it is studied en masse, or else a new phenomenon that only becomes apparent when observations are gathered in large enough quantities over a sufficiently wide range.

Traditionally, in astronomy, much effort was invested in gathering observations of one particular object, such as a star, in an attempt to understand that object in detail. Given the universality of physics, the insights gained are usually applicable to other objects of the same type, allowing a wider understanding to be achieved.

However, advances in technology, such as large-area mosaic CCDs and multi-object fibre-fed spectrographs, mean that modern telescopes can be made to gather observations of thousands of objects in a single night. This opens up the possibility of discovering new facts about particular objects by studying their properties in large numbers, and also the possibility of discovering completely new objects.

Unfortunately, this abundance of data brings with it a set of new problems. Managing all of the information requires knowledge of data formats, storage mechanisms, and techniques for indexing, searching, and analysing it all. Indeed, modern astronomy is fast becoming a cross-disciplinary endeavour, providing a rich area for exploring many aspects of computer science and statistics in the context of real-world applications.

Data Types

The nature of astronomical data means that it is inherently heterogeneous in both format and content, with observations now being gathered over all regions of the electromagnetic spectrum. Broadly speaking, astronomical data can be classified into five domains.

· Imaging data are the fundamental component of astronomical observations, capturing a two-dimensional picture of the universe within a narrow wavelength region at a particular point in time.
· Catalogues of objects are constructed by analysing imaging data, and recording many different parameters about each object such as brightness and colour, morphological information, and coordinates.
· Spectroscopy provides detailed physical quantification of objects including temperature, chemical composition, and kinematical information.
· Studies of objects in the time-domain provide valuable insight into the nature of the universe by identifying moving objects, variable sources (e.g., pulsating stars), or transient objects such as supernovae and gamma-ray bursts.
· Finally, theoretical simulations of astronomical objects are an important source of data. Comparing theoretical models with observational data is the central mechanism in understanding how these objects formed and have evolved.

Each of these data domains carries its own particular problems to be solved in a data management and mining context. Imaging data and catalogue construction require robust, automatic techniques to identify sources distinct from background-level noise, to differentiate between different types of objects (e.g., stars, galaxies, and comets), and finally to index these data to allow fast searching based on spatial criteria.

Spectroscopy and time-domain data require more involved algorithms for the automated reduction and calibration of observations, algorithms which often have to be tailored for a specific instrument and telescope setup. The automatic analysis of spectroscopic data typically seeks to classify an object onto a predefined categorical system by comparing the object with the set of standards which define the system. The physics of an object which are manifest in its spectrum are determined by computing accurate theoretical models and comparing them with the observations. Any results then need to be stored and indexed with the observations in a manner that allows for further re-analysis as improved observations and theoretical models become available.

Numerical simulations to generate theoretical models are always in need of powerful and plentiful computational resources to allow more detail and precision to be attained. As models will always have a shorter shelf-life than observations, appropriate meta-data needs to be recorded and stored with the models so a historical record can be kept as the underlying physics improves. This meta-data is also needed to help automate the parameterisation of observations by providing a means to explore grids of models, and ascertain when new models need to be generated to cover a required part of the parameter space.


1.2 Large Data Sets And Their Sources

Three main sources contribute to large observational data sets in astronomy, namely, those generated by specific surveys, general-purpose observatories, and space missions. In recent years, Virtual Observatory projects have been investigating ways to combine the various databases generated by these sources, mapping out the computational infrastructures and tools needed to explore large data volumes.

Specific Surveys

Digital sky surveys generate very large quantities of homogeneous data over multiple wavelengths. As such, they are the main drivers behind the study of data mining methods in astronomy.

The Digitized Palomar Observatory Sky Survey (DPOSS; Djorgovski et al., 1998; http://dposs.caltech.edu/) is a digital survey of the entire Northern sky in three visible-light bands, based on the photographic sky atlas, POSS-II, the second Palomar Observatory Sky Survey (Reid et al., 1991). A set of three photographic plates (one in each filter), each covering 36 square degrees, was taken at each of 894 pointings spaced by 5 degrees, covering the Northern sky. The plates were then digitised at the Space Telescope Science Institute (STScI), producing about 1 gigabyte per plate, and about 3 terabytes of data in total. Specially developed data mining software called SKICAT (Weir et al., 1995) was used to perform object classification and measure around 40 parameters for each object, storing this information in a database which will eventually be released to the community as the Palomar-Norris Sky Catalog.

The Two Micron All-Sky Survey (2MASS; Skrutskie et al., 2006; http://www.ipac.caltech.edu/2mass/) is a near-infrared (J, H, and KS) all-sky survey. The project is a collaboration between the University of Massachusetts, which constructed the observatory facilities and operated the survey, and the Infrared Processing and Analysis Center at Caltech, which is responsible for all data processing and archive issues. The survey began in the spring of 1997, completing survey-quality operations in 2000, with the final catalogue being released in March 2003. The survey includes over 12 terabytes of imaging data, with the final catalogue containing over one million resolved galaxies, and more than three hundred million stars and other unresolved sources to a limiting magnitude of KS < 14.3. 2MASS is currently producing the following data products for the entire sky:

· A digital atlas of the sky comprising approximately 4 million 8′×16′ images, having about 4″ spatial resolution in each of the three wavelength bands,
· A point source catalog containing accurate positions and fluxes for 300 million stars and other unresolved objects,
· An extended source catalog containing positions and total magnitudes for more than one million galaxies and other nebulae.

The 2dF Galaxy Redshift Survey (2dFGRS; Colless et al., 2001; http://www.mso.anu.edu.au/2dFGRS/) is a major spectroscopic survey taking full advantage of the unique capabilities of the 2dF facility built by the Anglo-Australian Observatory (http://www.aao.gov.au/2df/). The 2dFGRS obtained spectra for 245,591 objects, mainly galaxies, brighter than a nominal extinction-corrected magnitude limit of bJ = 19.45. Reliable redshifts were obtained for 221,414 galaxies. The galaxies cover an area of approximately 1,500 square degrees selected from the extended APM Galaxy Survey of the South Galactic cap. The final release dataset comprises the following elements:

· source catalogues for the full survey, containing data for 382,323 objects, together with related material,
· spectroscopic catalogues for 245,591 objects, containing the spectroscopic parameters such as redshifts and spectral types.

The Sloan Digital Sky Survey (SDSS; York et al., 2000; http://www.sdss.org/) is a project to survey a 10,000 square degree area (1/4 of the entire sky) of the North Galactic hemisphere over a 5 year period. The estimated 100 million catalogued sources from this survey will then be used as the foundation for the largest ever spectroscopic survey of galaxies, quasars and stars. A dedicated 2.5m telescope is specially designed to take wide field (3×3 degree) images using a 5×6 mosaic of 2048×2048 CCDs, in five wavelength bands, operating in scanning mode. Spectroscopic targets are then observed using two spectrographs, each with 320 fibres feeding in light from the focal plane. A total of four 2048×2048 CCDs (one for each channel of each spectrograph) collect the spectra. The total raw data will exceed 40 terabytes, and a processed subset of about 1 terabyte in size will consist of 1 million spectra, positions, and image parameters for over 100 million objects, plus a mini-image centered on each object in every colour. The data will be made available to the public at specific milestone releases, and upon completion of the survey.

General-Purpose Observatories

Traditional ground-based observatories have been saving data, primarily as backups for the users, for a significant time, accumulating large quantities of valuable, but heterogeneous, data. Unfortunately, lack of funding, together with this inherent heterogeneity, makes it difficult to archive the data in such a way as to make it available and easy to access for the wider astronomical community. However, some notable exceptions do exist.

The National Optical Astronomy Observatory (NOAO; http://www.noao.edu/) is a US organisation that manages ground-based national astronomical observatories including the Kitt Peak National Observatory, Cerro Tololo Inter-American Observatory, and the National Solar Observatory. The NOAO has been archiving all data from its telescopes in a program called "Save-the-Bits" which, prior to the introduction of survey-grade instrumentation, generated around half a terabyte and over 250,000 images a year. With the introduction of survey instruments and related programs, the rate of data accumulation has increased, and NOAO now manages over 10 terabytes of data.

The European Southern Observatory (ESO; http://www.eso.org/) operates a number of telescopes (including the four 8m class VLT telescopes) at two observatories in the southern hemisphere: the La Silla Observatory and the Paranal Observatory. As with many other ground-based observatories, ESO has been archiving data for some time, with storage approaching a steady rate of approximately 20 terabytes of data per year from all of its telescopes. This number will eventually increase to several hundred terabytes with the completion of the rest of the planned facilities, including the VST, a dedicated survey telescope similar in nature to the telescope built for the SDSS project.

Space Missions

Although ground-based observatories are aided by the advancement of technology and continue to make important discoveries, they will always be encumbered by the restrictions imposed by the Earth's atmosphere. Thus, space missions, although extremely expensive, are critical components in the study of the universe, and all of the data they produce are very valuable and therefore archived.

The Multimission Archive at the Space Telescope Science Institute (MAST; http://archive.stsci.edu/mast.html) archives a variety of astronomical data gathered from space missions, with the primary emphasis on the optical, ultraviolet, and near-infrared parts of the spectrum. MAST provides a cross-correlation tool allowing users to search all archived data for all observations which contain sources from either archived or user-supplied catalogue data. In addition, MAST provides individual mission query capabilities. The dominant holding for MAST is the data archive from the Hubble Space Telescope, but its total holdings currently exceed ten terabytes and include (or provide links to) archival data for the following missions or projects: Hubble Data Archive, Galaxy Evolution Explorer, Far Ultraviolet Spectroscopic Explorer, International Ultraviolet Explorer Final Archive, Extreme Ultraviolet Explorer, Hopkins Ultraviolet Telescope Archive, Ultraviolet Imaging Telescope Archive, Wisconsin Ultraviolet Photopolarimeter Experiment Archive, Copernicus UV Satellite Archive, Berkeley Extreme and Far-UV Spectrometer, the Interstellar Medium Absorption Profile Spectrograph, Digitized Sky Survey, and the Röntgen SATellite Archive.

Virtual Observatories

The Virtual Observatory (VO) concept represents a scientific and technological framework aimed at managing the ongoing exponential growth in the volume, quality, and complexity of astronomical data gathered by all of the sources discussed previously. Two main challenges are faced:

1. The effective inter-linking of large, geographically distributed data sets and digital sky archives in a homogeneous manner, thereby allowing the optimal use of data mining algorithms to extract new science.
2. The research and development of data mining and "knowledge discovery in databases" (KDD) algorithms and techniques for the exploration and scientific investigation of large digital sky surveys, including combined multi-wavelength data sets.

These problems have significant relevance beyond the field of astronomy, as many aspects of society are struggling with information overload.

The National Virtual Observatory (NVO; http://www.us-vo.org/) is a project funded by the US National Science Foundation to research and explore the technologies necessary to create a VO. The central theme of this research is the formation and adoption of standards to make the sharing of astronomical data easier. An NVO standard that has been adopted worldwide in this regard is "VOTable", a way to represent a table of data in XML with good meta-data about the semantic meaning of the data. Grid computing is seen as an important resource for the large-scale analysis of astronomical data. The NVO has also produced research prototypes demonstrating that interesting and efficient research can be done by building upon just a few new protocols and standards for data exchange and access.

The AstroGrid project (http://www.astrogrid.org/) is a UK government funded, open source project designed to create a working VO for UK and international astronomers. The goals of the AstroGrid project are:

· A working datagrid for key UK databases,
· High throughput data mining facilities for interrogating those databases,
· A uniform archive query and data-mining software interface,
· The ability to browse simultaneously multiple datasets,
· A set of tools for integrated on-line analysis of extracted data,
· A set of tools for on-line database analysis and exploration,
· A facility for users to upload code to run their own algorithms on the data mining machines,
· An exploration of techniques for open-ended resource discovery.

Many of these goals are common to other nations and other disciplines, and the AstroGrid project is working closely with other VO projects worldwide through the International Virtual Observatory Alliance (IVOA), jointly formed with the NVO and other world-wide VO efforts, to deliver these goals.
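To make the VOTable idea mentioned above more concrete, the following is a minimal sketch of how a table and its column meta-data can travel together in XML. The document below is hand-written and simplified (no XML namespace, hypothetical source values, illustrative `ucd` strings), and the helper `read_votable` is an assumption made for this example; it is not part of any NVO or AstroGrid software.

```python
import xml.etree.ElementTree as ET

# Simplified VOTable-style document. The rows and column choices are
# hypothetical; real files carry a namespace and richer meta-data.
VOTABLE_XML = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="example_sources">
      <FIELD name="RA" datatype="double" unit="deg" ucd="pos.eq.ra"/>
      <FIELD name="Dec" datatype="double" unit="deg" ucd="pos.eq.dec"/>
      <FIELD name="Bmag" datatype="float" unit="mag" ucd="phot.mag"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>185.50</TD><TD>-5.90</TD><TD>14.8</TD></TR>
          <TR><TD>233.45</TD><TD>52.20</TD><TD>15.2</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_votable(xml_text):
    """Return (fields, rows): column meta-data dicts and rows keyed by column name."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    # Each FIELD element describes one column (name, type, unit, meaning).
    fields = [dict(f.attrib) for f in table.findall("FIELD")]
    names = [f["name"] for f in fields]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        values = [float(td.text) for td in tr.findall("TD")]
        rows.append(dict(zip(names, values)))
    return fields, rows


fields, rows = read_votable(VOTABLE_XML)
for f in fields:
    print(f["name"], f["unit"], f["ucd"])
print(rows[0])
```

The point of the sketch is that the units and semantic tags sit next to the data, so a program receiving the table does not need out-of-band knowledge of what each column means.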

1.3 Astronomical Spectra

It is clear that much work lies ahead if astronomers are to keep up with the ever increasing amounts of data their telescopes are able to gather. As such, the project presented in this thesis focuses on one particular aspect of the data mining problem: methods to analyse digitised astronomical spectra in an automated fashion.

The central idea of data mining is to be able to turn large quantities of unknown information into meaningful interpretations, and this is very much a non-trivial task in the context of astronomical spectra. Before large-scale statistics can be done to search for patterns, the spectra of an interesting type of object need to be selected from a set of unknown data. Then, the major analytical tasks are usually the classification and physical parameterisation of the spectra, after which pattern searching can be performed.

The problems of searching, classification, and physical parameterisation all involve some kind of pattern matching in and of themselves. Searching, which is basically a very coarse initial classification, matches unknown spectra to a set of known examples of a search target, retaining only those spectra which are within some acceptable distance from the set of examples. Classification assigns a fine-grained category to an object based on how well it matches the spectral standards of the classification system used. Physical parameterisation matches observations to grids of theoretical models in an attempt to find the best fit and, consequently, estimates for the main physical quantities of interest.

1.3.1 Types Of Objects And Their Spectra

All objects in the night sky can be studied by spectroscopic analysis. Each object has a set of distinct features which can be found in its spectrum, reflecting the specific physical processes at work in or around the object. This section gives some examples of these objects and the spectra they produce.

In Figure 1.1, the top plot shows the spectrum of a hot star. The overall shape of a stellar spectrum approximates the curve of a black body at the same effective temperature. This temperature can be estimated from the peak wavelength (Wien's displacement law) or from the area under the spectrum (using the Stefan-Boltzmann law). The absorption lines in the spectrum reflect the various chemical species present in the star's atmosphere, and tell of the specific physical conditions in that region of the star.

The bottom plot in Figure 1.1 is that of a galaxy spectrum. The overall spectrum of a galaxy is simply the combined spectrum of all the stars and other radiating matter in the galaxy. As galaxies differ in structure and relative composition of stellar type and gas, their spectra will also differ. Unlike stars, galaxies are not point sources, so their spectra must be obtained differently. As a galaxy can often be resolved as an extended object, it is possible to take a spectrum of different parts of the galaxy, providing information about its composition, the stellar birth rates, and rotational velocity for that particular region.

Figure 1.1: A stellar spectrum (top), and a galaxy spectrum (bottom). (Taken from the SDSS)

Quasars exhibit very bright emission features relative to a low intensity continuum in their spectra, as can be seen in the top plot of Figure 1.2. In fact, it was only through careful analysis of the spectra of quasars that astronomers realised they were not just faint stars. The emission lines in quasar spectra are not where they would be expected if the object were a nearby star. The standard explanation is that the quasar is at a vast distance and so appears to be receding from us due to the expansion of the Universe. This high recession velocity relative to the Earth causes the spectral lines to be redshifted to longer wavelengths.

Exotic stars, such as Wolf-Rayet stars or the carbon star in the bottom plot of Figure 1.2, are identified by the features present in their spectra. Carbon stars can have similar temperatures to G, K, and M-class stars (4,600 - 3,100 K) but have a much higher abundance of carbon than normal stars, which appears in the spectrum as very strong molecular bands (C2). As these stars have such low temperatures, they appear red in colour, but the carbon molecules absorb light at blue wavelengths, which makes the star appear even redder. Carbon stars are assigned a type C spectral class.

Figure 1.2: Example of a quasar (top) and carbon star (bottom) spectrum. (Taken from the SDSS)

Emission nebulae are clouds of high temperature gas. The atoms in the cloud are ionised by ultraviolet light from a nearby star and emit radiation as the electrons fall back into atomic orbitals, so their spectra show strong emission lines, as can be seen in Figure 1.3. These nebulae usually appear to be red because the predominant emission line of hydrogen in the optical (Hα) happens to be red. Although other colours are produced by other atoms, hydrogen is by far the most abundant. Emission nebulae are usually the sites of recent and ongoing star formation.

Figure 1.3: The emission spectrum of the Orion nebula (M42).
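The relations named in this section can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, the constants are standard SI values, and the Stefan-Boltzmann function assumes that the bolometric flux at the stellar surface is already known (in practice the observed flux must first be scaled using the star's angular size). The redshift function uses the usual definition z = (observed - rest) / rest.

```python
# Physical constants (SI units).
WIEN_B = 2.897771955e-3      # Wien displacement constant, m K
SIGMA_SB = 5.670374419e-8    # Stefan-Boltzmann constant, W m^-2 K^-4


def wien_temperature(peak_wavelength_angstrom):
    """Black-body temperature (K) whose spectrum peaks at the given wavelength."""
    peak_m = peak_wavelength_angstrom * 1e-10  # Angstroms to metres
    return WIEN_B / peak_m


def stefan_boltzmann_temperature(surface_flux):
    """Effective temperature (K) from the bolometric surface flux (W m^-2)."""
    return (surface_flux / SIGMA_SB) ** 0.25


def redshift(observed_wavelength, rest_wavelength):
    """Redshift z of a line observed away from its rest wavelength (same units)."""
    return (observed_wavelength - rest_wavelength) / rest_wavelength


# A black body peaking at 1000 Angstroms (far ultraviolet) is a hot star.
print(wien_temperature(1000.0))

# Round trip: the flux of a 30,000 K black body gives back 30,000 K.
print(stefan_boltzmann_temperature(SIGMA_SB * 30000.0 ** 4))

# H-alpha (rest wavelength 6562.8 Angstroms) observed at 7219.08 Angstroms.
print(redshift(7219.08, 6562.8))
```

Note that for hot stars the black-body peak usually lies in the ultraviolet, outside an optical spectrum, so the Wien estimate is only directly usable when the peak falls within the observed range.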

1.3.2

Automatic Methods of Analysis

Despite the diversity in features present in the spectra of astronomical objects, their general character always remains the same, namely, flux intensities measured across some wavelength range. This permits an automated method of analysis developed for one type of object to be applied, in principle, to the spectra of another. Over the years, a small number of automatic pattern matching techniques have found widespread use in the field. One of the first, and simplest, is the cross-correlation function. This is a signal processing technique wherein two signals are convolved according to the integral

$$c(z) = \int_{-\infty}^{\infty} T(x)\, G(z - x)\, dx, \qquad (1.1)$$

which convolves two functions, T(x) and G(x), over an infinite range, z = [−∞, ∞], yielding the resulting cross-correlation function, c(z). Simkin (1974) demonstrated the use of the cross-correlation function for measuring the radial velocities of stars and galaxies. Tonry & Davis (1979) then applied the technique in a survey to measure galaxy redshifts. Kurtz (1982) used cross-correlation to classify low resolution (14 Å) stellar spectra onto the MK classification system (Morgan et al., 1978). Cross-correlation remains an important, basic tool that is widely used, mainly as a method for calculating radial velocities.

Related to the cross-correlation function are minimum distance methods (MDM). Here, an observation is compared with a set of templates with the intention of finding a match which minimises some distance metric. Kurtz (1982), Lasala (1994), and Gulati et al. (1994a) used this technique to classify stellar spectra with very positive results. The application of minimum distance methods to the parameterisation of stellar spectra by fitting observations to grids of theoretical models is discussed in Chapter 3.

Artificial neural networks (ANNs) are statistical pattern matching algorithms which have found wide application due to their powerful ability to "learn" highly non-linear function mappings by studying examples of such mappings. von Hippel et al. (1994) outline the use of ANNs for the classification of stellar spectra. Folkes et al. (1996) use ANNs to provide automatic classifications of low S/N galaxy spectra. Gulati et al. (1997a) show the use of ANNs in determining reddening estimates from low-dispersion ultraviolet spectra of O and B stars. Weaver (2000a) demonstrates an ANN-based technique for performing two-dimensional classification of the components of binary stars. Qin et al. (2003) use a form of ANN to perform automatic star-galaxy separation by spectra with a high success rate. The use of ANNs to provide classifications and physical parameterisations of stellar spectra is studied in Chapter 2.

Principal Components Analysis (PCA) is a multivariate statistical technique which facilitates the discovery of linear correlations between observed variables. Early work by Deeming (1964), Kurtz (1982), and Whitney (1983) examines the application of PCA to the unsupervised classification of stellar spectra. Since then, PCA has found wide application in spectral analysis, such as creating classification systems for galaxy spectra (Sodre et al., 1998; Galaz & de Lapparent, 1998; Connolly & Szalay, 1999), determination of galactic redshifts (Glazebrook et al., 1998), and investigating the polarisation properties of broad absorption line quasars (Lamy & Hutsemékers, 2004). The application of PCA to stellar spectra is examined in more detail in Chapter 4.
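As an illustration of the first of these techniques, the sketch below uses a discrete cross-correlation to recover the pixel shift between a template and an observed spectrum, and converts it to a velocity. It is a toy example under stated assumptions, not the procedure of any of the works cited: both spectra are synthetic single-line profiles, they are sampled on a common grid that is uniform in log wavelength (so that a Doppler shift is a constant pixel shift), and the names `gaussian_line` and `best_lag` are hypothetical. `numpy.correlate` evaluates the sliding product sum over n of g[n + k] t[n], which matches the convolution form of Equation 1.1 up to a reflection of one argument.

```python
import numpy as np

C_KMS = 299792.458  # speed of light, km/s


def gaussian_line(x, centre, depth=0.5, width=3.0):
    """Unit continuum with a single Gaussian absorption line (synthetic)."""
    return 1.0 - depth * np.exp(-0.5 * ((x - centre) / width) ** 2)


def best_lag(template, observed):
    """Return the pixel lag at which the cross-correlation peaks.

    A positive lag means the observed features sit at larger pixel
    indices than the corresponding template features.
    """
    t = template - template.mean()   # remove the continuum level
    g = observed - observed.mean()
    cc = np.correlate(g, t, mode="full")
    lags = np.arange(-(len(t) - 1), len(g))
    return int(lags[np.argmax(cc)])


# Common grid, uniform in ln(wavelength), from 4000 to 5000 Angstroms.
log_wave = np.linspace(np.log(4000.0), np.log(5000.0), 2000)
step = log_wave[1] - log_wave[0]
pixels = np.arange(log_wave.size)

template = gaussian_line(pixels, centre=1000.0)
observed = gaussian_line(pixels, centre=1007.0)  # line moved by 7 pixels

lag = best_lag(template, observed)
velocity = C_KMS * (np.exp(lag * step) - 1.0)  # non-relativistic Doppler shift
print(lag, velocity)
```

Real spectra would also need continuum normalisation, noise handling, and sub-pixel interpolation of the correlation peak, none of which is attempted here.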

1.4 Hot Subdwarf Stars

The automatic analysis tool set established in this thesis, although general in nature, has been applied to the analysis of a specific type of astronomical object in order to demonstrate the effectiveness of the tools, and how they might be used in a real-world scenario.

The early type subluminous dwarfs (Greenstein & Sargent, 1974) are defined as stars which populate a region located below the upper main sequence on the Hertzsprung-Russell diagram, extending the horizontal branch to higher effective temperatures. They are mostly considered to be low-mass (Mcore ≈ 0.50-0.55 M⊙), core helium burning objects surrounded by a thin envelope of hydrogen. Visually, they are quite blue objects, (B − V) ≈ −0.3, (U − B) ≈ −1.0, and have been shown to dominate the population of faint blue stars in the galaxy (mB ≲ 16) (Green et al., 1986). Regardless of their prior evolution, hot subdwarfs are thought to be direct progenitors of white dwarfs, although only a small fraction (< 2%) of white dwarfs are formed through this route.

1.4.1 Spectroscopy

The hot subdwarfs fall into three broad subgroups based on spectroscopic criteria.

sdB: Strong Stark-broadened hydrogen lines, with weak He I and no Mg II absorption lines.
sdOB/He-sdB: Strong He I absorption with weak or absent hydrogen Balmer lines, and He II. Carbon lines of varying strength.
sdO: Strong He II and weak He I lines, with broad and shallow hydrogen Balmer lines superimposed with He II lines.

Examples from each of these subgroups can be seen in Figure 1.4.

[Figure 1.4 plot: normalised flux (continuum = 1, plus a constant offset) against wavelength, 4000-5200 Angstroms, for PG1220-056 (sdO3VII:He40), FEIGE 110 (sdO8VII:He6), PG1532+523 (sdB1VII:He4), and PG1544+488 (sdBC1VII:He39), with H, He I, He II, C II, and C III lines marked.]

Figure 1.4: Examples from each hot subdwarf spectrographic subgroup. Classifications listed are those from Drilling et al. (2006).

Analyses of sdB spectra (e.g., Edelmann et al., 2003) show them to have effective temperatures in the range 20,000 ≤ Teff/K ≤ 40,000, surface gravities in the range 5.0 ≤ log g(cgs) ≤ 6.0, and extremely helium-deficient atmospheres, nHe/nH ≲ 0.01. sdB stars are thought to be low-mass (Mcore ≈ 0.50-0.55 M⊙, Caloi 1976), core helium burning objects, with a very thin hydrogen envelope (Menv ≲ 0.02 M⊙, Heber 1986). The helium deficiency of sdB stars is believed to be caused by gravitational settling, i.e., the settling of heavier elements due to gravity (Wesemael et al., 1982). However, Heber (1991) found that some sdB stars show metals like carbon and silicon to be over-abundant in their atmospheres, believed to be due to radiative levitation being large for those elements.

Analyses of sdO spectra performed by Dreizler et al. (1990) and Thejll et al. (1994) find that they have effective temperatures in the range 40,000 ≤ Teff/K ≤ 80,000, with the majority lying between 40,000 and 50,000 K. Surface gravities lie in the range 4.0 ≤ log g(cgs) ≤ 6.5, and the atmospheres of most sdO stars are helium-rich, nHe ≳ 0.50, with additional enrichment of carbon and nitrogen.

Figure 1.5: Schematic temperature-luminosity diagrams showing: a) the positions of stars belonging to the main stellar groups; b) the normal sequence of stellar evolution experienced by a star of a few solar masses; c) possible evolution of an sdB star in a binary system. (Diagram courtesy of C.S. Jeffery).

Drilling (1996) and Jeffery et al. (1997) represent the first attempts to introduce a homogeneous classification system for hot subdwarfs. This past work has been extended and further refined by Drilling et al. (2006) to produce a three-dimensional classification system based on a spectral type, luminosity class, and a helium class. The standard stars of this system are used in Chapter 2 as the basis for training an artificial neural network to automatically classify hot subdwarf spectra.

1.4.2 Stellar Evolution

One of the most useful tools in stellar astronomy is the Hertzsprung-Russell (HR) diagram, which plots absolute magnitude against spectral type. The relationship between these two parameters shows several important patterns, with the most significant being that the majority of stars lie within a band stretching from the region of bright, hot stars to the region of dim, cool stars. This band is called the main sequence of the HR diagram. The giant stars are seen as a large cluster occurring above the cooler end of the main sequence, and the white dwarfs populate a sequence of dim, hot stars running almost parallel to the main sequence. Evidently, the HR diagram serves as a kind of atlas for the different types of stars, and stellar evolution is usually described in terms of how the underlying physics changes a star's position on the HR diagram over time.

The HR diagram can also be plotted as the relationship between colour and absolute magnitude, the version frequently used by observers. Theorists prefer to plot luminosity (or surface gravity) against effective temperature, as shown in the schematic diagram of Figure 1.5a. The log g-Teff version of the HR diagram will be used later in this thesis (Chapters 5 and 6).

Canonical stellar evolution theory (see Figure 1.5b) predicts that a low-mass, core hydrogen burning main sequence star will eventually exhaust all the hydrogen in its core, converting it, through nuclear fusion, into helium. At the point when core hydrogen fusion ceases, the core is not hot enough to begin helium fusion and starts to collapse because no energy is being generated to counteract the effect of gravity. The collapsing core heats up, with some of this heat being transferred into the hydrogen envelope surrounding the core. Eventually, this envelope can become hot enough to fuse hydrogen in a thin shell at the core boundary.

The continued core collapse and hydrogen shell burning cause temperature and pressure in the shell to increase. The increasing shell temperature supplies sufficient pressure to the outer layers of the star, causing them to expand and cool. The star leaves the main sequence, and evolves to lower temperatures at nearly constant luminosity, eventually reaching the red giant branch. Mass can be lost from the outer layers due to stellar winds.

The core collapse continues until the helium ceases to behave like an ideal gas, and becomes electron degenerate. Essentially, this means that the gas does not expand very much as its temperature increases. The hydrogen burning shell adds helium to the core, which continues to increase in temperature. The core finally becomes hot enough to fuse helium and commences this reaction in an explosive manner called the helium flash.


The degeneracy of the core is removed, and it expands and cools as helium burning continues. The hydrogen envelope also cools. The star contracts again as a new state of equilibrium is reached, and settles on the horizontal branch.

A star on the horizontal branch has two energy sources: a helium burning core, and a hydrogen burning shell. The star evolves at nearly constant luminosity, with the core converting helium into mostly carbon and oxygen. When the helium is exhausted, the core again begins to contract under gravity. Now, there is a hydrogen burning shell and a helium burning shell which cause the star to expand, evolving with increasing luminosity to the asymptotic giant branch. This stage of the star's life is characterised by high mass loss due to stellar winds. The process of helium fusion is very sensitive to temperature, so the helium burning shell goes through a series of thermal pulses alternating with periods of quiescence. This is thought to enhance the efficiency of the stellar winds until the entire outer envelope of the star is lost.

When the mass of the envelope is almost entirely depleted, the star begins to evolve across the HR diagram at constant luminosity. A significant fraction of material has been ejected from the outer regions of the star, and the expelled gas is ionised by the star (temperatures of such stars often exceed 50,000 K), forming a planetary nebula. The planetary nebula eventually disperses into interstellar space. The hydrogen and helium burning layers eventually extinguish, and the star becomes a white dwarf with a degenerate carbon-oxygen core. The core cools quickly and luminosity decreases, but it takes a long time for the thermal energy in the core to be radiated away completely.

sdB Evolution

Extended horizontal branch stars tend to differ from true horizontal branch stars in terms of the luminosity of the hydrogen burning shell. As noted above, the mass of this envelope is very small (Menv ≲ 0.02 M⊙) for a subdwarf B star, meaning that its luminosity is negligible. For a normal horizontal branch star, the luminosity of its hydrogen envelope equals or even exceeds that of the helium core.

How the hot subdwarfs come to arrive on the extended horizontal branch is still under debate. A number of scenarios have been proposed to explain the evolution of sdB stars. In the single star scenario, enhanced mass loss on the red giant branch due to stellar winds may remove all of the hydrogen-rich envelope before core helium burning begins (D'Cruz et al., 1996). In the binary scenario (see Figure 1.5c), Mengel et al. (1976) suggest sdBs could be formed from relatively wide binaries. Mass transfer through stable Roche lobe overflow results in a depletion of the hydrogen-rich envelope prior to the helium core flash. If the sdB progenitor and its compact companion are in a close binary system, a common-envelope phase can result in the creation of a helium star. More recent work (Maxted et al., 2001) suggests 2/3 of sdBs are in close binary systems.

sdO Evolution

The atmospheric parameters of sdO stars show them to b e less homogenous than the sdBs. Generally, they app ear to fall into two subgroups on the log gTeff plane. One group ("compact" sdOs) lies close to the theoretical p ost-extended horizontal branch evolutionary tracks, and therefore might have evolved from sdB stars. The other group have lower surface gravities ("luminous" sdOs), lying closer to the p ost-asymptotic giant branch tracks. These stars are found in the same region on the log gTeff plane as the central stars of planetary nebulae. Various evolutionary scenarios have b een prop osed to explain the origin of sdO stars, and it is unlikely that a single scenario can come to explain b oth subgroups. Several theories exist for "compact" sdOs. The Post EHB scenario attempts to

1.4 Hot Sub dwarf Stars

25

explain the large numb er of sdOs found at the extreme end of the horizontal branch, along the helium burning main sequence, which suggests a close connection to sdB stars (Caloi, 1989; Dorman et al., 1993). But how does an sdB star b ecome an sdO? It has been suggested that the hydrogen-rich envelope of an sdB can re-ignite during the p ostextended horizontal branch phase, causing the star to evolve towards the asymptotic giant branch. However, the luminosity of the star is not sufficient to let it ascend the asymptotic giant branch, so the star returns to the sdO region. Dreizler et al. (1990) prop ose an alternate theory wherein deep mixing of the star's atmosphere by helium shell flashes could explain the helium enrichment seen in sdO stars. Other explanations for compact sdOs include the delayed helium flash scenario proposed by Sweigart (1997) which suggests that if mass loss during the red giant branch is too high, then the helium core never reaches ignition mass, and the star ends up as a helium white dwarf without going through a horizontal branch phase. Alternatively, if the ignition of helium is delayed but can still occur on the white dwarf cooling sequence, it will take the star into the region of the sdO stars. A third evolutionary scenario comes from binary white dwarf mergers as studied by Ib en (1990). It was found that the evolution of close binary systems, leading to the merger of He+He and CO+He white dwarfs, could produce low-mass helium burning stars similar to sdOs. Strong supp ort for this scenario comes from Napiwotzki et al. (2004) who found that almost all of the sdO stars in their sample were apparently single. For the "luminous" sdO stars, Heb er & Hunger (1987) suggest that they are "b orn again p ost-asymptotic giant branch" stars. In this scenario (Ib en et al., 1983), a p ostasymptotic giant branch star undergoes a late helium shell flash, sending it to the asymptotic giant branch for a second time. 
During this phase, the outer hydrogen envelop e can b e completely removed by stellar winds, leaving the star with the app earance of a luminous sdO star. Husfeld et al. (1989) suggest that a small numb er of sdO stars are also formed from normal p ost-asymptotic giant branch evolution. On the Automatic Analysis of Stellar Sp ectra


1.4.3 Why Study Them?

The study of hot subdwarfs is important in several respects. As they exist in large numbers, and have been shown to be highly evolved stars, they are useful indicators for studying the structure and evolution of the galaxy. Brown et al. (1997) suggest that these stars are the main cause of the ultraviolet upturn phenomenon (UV excess) seen in elliptical galaxies and the bulges of spiral galaxies because they spend a long time (10⁸ yr) on the extended horizontal branch at high temperatures. They are also considered to be useful age indicators for elliptical galaxies (Brown et al., 2000). As described previously, the hot subdwarfs are interesting in their own right because their evolution does not seem to be explained by canonical stellar evolution theories. This makes them important objects from an astrophysical point of view.

1.4.4 Why Search For Them In The SDSS?

The Sloan Digital Sky Survey project and the data it produces are a prime example of where the future of astronomy is heading. The main observational goal of the survey is to collect photometric and spectroscopic data on galaxies and quasars. However, many quasars appear as very blue objects, so the SDSS will observe spectra for many blue stars, such as white dwarfs and hot subdwarfs, because these objects cannot be differentiated at the photometric level. This makes the SDSS an unbiased, magnitude-limited survey containing potentially hundreds of moderate resolution (~3.0 Å), fully reduced hot subdwarf spectra which can be used to statistically identify new subgroups within an extracted sample. The large, homogeneous, publicly accessible data archives are therefore an excellent test site for the tool set developed in this thesis.

1.5 Summary

The continual advancement of observational and information technology is driving astronomy forward as a data-rich discipline. A clear need has been identified for robust automatic methods to help analyse large databases of astronomical data, and extract from them useful science. This thesis focuses on automatic tools to search for and analyse astronomical spectra in large databases. Artificial neural networks (Chapter 2), χ² minimisation (Chapter 3), and principal components analysis (Chapter 4) are the methods used to construct a generalisable tool kit for performing this task. The tools will be demonstrated by applying them to the problem of searching for and analysing the spectra of hot subdwarf stars from the archives of the Sloan Digital Sky Survey (Chapter 5). They will also be used to analyse other smaller data sets (Chapter 6). As the amount of data gathered by astronomers increases, much work is needed to improve the ways in which it can be analysed, and solve the problems that lie ahead. Some of the issues encountered during this project are discussed, finally, in Chapter 7.


Chapter 2

Classification - Artificial Neural Networks
Artificial neural networks (ANNs) are statistical machine learning algorithms best thought of as arbitrary function estimators. They are able to provide a non-linear parameterised mapping between some input vector, x, and an output vector, y. For example, in the case of stellar spectral classification, x is the feature vector containing the flux values of a spectrum over some wavelength range, and y is the classification assigned to x according to some classification standard. The mapping performed by the ANN is analogous to the process which leads an expert human classifier to assign classification y to spectrum x. This ability to replicate non-linear functions makes ANNs a powerful tool in astronomical data mining.

In the context of machine learning, ANNs are part of a wider class of methods to approximate non-linear functions. Some of the mystery that commonly surrounds their use can be dispelled by relating several important issues to the simpler process of polynomial curve fitting. Here, the problem is to fit a polynomial to a set of M points by minimising some error function. The nth-order polynomial is given by

$$
y(x) = w_0 + w_1 x + \cdots + w_n x^n = \sum_{i=0}^{n} w_i x^i. \tag{2.1}
$$

If this is considered as a non-linear mapping which takes x as input and produces y as output, then the exact form of the function $y(x)$ is determined by the values of the parameters $w_0, \ldots, w_n$, which are analogous to the weights in a neural network. The weights can be determined by minimising an error function which compares the desired output, $d(x_k)$, for each input value, $x_k$, with the polynomial's actual output, $y(x_k)$. One example is the commonly used sum-of-squares error function,

$$
E = \frac{1}{2} \sum_{k} \left( y(x_k) - d(x_k) \right)^2. \tag{2.2}
$$

The minimisation of an error function such as Equation 2.2, which involves target values for the polynomial outputs, is called supervised learning, since the desired output is specified for each input value. This is also a common way to determine the weights of a neural network for a particular application (the back-propagation algorithm adjusts the weights by calculating the derivatives of the error function with respect to the weights). A second form of learning, called unsupervised learning, does not involve the use of target data. In the context of neural networks, this form of learning can be used to discover clusters or other patterns in a data set.

If the polynomial of Equation 2.1 is being trained to model a particular input-output mapping via supervised training, then the goal is to have a model which gives good predictions for new data, in other words one which exhibits good generalisation properties. One of the factors which influences a model's ability to generalise is the number of free parameters it has (i.e., the number of degrees of freedom). If a first-order polynomial is chosen to model a non-linear mapping, then it will generalise poorly because a linear function is not flexible enough to match the underlying mapping function very well. In other words, the model has a high bias, meaning that the complexity of the polynomial is not sufficient to model the actual mapping function. The bias can be reduced by increasing the number of degrees of freedom, i.e., increasing the order of the polynomial. This gives it greater flexibility to model the non-linear mapping. However, if the order is increased too much, the polynomial's approximation to the underlying function will actually get worse: the mapping may give an exact fit to the training data, but its ability to generalise is hampered by highly oscillatory behaviour between training points. Such a model is said to over-fit the training data, and has a high variance, meaning that the model is sensitive to the training data (i.e., quantity, noise, distribution, etc.).

The point of best generalisation is determined by a trade-off between the model's bias and variance, and occurs when the number of degrees of freedom in the model is relatively small compared to the size of the training data set. The quantity of training data is a significant factor in achieving good generalisation. As the quantity of training data increases, the model's complexity can be increased, thereby reducing bias, while ensuring that the model is more heavily constrained, thereby also reducing variance.

In the context of neural networks, the complexity of the model is determined by the number and structure of the internal weights. The weights are arranged in a network of layers, with more layers allowing the ANN to model essentially any non-linear function. However, as illustrated in the discussion of polynomial fitting, a neural network with too much complexity may succeed in "memorising" the training data by over-fitting them, and therefore yield poor generalisation properties.

A number of techniques exist to combat over-fitting and regularise, or smooth, the mapping produced by neural networks. Weight decay adds a penalty term to the error function that weights against large values for the network's internal weights; early stopping of the training process also prevents the network weights from becoming too large; and adding noise to the training data set makes it more difficult for the neural network to over-fit. A more detailed review of basic neural network theory can be found in Bishop (1995).
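The polynomial analogy can be tried numerically. The following is a minimal sketch, not taken from the thesis: it fits polynomials of order 1, 3 and 9 to ten noisy samples of a hypothetical sine-shaped mapping by minimising the sum-of-squares error of Equation 2.2, and evaluates the same error on noise-free test points. The target function, noise level, sample sizes and function names are all assumptions chosen for illustration.

```python
import numpy as np

def fit_polynomial(x, d, order):
    """Least-squares weights w_0..w_n minimising the sum-of-squares error (Eq. 2.2)."""
    # np.polyfit returns the highest power first; reverse so that w[i] multiplies x**i.
    return np.polyfit(x, d, order)[::-1]

def evaluate(w, x):
    """Evaluate y(x) = sum_i w_i x^i (Eq. 2.1)."""
    return sum(w_i * x**i for i, w_i in enumerate(w))

def sum_of_squares_error(w, x, d):
    """E = 1/2 * sum_k (y(x_k) - d(x_k))^2 (Eq. 2.2)."""
    return 0.5 * np.sum((evaluate(w, x) - d) ** 2)

def target(x):
    """Hypothetical underlying mapping (an assumption for this sketch)."""
    return np.sin(2.0 * np.pi * x)

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
d_train = target(x_train) + rng.normal(0.0, 0.2, x_train.size)  # noisy targets
x_test = np.linspace(0.0, 1.0, 200)
d_test = target(x_test)                                         # noise-free check

errors = {}
for order in (1, 3, 9):
    w = fit_polynomial(x_train, d_train, order)
    errors[order] = (sum_of_squares_error(w, x_train, d_train),
                     sum_of_squares_error(w, x_test, d_test))
    print(order, errors[order])
```

The training error can only fall as the order rises, and the ninth-order polynomial passes through all ten training points. The test error is the quantity that reflects generalisation: the first-order fit is limited by bias, while the highest-order fit is typically limited by variance.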

Previous work by others in the field of automatic stellar spectral analysis demonstrates that ANNs are well-suited to fast classification and parameterisation of large quantities of spectra from across the main sequence. See, for example, Gulati et al. (1994b), von Hippel et al. (1994), Storrie-Lombardi et al. (1994), Weaver & Torres-Dodgen (1995), Bailer-Jones (1996), Gulati et al. (1996), Bailer-Jones (1997), Gulati et al. (1997b), Weaver & Torres-Dodgen (1997), Bailer-Jones et al. (1997), Bailer-Jones et al. (1998), Singh et al. (1998), Rhee et al. (1999), Allende Prieto et al. (2000), Weaver (2000b), and Snider et al. (2001).

Here, the feedforward multilayer back-propagation ANN code STATNET of Bailer-Jones (1996) is used to obtain classifications and acquire astrophysical parameters from a sample of hot subdwarf spectra. The values for the astrophysical parameters are compared with those obtained from a different computerised technique, that of χ² minimisation as implemented in the code SFIT (see Chapter 3).

2.1 Classifying Hot Subdwarfs

The hot subdwarfs do not fall within the scope of the standard MK system (Morgan et al., 1978). Drilling et al. (2006) have therefore extended and refined the earlier work of Drilling (1996) and Jeffery et al. (1997) to construct a three-dimensional MK-like classification scale for hot subdwarfs. This scale is based upon a sample of spectra from a number of sources, covering the wavelength region 4050-4900 Å at a resolution of 2.5 Å, and consists of a 'spectral' class, a 'luminosity' class, and a 'helium' class. The classification scale uses a spectral type running from sdO1 to sdA (1-20), analogous to MK spectral classes. It introduces a helium class (0-40) based on H, He I and He II line strengths, and uses luminosity classes IV-VIII, where most subdwarfs have luminosity class VII. The mapping between the Drilling et al. (2006) classes and those used elsewhere, e.g. the PG survey (Green et al., 1986), is illustrated in figure 16 of Drilling et al. (2006).

2.1.1 The Training Sample

A set of subdwarf spectra was taken from a collection compiled by Drilling et al. (2006) from data provided by Moehler et al. (1990a,b), Dreizler et al. (1990), and Theissen et al. (1993). It comprises a representative sample of 174 PG subdwarfs and blue horizontal branch stars, plus a few other stars not included in the PG catalog. Several observations have been supplied for many of the targets, with the sample containing 471 spectra in total at an approximate resolution of 2.5 Å.

The spectra are not homogeneous. Because the data were gathered by different observers using different equipment at different locations, a number of issues affect the sample, including calibration anomalies, velocity shifts, different windows of wavelength coverage, and inconsistent S/Ns and dispersion intervals. A pre-processing step was needed to correct these problems and establish a more homogeneous sample. The spectra were visually inspected to select the best samples for each star. The resulting 359 spectra were corrected for large cosmic spikes and instrumental end-effects. A velocity shift correction was applied by cross-correlating each spectrum with a grid of theoretical spectra chosen to coarsely cover the approximate Teff, log g, log(nHe/nH) range of the Drilling et al. (2006) classification scale. Finally, the spectra were rebinned onto a common wavelength grid of 4050-4950 Å with a dispersion of 1 Å pixel⁻¹.

It should be noted that the radial velocity correction described above already partly solves the parameterisation problem by choosing the best fitting model from the grid. As such, training the neural network to solve for this parameter simultaneously alongside the other astrophysical parameters may be a more convenient approach. However, this was not attempted here.
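The final rebinning step can be sketched as follows. This is an illustration only: it uses simple linear interpolation with `numpy.interp`, which is an assumption, since the thesis does not state which rebinning scheme was used. The function name and the test spectrum are hypothetical. A grid from 4050 Å to 4950 Å at 1 Å per pixel has 901 points, which matches the 901 input nodes of the networks described later.

```python
import numpy as np

def rebin_to_common_grid(wave, flux, start=4050.0, stop=4950.0, step=1.0):
    """Linearly interpolate a spectrum onto a uniform wavelength grid (Angstroms)."""
    n_points = int(round((stop - start) / step)) + 1
    grid = start + step * np.arange(n_points)
    # Pixels outside the observed window are flagged as NaN, not extrapolated.
    rebinned = np.interp(grid, wave, flux, left=np.nan, right=np.nan)
    return grid, rebinned

# Hypothetical observed spectrum: irregular sampling and one absorption line.
wave = np.linspace(4000.0, 5000.0, 777)
flux = 1.0 - 0.5 * np.exp(-0.5 * ((wave - 4471.0) / 3.0) ** 2)

grid, rebinned = rebin_to_common_grid(wave, flux)
print(grid.size, grid[0], grid[-1])
```

Linear interpolation does not conserve flux exactly when the input and output pixel sizes differ strongly, so a flux-conserving scheme may be preferable for real data.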


[Figure 2.1 panels: luminosity class (0-IX) against spectral type (O-A); helium class (0-40) against spectral type (O-A); helium class (0-40) against luminosity class (0-IX).]

Figure 2.1: The training sample shows clustering in certain regions of the classification space. For clarity, points have been offset by small random shifts in both coordinates.

2.1.2 Methodology

As described at the beginning of the chapter, training an ANN to learn the Drilling et al. (2006) classification system involves iterating over the training set and minimising the sum-of-squares error function between the desired output and the network's actual output (see Equation 2.2) with respect to the ANN's internal parameters. The minimisation process continues until some criterion of convergence has been reached (e.g., when the weight updates have become very small).

A typical strategy to assess network performance after training is to apply the network to an application set for which the "true" classifications are known. Unfortunately, no other suitable set of spectra previously classified onto the Drilling et al. (2006) scale was available for the study presented here. An alternative is to split the Drilling sample into two similarly sized sets, with one used for training, and the other to quantify performance. However, as the Drilling sample is small, and its distribution across the parameter space is limited (see Figure 2.1), a concern is that there may not be enough data to constrain the model if the sample is split into two smaller subsets.

On the other hand, if the two subset approach is changed slightly, there is a way to determine how well a given ANN model performs using only the data in the Drilling sample. A technique called N-fold cross-validation, or the leave-one-out method, permits the greatest number of samples to be used in training while still giving an idea of ANN performance over the whole sample set. The method proceeds by assuming a data set of size N. Each datum is left out in turn and the ANN is trained on the remaining N - 1 samples. The ANN's performance is then assessed by classifying the omitted datum. No random sampling is involved with this method, so repeating the procedure for a particular ANN model always gives the same result.
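The leave-one-out loop itself is short. The sketch below is illustrative and is not the STATNET procedure: a linear least-squares model stands in for the ANN so that the example is self-contained, and the feature matrix, targets and function names are all hypothetical.

```python
import numpy as np

def leave_one_out(features, targets, train, predict):
    """Train on N-1 samples, predict the omitted one, and repeat for all N."""
    n = len(features)
    predictions = np.empty_like(targets, dtype=float)
    for i in range(n):
        keep = np.arange(n) != i                      # mask that omits datum i
        model = train(features[keep], targets[keep])
        predictions[i] = predict(model, features[i:i + 1])[0]
    rms = np.sqrt(np.mean((predictions - targets) ** 2, axis=0))
    return predictions, rms

# Stand-in model (an assumption): linear least squares with an intercept term.
def train_linear(x, y):
    design = np.column_stack([np.ones(len(x)), x])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return weights

def predict_linear(weights, x):
    return np.column_stack([np.ones(len(x)), x]) @ weights

# Hypothetical data: 30 samples, 4 features, 3 outputs (one per classification axis).
rng = np.random.default_rng(1)
features = rng.normal(size=(30, 4))
true_weights = rng.normal(size=(4, 3))
targets = features @ true_weights + rng.normal(0.0, 0.1, size=(30, 3))

predictions, rms = leave_one_out(features, targets, train_linear, predict_linear)
print(rms)
```

Because the stand-in model is deterministic, repeating the call gives identical results. For an ANN with randomly initialised weights, this holds only if the initial weights are fixed as well.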

The leave-one-out method carries with it a large computational cost as each ANN model must be trained N times (in this case, N = 359). As several models are to be tested, the computational burden was alleviated by the construction of a small distributed cluster of 15 ordinary desktop workstations at Armagh Observatory using the Condor batch system (e.g., Livny & Raman, 1998). The cluster reduced the computation time for the leave-one-out procedure by a factor of 10 compared to using only a single workstation.

To determine the optimal complexity of the ANN model, two different ANN architectures were studied, one with a single hidden layer of 10 nodes, and one with two hidden layers of 5 nodes each. These architectures are referred to as 901:10:3 and 901:5:5:3, respectively. This notation describes the structure of the neural network in terms of layers of processing nodes, and the number of nodes in each layer. For each network being tested, an input layer of 901 nodes corresponds to the 901 flux points in the preprocessed observational sample, and an output layer of three nodes corresponds to the three parameters in the classification scale: spectral type, luminosity class, and helium class.

For each model architecture, a committee of five ANNs was formed. The committee approach (see Bishop, 1995, sect. 9.6) trains a number of ANNs on the same data, and applies them in unison to a new datum. The results from each of the ANNs are then averaged together to provide a combined result. In STATNET, each network in the committee is initialised with different random values for the weights, so the final set of weights differs between committee members because the minimisation process can get caught in different local minima. The committee approach seeks to achieve more robust results by averaging out this 'convergence noise'.

The leave-one-out method was carried out for five different training lengths for each architecture: 150, 300, 500, 700, and 1000 iterations of the optimisation procedure.
This required about four days of continuous computation on the Condor cluster. The approach of stopping the training procedure early is a method of regularising the ANN models.

STATNET also implements a weight decay factor in the neural network's sum-of-squares error function, but this feature was not used here (or in the parameterisation network described in the next section). Weight decay attempts to prevent the ANN model from over-fitting the training data by discriminating against network weights that become too large during training. Large network weights (which can occur if the network is trained for too long) increase the complexity of the mapping because they produce regions of high curvature in the input-output parameter space. As the classification training set was small, it was felt that weight decay should not be used here in order to preserve the structure and curvature of the input-output mapping.

An alternative is early stopping, which regularises the network by limiting the effective number of degrees of freedom. This number is supposed to start out small and then grow during the minimisation of the sum-of-squares error function, which corresponds to a steady increase in the complexity of the model. If the network error is measured against a validation sample, as is done here via the leave-one-out method, it is typically observed that this error decreases at first, and then increases as the network starts to over-fit. The network's training procedure can be terminated close to the point of smallest error, since this gives a network which is expected to have the best generalisation performance.

STATNET possesses the capability to add weighting factors to each of the network's outputs so that certain outputs contribute more to the sum-of-squares error minimisation than others. These weighting factors (called 'β' parameters in STATNET) allow the user to control the level of modelling precision for each output variable.
If this is limited by the noise in the data, 1/√β should be approximately equal to the standard deviation of the noise in the output variable.

STATNET includes a data scaling option which separately scales each input and output variable to have zero mean and unit standard deviation. With respect to the β parameters, the variance scaling casts each in terms of the scaled variables. Therefore, 1/√β can be roughly interpreted as the fractional uncertainty in a particular output variable. As an example, the default β value of 6.0 corresponds to a standard deviation of 0.4. Thus, if the data are variance scaled and roughly normally distributed, 95% of the data will lie in the range -2 to +2, so this standard deviation corresponds to approximately a 10% uncertainty.

In terms of the Drilling et al. (2006) classification parameters, the expected accuracy in each parameter for a human classifier is ±2 spectral types, ±1 luminosity class, and ±2 helium classes. These correspond to uncertainties of 10%, 12.5%, and 5% respectively. Therefore, the STATNET β parameters were set to 6.0 for the spectral type output, 4.0 for luminosity class, and 25 for helium class.
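The arithmetic behind these three settings can be reproduced in a few lines. The sketch below assumes that the tolerated standard deviation in variance-scaled units equals one over the square root of the output weighting parameter, which is the relation implied by the quoted pair of values (6.0 and 0.4). It also assumes class ranges of 20, 8 and 40 steps, inferred from the quoted percentages. The function name is hypothetical.

```python
def weight_from_fractional_uncertainty(fraction, scaled_range=4.0):
    """Output weighting for a tolerated error given as a fraction of the data range.

    Assumes variance-scaled data spanning roughly -2 to +2 (a range of 4) and
    sigma = 1 / sqrt(weight), as inferred from the values quoted in the text.
    """
    sigma = fraction * scaled_range   # standard deviation in scaled units
    return 1.0 / sigma ** 2

# Tolerances for a human classifier as fractions of each scale's (assumed) range.
tolerances = {
    "spectral type": 2 / 20,      # +/-2 types    -> 10%
    "luminosity class": 1 / 8,    # +/-1 class    -> 12.5%
    "helium class": 2 / 40,       # +/-2 classes  -> 5%
}
weights = {name: weight_from_fractional_uncertainty(f) for name, f in tolerances.items()}
for name, weight in weights.items():
    print(name, round(weight, 2))
```

Under these assumptions the three values come out at about 6.25, 4 and 25. The first rounds to the default of 6.0 used for spectral type, and the other two match the quoted settings.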

2.1.3 Results

Tables 2.1 and 2.2 give the rms and correlation coefficient values, r, comparing each ANN architecture's results with the classifications assigned by Drilling et al. (2006), as determined by the leave-one-out method.

Architecture 901:10:3
Training iterations      150      300      500      700     1000
rms  SpT              2.1041   2.1967   2.2338   2.2947   2.3434
rms  LC               1.1771   1.1835   1.2199   1.2435   1.2627
rms  HeC              5.5604   4.5434   4.3255   4.3540   4.5109
r    SpT              0.8710   0.8621   0.8586   0.8523   0.8473
r    LC               0.8209   0.8201   0.8123   0.8061   0.8012
r    HeC              0.9216   0.9483   0.9533   0.9527   0.9491

Table 2.1: Results of the leave-one-out procedure as applied to a committee of five 901:10:3 ANNs, for 150, 300, 500, 700 and 1000 training iterations. SpT, LC and HeC denote spectral type, luminosity class and helium class.

Architecture 901:5:5:3
Training iterations      150      300      500      700     1000
rms  SpT              1.7446   1.8296   1.9593   2.0626   2.2202
rms  LC               1.0574   1.0766   1.1078   1.1536   1.2156
rms  HeC              6.2962   5.2257   4.3019   4.1405   4.2633
r    SpT              0.9065   0.8983   0.8858   0.8759   0.8599
r    LC               0.8507   0.8621   0.8389   0.8272   0.8116
r    HeC              0.9007   0.9316   0.9528   0.9573   0.9547

Table 2.2: As Table 2.1, but for the committee of five 901:5:5:3 ANNs.

The large rms values for helium scale classifications, apparent in both tables, suggest the ANNs are having difficulty generalising for this parameter. However, the high correlation coefficients suggest a good learning response. There are several possible reasons for this. Firstly, it could be due to a problem with the neural network model itself, either a regularisation issue (e.g., not using weight decay), or sub-optimal settings of the output weighting parameters. Secondly, it is possible that the neural networks simply cannot do any better for this parameter, in which case the attention turns to the Drilling et al. (2006) classification scale itself and the observational sample on which this study is based. If the S/N of the observational sample is not high enough for the ANNs to generalise well for the helium scale, this could be affecting the bias and variance of the models, making it difficult to ascertain the underlying mapping function. It is also possible that the helium scale itself is too fine-grained. If the helium scale were scaled down by a factor of 4, a corresponding four-fold reduction in the rms errors would be observed (the corresponding correlation coefficients would remain unchanged as this statistic is not affected by scaling). This would bring them in line with those of spectral type and luminosity class, i.e., an rms of about ±1 helium class. Further investigation of this issue is required.

It can be seen in Tables 2.1 and 2.2 that both architectures are able to learn the appropriate spectral features associated with spectral type and luminosity class within the first 250-300 epochs of the training procedure. After this point, further training only serves to degrade performance with respect to these parameters, which indicates that the models are starting to over-fit the training data.
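The scaling argument made above for the helium class can be checked directly. In the sketch below the "actual" and "predicted" helium classes are synthetic stand-ins, not the thesis results, and the function names are hypothetical. Dividing both sets of values by 4 divides the rms error by 4 and leaves the correlation coefficient unchanged.

```python
import numpy as np

def rms_error(predicted, actual):
    """Root-mean-square difference between predicted and actual values."""
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

def correlation(predicted, actual):
    """Pearson correlation coefficient r."""
    return float(np.corrcoef(predicted, actual)[0, 1])

# Synthetic stand-in data on a 0-40 scale with a scatter of a few classes.
rng = np.random.default_rng(2)
actual = rng.integers(0, 41, size=359).astype(float)
predicted = actual + rng.normal(0.0, 4.3, size=actual.size)

rms_full, r_full = rms_error(predicted, actual), correlation(predicted, actual)
rms_coarse = rms_error(predicted / 4, actual / 4)   # scale compressed four-fold
r_coarse = correlation(predicted / 4, actual / 4)

print(rms_full, rms_coarse, r_full, r_coarse)
```

This is why a large rms error and a high correlation coefficient can coexist: the rms error depends on the units of the scale, whereas r does not.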

For the helium scale, both architectures yield optimal classifications after a few hundred more training epochs. The 901:10:3 architecture achieved best performance at around 500 iterations, and the 901:5:5:3 architecture reached its optimum at around 700 iterations. A similar phenomenon was reported by Snider et al. (2001, sect. 5.1), although Willemsen et al. (2005) did not observe the same effect.

The optimal trade-off in accuracy between the classification parameters occurs at around 300 training epochs for the 901:10:3 architecture, and 500 epochs for the 901:5:5:3 architecture. The results of these two ANNs are compared with the actual Drilling et al. (2006) classifications in Figure 2.2.

2.2 Physical Parameters

The ability of neural network models to obtain astrophysical parameters of hot subdwarf spectra was tested by generating a grid of synthetic spectra to be used as a training set, and extracting two application sets from the Drilling et al. (2006) sample. The first application set contains 60 stars which were used by Drilling et al. to calibrate their classification system against the physical parameters of Teff, log g, and log(nHe/nH). These 60 stars have been previously analysed by their original observers, with astrophysical parameters being derived mostly by the method of fine analysis. The second application set contains 133 stars from the Drilling et al. sample for which no astrophysical parameters have been listed in Drilling et al. (2006).

Using the first application set, the neural network results for those stars can be compared against the results of the fine analyses performed by the original observers. However, the second application set has no measure of comparison. For that, the χ² fitting code used at Armagh Observatory, SFIT, is used to derive a set of astrophysical parameters based on a grid of synthetic spectra. SFIT is also applied to the first application set to serve as a second comparison for the neural network results.


[Figure 2.2 panels, for architecture 901:10:3 (left column) and architecture 901:5:5:3 (right column): ANN spectral type against Drilling spectral type (O-A); ANN luminosity class against Drilling luminosity class (0-IX); ANN helium class against Drilling helium class (0-40).]

Figure 2.2: Results of the leave-one-out procedure for both ANN architectures at the near-optimal training time of 300 iterations for the 901:10:3 architecture (left column), and 500 iterations for the 901:5:5:3 architecture (right column). Also plotted is the best-fit linear least squares line.

The neural network training grid contains 2009 synthetic spectra generated using the line-blanketed LTE spectral synthesis code SPECTRUM (Jeffery et al., 2001). The grid covered the parameter space in Teff: 12000 - 50000 K, ΔTeff ≤ 5000 K; log g: 3.5 - 6.0 dex, Δlog g = 0.5 dex; and log(nHe/nH): -3 to 3 dex, in 10 non-uniformly spaced intervals. In order to match this training grid to the Drilling et al. (2006) observations, each synthetic spectrum was first convolved with a Gaussian to lower its resolution to that of the observations (~2.5 Å), and then re-binned onto the same wavelength grid as the observations (4050-4950 Å at a 1.0 Å dispersion).

Design limitations in the χ² minimisation code used at Armagh Observatory, SFIT (which are dealt with in Chapter 3), required a smaller grid of synthetic spectra to be used. The grid covered the parameter space: Teff = {15, 20, 25, 30, 35, 40, 50} kK, log g = {3, 4, 5, 6}, and log(nHe/nH) = {-3, -1, 0, +1, +3}. This grid is commensurate with the dispersion and S/N present in the Drilling et al. (2006) sample.

A default instrumental profile of 1 Å (FWHM) was assumed during the χ² fitting for each application set, and all data points more than 5% above continuum were rejected. All three intrinsic parameters, Teff, log g, and log(nHe/nH), were free to vary in the optimisation. Solutions for vrad and v sin i were also obtained. The correction for vrad used during the pre-processing stage (Section 2.1.1) appeared to have left residual shifts of a few km s⁻¹, and, in one case (possibly where Balmer lines were confused with He II lines), of a couple of Ångströms. Overall, <vrad> = -1.9 ± 22.3 km s⁻¹, the mean being satisfactorily close to the expectation value (0 km s⁻¹). The solution for v sin i allowed SFIT to be tolerant of both the varying instrumental resolution present in the data, and any rotational broadening present in the source. Formally, <v sin i> = 59 ± 39 km s⁻¹.
A single normalisation procedure was applied to remove small trends in the background continuum. Nine "continuum" regions free of hydrogen and helium lines were defined. After an initial optimisation step, the spectrum was divided by the initial fit. A second-order polynomial was fitted to this ratio using only the data in the continuum regions. An estimate of the true sample S/N was obtained from the rms of the ratio around the polynomial fit in these same regions. The sample was then multiplied by the polynomial fit before a second optimisation step was applied.
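The ratio fit and the S/N estimate in this procedure can be sketched as below. This is not the SFIT implementation: the continuum windows, the test data and the function name are hypothetical, and the S/N is taken here as the mean continuum level divided by the rms scatter, which is an assumption. Applying the fitted trend before the second optimisation step is left to the caller.

```python
import numpy as np

def continuum_trend(wave, observed, initial_fit, windows, order=2):
    """Fit a low-order polynomial to observed/initial_fit inside continuum windows.

    Returns the polynomial evaluated at every wavelength and an S/N estimate.
    """
    ratio = observed / initial_fit
    mask = np.zeros(wave.shape, dtype=bool)
    for lower, upper in windows:
        mask |= (wave >= lower) & (wave <= upper)
    x = wave - wave.mean()                 # centre the abscissa for a stable fit
    coeffs = np.polyfit(x[mask], ratio[mask], order)
    trend = np.polyval(coeffs, x)
    rms = np.sqrt(np.mean((ratio[mask] - trend[mask]) ** 2))
    return trend, float(trend[mask].mean() / rms)

# Hypothetical data: a flat model, a slow continuum trend, and 2% noise.
rng = np.random.default_rng(4)
wave = np.arange(4050.0, 4951.0, 1.0)
initial_fit = np.ones_like(wave)
x = wave - 4500.0
true_trend = 1.0 + 2e-4 * x - 3e-7 * x ** 2
observed = true_trend * initial_fit + rng.normal(0.0, 0.02, wave.size)

# Hypothetical line-free windows (the thesis defines nine; six are used here).
windows = [(4060, 4090), (4200, 4240), (4400, 4440),
           (4600, 4650), (4750, 4800), (4900, 4940)]

trend, snr = continuum_trend(wave, observed, initial_fit, windows)
print(snr)
```

With 2% noise the estimate should come out close to 50.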

2.2.1 Methodology

A control experiment was carried out to determine if neural network models trained on a set of synthetic spectra at infinite S/N are able to accurately parameterise other synthetic spectra over a range of S/Ns. The training grid of synthetic spectra was randomly divided into two evenly sized training and application subsets. Several committees of different ANN architectures were trained on the training subset for a range of training epochs. The intention here was to establish optimal model complexity for the task without using weight decay. The STATNET β parameters for each of the network output variables (Teff, log g, log(nHe/nH)) were set to 6.0, estimating a 10% error in each parameter. This is commensurate with the spacing of the grid points over the parameter space, and assumes, conservatively, that the neural network model will do at least as well as nearest neighbour matching to the synthetic spectra in the grid. Again, the Condor cluster allowed the different experiments to be carried out in parallel.

The application subset was duplicated eight times. Each set was degraded to one of the following S/Ns by the addition of Gaussian noise: {∞, 1000, 500, 100, 50, 20, 10, 5}. Each trained ANN committee was applied in turn to the noised application sets. The experiments suggested that the optimal network architecture was a 901:10:10:3 configuration, trained for 500 epochs for Teff and log g parameterisations, and 1350 epochs for log(nHe/nH) parameterisations.

The results showed positive correlations between the actual parameters and the ANN's results. However, the accuracy of the ANNs declined quickly as the S/N of the application set fell below 100. This observation is important because the spectra in the Drilling et al. (2006) sample are not of a consistent S/N. The majority of the sample has an S/N somewhere in the 50-100 range, so the neural network model should account for this. The results imply that an ANN trained on synthetic spectra of infinite S/N will not give the most accurate parameterisations of the observational sample. This result was also reported by Snider et al. (2001) and Willemsen et al. (2005). In the latter, Willemsen et al. reported on their attempts to improve the generalisation abilities of their neural network models by increasing the amount of weight decay taking place. They found that performance improved only when the weight decay term was chosen to be rather large, indicating that the problem lies in regularising the model, i.e., a neural network trained on high S/N spectra will over-fit the data unless "restrained".

The solution chosen in the study presented here was to make two copies of the entire grid of 2009 theoretical models, one degraded to an S/N of 100 and the other to an S/N of 50. The final training set for the optimal network architecture was then a combination of all three grids, totalling 6027 synthetic spectra. This addition of noise to the training grid serves as another mechanism of regularisation. Willemsen et al. (2005) employed a similar solution. The noise serves to 'smear out' each training point, making it difficult for the network to fit individual points precisely, and hence reducing over-fitting.

Despite the increase in the size of the training set, there is no reason to believe that the optimal ANN configuration would change as a result. The fundamental structure and physical parameters of the noised spectra are no different from those of the un-noised spectra.
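The construction of the combined training set can be sketched as follows. The flux array is a flat stand-in for the real synthetic grid, and the noise model (Gaussian noise with a per-pixel standard deviation equal to the flux divided by the S/N) is an assumption, since the text states only that Gaussian noise was added. The function name is hypothetical.

```python
import numpy as np

def add_noise(flux, snr, rng):
    """Return a copy of `flux` with Gaussian noise of standard deviation flux / snr."""
    if np.isinf(snr):
        return flux.copy()            # infinite S/N: leave the spectra untouched
    return flux + rng.normal(0.0, 1.0, flux.shape) * (flux / snr)

rng = np.random.default_rng(3)

# Stand-in for the synthetic grid: 2009 spectra of 901 flux points each.
grid = np.ones((2009, 901))

# Noise-free grid plus copies degraded to S/N = 100 and S/N = 50.
training_set = np.concatenate(
    [add_noise(grid, snr, rng) for snr in (np.inf, 100.0, 50.0)]
)
print(training_set.shape)
```

The astrophysical parameters attached to each spectrum would be repeated three times in the same order, since the noised copies share the labels of the originals.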

2.2.2 Results

Application Set 1: 60 Calibration Stars

The results of applying the two ANN models to the 60 calibration stars are given in the first column of Table 2.3, and the actual parameters obtained are listed in Appendix A. The correlation coefficients show a reasonable agreement between the ANN's predicted Teff parameterisations and those of Drilling et al. (2006). However, the log(nHe/nH) and log g correlation coefficients are not quite as positive. Looking at the middle and last plots in Figure 2.3, it can be seen that the ANN's results in these parameters (indicated by the blue crosses) are visibly more scattered than the Teff results given in the first plot. The typical errors quoted in the original fine analyses of these stars are σ(Teff) = ±2500 K and σ(log g) = ±0.2 dex. The results in the first column of Table 2.3 are still within 2σ of the fine analysis errors, which is significant (assuming, of course, that the method of fine analysis is more accurate than either of the methods used here).

                          ANN/Drilling   χ²/Drilling   ANN/χ²
σ_rms  Teff (K)           4389.79        4338.85       3740.99
       log g              0.4577         0.3754        0.4908
       log(nHe/nH)        0.9796         0.4769        0.8382
r      Teff               0.9207         0.9447        0.9131
       log g              0.7844         0.8173        0.7525
       log(nHe/nH)        0.8705         0.9649        0.8816

Table 2.3: Results of parameterising the 60 calibration stars.

SFIT was applied to the 60 calibration stars and the results are listed in the second column of Table 2.3. The actual parameters obtained are listed in Appendix A. SFIT compares well with the neural network in Teff and log g, but gives slightly better performance in log(nHe/nH). A direct comparison between the neural network and SFIT's results is given in the third column of Table 2.3. The disagreement between the neural network models and SFIT is of a similar degree to the disagreement of each method with the Drilling et al. parameters. The σ_rms values in column three of the table are still within twice the quoted errors for the fine analyses of the 60 calibration stars, which is a significant result (again, assuming that fine analysis is the more accurate method) and confirms that ANNs have the potential to parameterise hot subdwarf spectra to a similar degree of accuracy as the more traditional method of χ² minimisation. The poor generalisation of the neural network in the log(nHe/nH) parameter is a significant issue, and requires further investigation.

Figure 2.3: Parameterisations of the 60 calibration stars. The three panels plot the ANN and χ² parameterisations of Teff (kK), log g, and log(nHe/nH) against the corresponding Drilling calibrations. Results from each method have been combined onto each plot. ANN results are indicated by blue crosses, and χ² minimiser results by red pluses.

Application Set 2: 133 Unparameterised Stars

The two ANN committees were applied to the remaining 133 unparameterised stars in the sample. These stars were also parameterised using SFIT. The parameters obtained from both methods are listed in Appendix A. A direct comparison between the two methods was made. The results are presented in Table 2.4 and Figure 2.4. For approximately twice as many stars, the σ_rms values are only slightly worse than the values in the last column of Table 2.3. Tentatively speaking, the results could still be considered to support the view that ANNs have the potential to parameterise hot subdwarf spectra to a similar degree of accuracy as χ² minimisers. As has been pointed out previously, the neural network models seem to be suffering from regularisation issues when training on synthetic spectra. With further investigation on this matter, a significant improvement in the neural network's generalisation performance could be obtained.


Figure 2.4: Parameterisations of the 133 unparameterised stars using the ANNs and χ² minimiser. The three panels plot the χ² parameterisations of Teff (kK), log g, and log(nHe/nH) against the corresponding ANN parameterisations. Also shown is the best-fit linear least squares line.

                          ANN/χ²
σ_rms  Teff (K)           5768.74
       log g              0.6853
       log(nHe/nH)        0.9926
r      Teff               0.8850
       log g              0.8003
       log(nHe/nH)        0.8875

Table 2.4: A comparison between ANNs and χ² minimisation for parameterising the 133 unparameterised stars.

2.3 Summary

Artificial neural networks are a fast and powerful method for automatically classifying astronomical spectra. A feed-forward neural network configured in a 901:5:5:3 architecture, and trained for 500 epochs, was able to classify hot subdwarf spectra onto the Drilling et al. (2006) scale with global errors (σ_rms) of 2 sub-types for spectral type, 1 sub-class for luminosity class, and 4 sub-classes for the helium class. This was the most accurate ANN discovered for the task.

The use of ANNs for obtaining physical parameters from stellar spectra offers the possibility of a fast method for deriving initial parameter estimates. However, establishing the optimal network architecture to accurately model the flux-space to physical parameter-space mapping function was found to be cumbersome, with much experimentation required. It was also discovered that attempting to train the neural network model on infinite S/N synthetic spectra led to over-fitting due to insufficient regularisation. A solution was attempted by the addition of noise to the training set, but further investigation here is needed.

χ² methods are therefore more desirable for parameterising astronomical spectra in a general data mining tool kit, as they offer more flexibility and greater ease of use than ANNs. Of course, these qualities come at the price of slower speed, with χ² methods unable to compete with ANNs in this regard. This issue is discussed further in the next Chapter. However, if the regularisation issues with parameterising ANNs can be solved, their extremely fast application speed would instantly make them the preferred tool.

Chapter 3

Parameterisation - χ² Fitting

3.1 Analysing Stellar Spectra

Deriving physical parameters (i.e., Teff, log g, abundances) for a star is done by a fine analysis of its spectrum. The traditional method of spectroscopic fine analysis is a long, iterative process requiring several months to complete. The method is based on measuring equivalent widths of spectroscopic lines. The astronomer must go through a spectrum and manually identify as many spectral lines as possible, and the ions to which they belong. Microturbulent and rotational velocities are first determined. Then, an initial grid of model atmospheres is calculated to cover the approximate Teff, log g, and composition of the star. Using these models, the theoretical equivalent widths are calculated for each of the identified ion lines in the star over the range of elemental abundances in the grid. These equivalent widths are combined to form curves of growth which can then be used to read off derived abundances for each of the measured ion line equivalent widths in the stellar spectrum. Temperature and surface gravity are determined by using the derived abundances of lines known to be sensitive to temperature (e.g., Fe) and gravity (e.g., H or He), and performing a process of comparison and line fitting with each of the models in the grid. The derived values of Teff and log g are then used, along with the measured equivalent widths, to calculate new abundances. A new grid of model atmospheres is computed with these parameters. The entire analysis process of determining curves of growth, deriving values of Teff, log g, and abundances, and recomputing the model grid is repeated until the derived parameters agree with those used in the models (i.e., convergence is achieved). An excellent description of this process, and a demonstration of its application, can be found in Dudley (1992).

Progress Towards Automation

Given the iterative nature of the method of fine analysis, and the time required to conduct an analysis for a single star, attempts have been made to find automated procedures for accomplishing the same goal much more quickly. Hutchison (1971) presents an automatic procedure for detecting spectral features and determining accurate line frequencies, line depths, and equivalent widths for high-resolution infrared spectra. Morossi & Crivellari (1980) describe a method to obtain Teff and log g by comparing observations to a grid of models. Their method is based on a least-squares minimisation procedure which determines values for the parameters which optimise the fit between the theoretical models and observational data. Katz et al. (1998) use a χ² minimisation procedure to obtain values of Teff, log g, and [Fe/H] from ELODIE spectra by fitting observations to a library of 211 reference stars, observed with the same instrument, for which the atmospheric parameters are well known.

The method of χ² fitting has grown to be very much the de facto procedure for automating the parameterisation of astronomical spectra. It is a specific case of the more general class of fitting procedures known as metric distance minimisation (or minimum distance methods), where, as the name suggests, results are determined by minimising some distance metric between the object under analysis and each member of a set of templates. The object is assigned the parameters of the template which gives the smallest distance.

Let x = (x_1, x_2, ..., x_N) be the spectrum to parameterise, and s = (s_1, s_2, ..., s_N) be a template spectrum with known physical parameters. The distance metric to be minimised is of the form

\[ D = \left( \frac{1}{N} \sum_{i=1}^{N} w_i \, |x_i - s_i|^{p} \right)^{1/p}, \tag{3.1} \]

where w_i is a weight assigned to flux element s_i of the template spectrum. Typically, s is only one template in a set of templates, S = {s_1, s_2, ..., s_M}, and equation 3.1 is computed for all templates s_j. Equation 3.1 becomes χ² fitting when p = 2 and w_i = σ_i^(-2), where σ_i is the error in x_i.
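As an illustration of how equation 3.1 is used for nearest neighbour matching, the following Python sketch evaluates the metric against every template and keeps the smallest value. It assumes the 1/N factor sits inside the 1/p power; the function names and the toy spectra are hypothetical and are not taken from SFIT.

```python
def distance(x, s, w, p=2):
    """Distance metric: D = ((1/N) * sum_i w_i * |x_i - s_i|**p) ** (1/p)."""
    n = len(x)
    total = sum(wi * abs(xi - si) ** p for xi, si, wi in zip(x, s, w))
    return (total / n) ** (1.0 / p)


def chi2_weights(sigma):
    """Weights that turn the metric into chi-squared fitting: w_i = sigma_i**-2."""
    return [1.0 / (e * e) for e in sigma]


def nearest_template(x, sigma, templates):
    """Return (params, D) for the template closest to the spectrum x.

    `templates` is a list of (params, flux) pairs.
    """
    w = chi2_weights(sigma)
    scored = ((params, distance(x, flux, w)) for params, flux in templates)
    return min(scored, key=lambda item: item[1])


# Toy example: one observed spectrum and two labelled templates.
observed = [1.0, 0.7, 1.0]
errors = [0.1, 0.1, 0.1]
toy_templates = [("A", [1.0, 0.5, 1.0]), ("B", [1.0, 0.75, 1.0])]
best_params, best_d = nearest_template(observed, errors, toy_templates)
```

The cost of `nearest_template` grows linearly with the number of templates, which is the scaling problem discussed next.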

For a straightforward nearest neighbour minimisation of the χ² metric over a grid of templates, an accurate result requires the grid to be finely spaced in each parameter of interest so that the effects of that parameter on the flux vector can be ascertained. This can create a large data requirement, and the computation time required to parameterise one spectrum can increase prohibitively as equation 3.1 must be evaluated for all the templates. One solution to this problem is to use some method of interpolation to "fill in the gaps" between templates in a discrete grid. As interpolation creates the illusion of continuity in the grid, it also opens the possibility of using search-based optimisation methods to locate the minimum of D in an efficient manner.

Unfortunately, with χ² fitting, there is no escaping the so-called curse of dimensionality. As the number of parameters to be determined increases, the number of templates in the grid increases exponentially.

χ² Fitting for Astronomical Data Mining

The main disadvantage of using χ² fitting in the context of a data mining application is slowness. In contrast to artificial neural networks, no training procedure exists to extract information from the template grid, and the grid is required to be present and searched to minimise D for every new spectrum to be parameterised. One solution to the speed issue is to distribute the grid search over many computers in a parallel cluster. The grid of templates could be broken up into N sections, where N is the number of processing nodes in the cluster. Each node then receives its section of the grid, finds the local minimum of D for an observed spectrum, and reports this value back to a master processing node which then selects the global minimum from all the node results.

In a data mining context, it is likely that the template grid will cover a large region of the parameter space of interest in reasonable detail so as to account for the diversity of objects that will be encountered. Large template grids pose storage and access problems within the χ² minimisation program because the main memory of the computer may not be large enough to store all the templates at once. The work of this chapter is concerned with taking a pre-existing χ² minimisation program used at the Armagh Observatory, and beginning the modifications necessary in order to use the program more efficiently in a data mining context. Parallelising the program is a relatively straightforward task; however, the problem of managing large template grids is much more involved and needs to be tackled first.
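The partitioned grid search described above can be sketched as follows. This is a toy illustration only: Python threads stand in for the processing nodes of a cluster, and the function names and the one-parameter template grid are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chi2(x, s, sigma):
    """Chi-squared difference between spectrum x and template s."""
    return sum(((xi - si) / ei) ** 2 for xi, si, ei in zip(x, s, sigma))


def local_minimum(x, sigma, section):
    """Work done by one node: best (chi2, params) within its section of the grid."""
    return min((chi2(x, flux, sigma), params) for params, flux in section)


def partitioned_search(x, sigma, templates, n_nodes):
    """Split the grid into sections, search each one, then take the global minimum."""
    sections = [templates[i::n_nodes] for i in range(n_nodes)]
    sections = [sec for sec in sections if sec]  # drop empty sections
    with ThreadPoolExecutor(max_workers=len(sections)) as pool:
        local = list(pool.map(lambda sec: local_minimum(x, sigma, sec), sections))
    return min(local)  # the "master node" step


# Toy grid: a single parameter (Teff) controlling the depth of one line.
templates = [((t,), [1.0, 1.0 - t / 100000.0, 1.0]) for t in range(20000, 50000, 5000)]
observed = [1.0, 0.69, 1.0]
sigma = [0.05, 0.05, 0.05]
best = partitioned_search(observed, sigma, templates, n_nodes=3)
```

Because every template is still evaluated exactly once, the partitioned search returns the same answer as a single pass over the whole grid; only the wall-clock time changes.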

3.2 SFIT

SFIT (Jeffery et al., 2001) is a Fortran 90 implementation of the χ² minimisation method outlined in the previous section. Given a grid of theoretical model spectra, and an observed spectrum, SFIT finds the combination of physical parameters of the model which most closely matches the observed spectrum by minimising the χ² distance metric. The program considers several broadening processes which must be applied to the theoretical spectra before comparison with an observed spectrum. These include instrumental broadening (I), rotational broadening (V, which depends on v sin i), acceleration broadening (A), and projection broadening (P).

Model grids are discrete in three dimensions: Teff, log g, and n_atm, the fractional atmospheric abundance of an element. A linear interpolation in tables method is used to estimate the model space between grid points. Fitting solutions can be obtained in several parameters (Teff, log g, n_atm, v sin i, and v_rad) for both single and composite spectra. The χ² minimisation can be carried out using either the Nelder-Mead downhill simplex optimisation procedure, implemented as a variant of the AMOEBA algorithm of Press et al. (1986), or the Levenberg-Marquardt algorithm (Levenberg, 1944; Marquardt, 1963). Nearest neighbour χ² fitting is also possible.

The Amoeba algorithm minimises a function (in this case, the χ² difference between the observed spectrum and the models in the grid) by defining an initial simplex with N+1 vertices, where N is the number of dimensions in the function's parameter space. The method then takes a series of steps, most of which just move the point of the simplex where the function to be minimised is largest through the opposite face of the simplex to a lower point. The simplex "moves" through the parameter space by contracting and expanding until the distance "moved" is smaller than some tolerance threshold, at which point the method is determined to have converged on a solution.

The Levenberg-Marquardt method is an iterative method specifically catering for the minimisation of sum-of-squares error functions (i.e., the χ² function used in SFIT). The algorithm expands the error function around a point and examines the derivatives to search for a minimum by dynamically setting the step size according to the direction of the gradient. As the solution approaches the minimum, the step size decreases, and the algorithm usually converges quickly. The use of this method in SFIT requires the initial guess for the parameters to be reasonably close to the solution, as Levenberg-Marquardt can get trapped in local minima. Although slower, Amoeba is more robust against this possibility.

Both methods assume that the error function is either continuous, or can be evaluated for any point within the boundaries of the parameter space. Evaluation of the χ² error function in SFIT depends upon a grid of models which is discrete. As mentioned in the previous section, an interpolation method is used to "fill in the gaps" of the grid, thereby creating the illusion of a continuous parameter space. As the Amoeba or Levenberg-Marquardt optimisers examine the properties of the error function throughout the parameter space, they are more often than not examining an interpolation of the model spectra.

Once Teff, log g, and v sin i have been determined, SFIT can estimate the composition of a star by adjusting the abundances of the different atomic species which contribute to the absorption spectrum until the theoretical spectrum matches the observed spectrum. As the number of free parameters in such an analysis is so large (i.e., the abundances of H, C, N, O, Al, and so on, along with the microturbulent velocity, v_t), pre-computing multidimensional grids of theoretical spectra is infeasible. SFIT solves this problem by computing synthetic spectra as demanded by the χ² minimisation algorithm.

SFIT is currently distributed with STERNE and SPECTRUM, the model atmosphere and spectral synthesis codes used at the Armagh Observatory. As part of this thesis, the source codes for all three programs, and their associated libraries, were ported from a simple build system based on GNU make to a more flexible build system based on the GNU autotools (see Appendix E).
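As a rough illustration of what linear interpolation in tables over a complete rectangular grid involves, the following Python sketch performs a trilinear interpolation of flux vectors. It is not SFIT's Fortran code: the function names, the nested-list grid layout, and the toy axis values are assumptions made for this example.

```python
from bisect import bisect_right


def bracket(axis, value):
    """Return index i and fraction t so that value lies between axis[i] and axis[i+1]."""
    if not axis[0] <= value <= axis[-1]:
        raise ValueError("point lies outside the grid")
    i = min(bisect_right(axis, value) - 1, len(axis) - 2)
    t = (value - axis[i]) / (axis[i + 1] - axis[i])
    return i, t


def trilinear(axes, grid, point):
    """Interpolate flux vectors stored as grid[i][j][k] on a complete rectangular grid."""
    (i, tx), (j, ty), (k, tz) = (bracket(a, v) for a, v in zip(axes, point))
    n_flux = len(grid[0][0][0])
    result = [0.0] * n_flux
    # Blend the eight models at the corners of the enclosing grid cell.
    for di, wx in ((0, 1 - tx), (1, tx)):
        for dj, wy in ((0, 1 - ty), (1, ty)):
            for dk, wz in ((0, 1 - tz), (1, tz)):
                weight = wx * wy * wz
                corner = grid[i + di][j + dj][k + dk]
                for m in range(n_flux):
                    result[m] += weight * corner[m]
    return result


# Toy grid: two points per axis and a single flux value that is linear in each parameter.
axes = ([20000.0, 30000.0], [5.0, 6.0], [0.0, 1.0])
grid = [[[[t / 1e4 + g + n] for n in axes[2]] for g in axes[1]] for t in axes[0]]
value = trilinear(axes, grid, (25000.0, 5.5, 0.25))
```

Every one of the eight corner models must exist for the blend to work, which is why a scheme of this kind needs a rectangular grid with no missing points.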

3.2.1 Limitations of SFIT

Analyses performed with SFIT are hindered by the program's restrictions on the size of the model grid. Grids are limited to three dimensions (Teff, log g, and n_atm), and, at maximum, nine points in Teff, five in log g, and five in n_atm. Models are permitted to have no more than five thousand wavelength points. These limits are due to design decisions made during SFIT's initial construction, which chose to store the model grid entirely in the computer's main memory. The restrictions on dimensionality and number of grid points are merely hard-coded numbers within the program, and therefore cannot be changed without recompiling the source code.

Storing the model grid in main memory, whilst providing fast access to the models, also presents a problem in that computer memory is finite, and orders of magnitude smaller than the space available on secondary storage devices, such as hard disks. Despite ever increasing main memory capacities, the implied upper limit on the number of models and their detail will always be much smaller than if secondary storage was used.

Another restriction SFIT places on the model grid is that it must be rectangular and complete, with no missing grid points. This is problematic because it may be difficult or impossible for model atmosphere simulations to converge for a given set of physical parameters. In such an instance, a make-shift solution is employed wherein a converged model close to the desired physical parameters is used to "plug the gap".

The rectangularity and completeness requirements are a result of SFIT's interpolation scheme, which generates approximations of models in the parameter space between discrete grid points by linear interpolation in tables. An irregular grid, or a missing grid point, prevents the interpolation scheme from operating correctly.

3.2.2 Proposal to Remove SFIT's Limitations

In summary, the limitations of SFIT's treatment of model grids are:

1. Size limitations due to initial program design decisions and storage of grids in main memory.
2. Interpolation scheme cannot handle irregular or incomplete grids.

Modifying SFIT to be more useful in a data mining context requires removing these two limitations. The solution to the first limitation is obvious: correct the limiting initial design decisions, and store the model grids on secondary storage, i.e. hard disk, reading them into main memory on an individual basis only when needed. An indexing scheme is then required, one that can be held in main memory in place of the models, and quickly searched to determine which models are to be read in and their location on disk. The nature of this index is dependent on the interpolation scheme chosen to correct the second SFIT limitation.

Interpolation allows a complicated function to be approximated at an unknown point by using known surrounding points to construct a simpler, estimating function. Different interpolation schemes use the known surrounding points in different ways, so the design and function of the proposed model grid indexing scheme must be tailored accordingly.

Many interpolation schemes exist in the literature. The ideal scheme for this application should be multidimensional (although this can be relaxed due to the curse of dimensionality), have low computation cost, and be able to operate over potentially randomly sampled functions. The interpolating function must also be continuous, and be based on known data points local to the interpolation point (as opposed to global methods, in which the interpolated value is influenced by all of the available data).

Two interpolation functions which stand out in terms of their simplicity, multidimensionality, and ability to handle incomplete grids are weighted average interpolation and simplex interpolation.

Weighted Average Interpolation

The most common weighted average method referred to in the literature is that of Shepard (1968) and its modifications, such as Renka (1988). Given an underlying function, f, with values f_i at nodes (x_i, y_i) for i = 1, ..., N, the interpolating formula is of the form,

\[ F(x, y) = \frac{\sum_{k=1}^{N} W_k(x, y) \, f_k}{\sum_{i=1}^{N} W_i(x, y)}. \tag{3.2} \]

The weighting function, W_k, is defined by some inverse distance function,

\[ W_k(x, y) = \frac{1}{d_k^{2}}, \tag{3.3} \]

where d_k(x, y) denotes the Euclidean distance between (x, y) and (x_k, y_k).

A suitable indexing scheme for weighted average interpolation should allow fast searching of the node points to determine which are within a specified radius of the interpolation point (i.e., nearest neighbour searching). The field of computational geometry contains many algorithms and data structures for indexing and searching a set of N-dimensional points in a computationally efficient manner. One data structure that is very applicable to nearest neighbour searching problems is the k-D tree (Moore, 1991). Figure 3.1 demonstrates the k-D tree in two dimensions.

Figure 3.1: Example of a k-D tree in two dimensions, built from the points [2,5], [3,2], [4,9], and [8,7]. On the left is the representation of how the k-D tree on the right splits up the x, y plane. (Adapted from Moore 1991.)

This data structure is a binary tree which represents a series of partitions in k-dimensional space, organising a set of points into a collection of hyper-rectangular regions. Nearest neighbour searching can be carried out in O(log₂ N) time on average, where N is the number of nodes in the tree. All that remains is to determine how many nearest neighbours are needed, and the weighted average interpolation can be performed immediately.
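To make the combination of a k-D tree with weighted average interpolation concrete, here is a minimal Python sketch. It is an illustration only, not the prototype described later in this chapter: the tree is a simple nested tuple, the choice of k = 3 neighbours is arbitrary, and the function names and node values are hypothetical. The node positions are the four points shown in Figure 3.1.

```python
import heapq


def build_kdtree(points, depth=0):
    """Return a nested (point, left, right) tuple, or None for an empty set."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # split at the median point
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def k_nearest(tree, target, k, depth=0, heap=None):
    """Collect the k nodes closest to `target` as a heap of (-dist2, point)."""
    if heap is None:
        heap = []
    if tree is None:
        return heap
    point, left, right = tree
    dist2 = sum((a - b) ** 2 for a, b in zip(point, target))
    if len(heap) < k:
        heapq.heappush(heap, (-dist2, point))
    elif dist2 < -heap[0][0]:
        heapq.heapreplace(heap, (-dist2, point))
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    k_nearest(near, target, k, depth + 1, heap)
    # Cross the splitting plane only if it is closer than the current k-th neighbour.
    if len(heap) < k or diff * diff < -heap[0][0]:
        k_nearest(far, target, k, depth + 1, heap)
    return heap


def shepard_from_tree(tree, values, target, k=3):
    """Inverse-distance weighted average (W = 1/d^2) over the k nearest nodes."""
    num = den = 0.0
    for neg_d2, point in k_nearest(tree, target, k):
        if neg_d2 == 0.0:
            return values[point]           # target coincides with a node
        weight = 1.0 / -neg_d2
        num += weight * values[point]
        den += weight
    return num / den


points = [(2, 5), (3, 2), (4, 9), (8, 7)]
values = {p: float(p[0] + p[1]) for p in points}   # toy function f = x + y
tree = build_kdtree(points)
estimate = shepard_from_tree(tree, values, (3, 3))
```

Restricting the sum to the k nearest nodes is what makes the method local; using all N nodes would recover the global form of equation 3.2.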

Simplex Interpolation

A simplex, or N-simplex, is the N-D analogue of a triangle in 2-D and a tetrahedron in 3-D, as demonstrated in Figure 3.2. Simplex-based interpolation uses a weighted linear combination of the function values at the simplex vertices to approximate a function at a point located on or within the simplex boundary. These weights are computed as the barycentric coordinates of the interpolation point within the simplex.

Figure 3.2: (a) A 1-simplex is a line segment. (b) A 2-simplex is a triangle. (c) A 3-simplex is a tetrahedron.

Given a collection of N-dimensional points, such as a grid of model spectra, a suitable indexing scheme must allow the vertices of the enclosing N-simplex to be located quickly. As this is, again, a nearest neighbour problem, the method of k-D trees could be a viable solution. However, if the dimensionality of the grid is kept to three dimensions or less, the field of computational geometry offers another approach. Several algorithms exist which can take a cloud of two or three-dimensional points and generate a triangular or tetrahedral mesh. All that is then needed is a method to search the mesh for the triangle or tetrahedron that contains the interpolation point.

Choosing the Solution

Preliminary testing of both interpolation and indexing schemes was carried out to help determine which solution would be more viable. Constructing a suitable prototype of the weighted average/k-D tree solution was hindered by Fortran 90's insufficient flexibility to support the implementation of advanced data structures. No suitable third-party libraries were available to speed development, and, as a result of time constraints, the pursuit of this solution had to be abandoned.

On the other hand, if it is assumed that SFIT model grids are limited to three dimensions, then several freely available third-party libraries exist which can generate tetrahedral meshes from a cloud of random points. From a purely pragmatic standpoint, this makes the simplex interpolation scheme very attractive. After the mesh has been generated, the methods then required to search for the tetrahedron enclosing an interpolation point are simple geometric operations. Thus, the simplex interpolation method was chosen to solve SFIT's grid management problems. The weighted-average/k-D tree solution is an interesting idea (which, unlike the simplex scheme, is not limited to three dimensions), and should be pursued in future work.

3.3 Tetrahedralisation: Interpolation and Indexing

In developing the simplex interpolation and corresponding grid indexing scheme, it was assumed that SFIT grids will always be three dimensional due to the curse of dimensionality. From this assumption, the tetrahedral mesh indexing scheme, described previously, can be constructed using third-party libraries. This affords a very pragmatic solution to the problem.

3.3.1 Simplex Interpolation

Barycentric coordinates express the location of any point within an N-simplex in terms of a set of homogeneous coordinates that form a linear combination of the simplex vertices. Given a tetrahedron defined by four arbitrary vertices, v_1, v_2, v_3, and v_4, and some point p within this tetrahedron, p can be expressed as the weighted combination of the four vertices

\[ \mathbf{p} = \lambda_1 \mathbf{v}_1 + \lambda_2 \mathbf{v}_2 + \lambda_3 \mathbf{v}_3 + \lambda_4 \mathbf{v}_4, \tag{3.4} \]

where λ_1, λ_2, λ_3, and λ_4 are the barycentric coordinates. These are subject to the constraints that

\[ 0 \le \lambda_1, \lambda_2, \lambda_3, \lambda_4 \le 1, \tag{3.5} \]

and,

\[ \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1. \tag{3.6} \]

Calculating the barycentric coordinates of a point inside a given tetrahedron is accomplished by reformulating equation 3.4 as follows,

\[ \begin{pmatrix} p_x \\ p_y \\ p_z \\ 1 \end{pmatrix} = \begin{pmatrix} v_{1x} & v_{2x} & v_{3x} & v_{4x} \\ v_{1y} & v_{2y} & v_{3y} & v_{4y} \\ v_{1z} & v_{2z} & v_{3z} & v_{4z} \\ 1 & 1 & 1 & 1 \end{pmatrix} \cdot \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \lambda_4 \end{pmatrix}, \tag{3.7} \]

or, rewriting in matrix notation,

\[ \mathbf{b} = \mathbf{A} \cdot \mathbf{x}, \tag{3.8} \]

where

\[ \mathbf{b} = \begin{pmatrix} p_x & p_y & p_z & 1 \end{pmatrix}^{T}, \quad \mathbf{x} = \begin{pmatrix} \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 \end{pmatrix}^{T}, \quad \mathbf{A} = \begin{pmatrix} v_{1x} & v_{2x} & v_{3x} & v_{4x} \\ v_{1y} & v_{2y} & v_{3y} & v_{4y} \\ v_{1z} & v_{2z} & v_{3z} & v_{4z} \\ 1 & 1 & 1 & 1 \end{pmatrix}. \]

Therefore, x can be found through the standard methods of solving equation 3.8. As will be useful later on, if the computed barycentric coordinates do not conform to the constraints discussed earlier, then the point of interest can be determined to lie outside the given tetrahedron.
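The calculation in equations 3.4 to 3.8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the linear system is solved with a small Gaussian elimination routine written for the example, the tolerance used in the inside test is arbitrary, and the function names and toy tetrahedron are hypothetical.

```python
def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate tetrahedron")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def barycentric(p, vertices):
    """Build A and b as in equation 3.7 and return the four barycentric coordinates."""
    a = [[v[axis] for v in vertices] for axis in range(3)] + [[1.0] * 4]
    b = list(p) + [1.0]
    return solve(a, b)


def inside(coords, tol=1e-9):
    """Constraint 3.5; constraint 3.6 is enforced by the last row of A."""
    return all(-tol <= c <= 1.0 + tol for c in coords)


def interpolate(p, vertices, fluxes):
    """Weighted combination of the flux vectors stored at the four vertices."""
    coords = barycentric(p, vertices)
    if not inside(coords):
        raise ValueError("point lies outside the tetrahedron")
    return [sum(c * f[m] for c, f in zip(coords, fluxes)) for m in range(len(fluxes[0]))]


# Toy tetrahedron with one flux value per vertex, linear in the coordinates.
tetra = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
fluxes = [[v[0] + 2.0 * v[1] + 3.0 * v[2]] for v in tetra]
centre_coords = barycentric((0.25, 0.25, 0.25), tetra)
centre_flux = interpolate((0.25, 0.25, 0.25), tetra, fluxes)
```

Because the interpolation is linear, a function that is itself linear in the three parameters is reproduced exactly, which gives a convenient check on an implementation.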

3.3.2 Grid Index - Delaunay Triangulation

The Delaunay triangulation (O'Rourke, 1998) is frequently used to generate meshes of N -simplices from a set of N -dimensional p oints b ecause it has certain desirable prop erties, the most imp ortant of which is the following: inside the circum-hyp ersphere of any simplex, there are no other p oints of the set (see Figure 3.3). This prop erty yields a resulting triangulation which is "natural" and provably optimal in many resp ects. It is known that the Delaunay triangulation exists and is unique for a set of p oints in general p osition, that is, no N +1 p oints are on the same hyp erplane and no N +2 p oints are on the same hyp ersphere, for an N -dimensional set of points. In the context of SFIT, the Delaunay tetrahedralisation of a model grid is generated by the third-party library TetGen1 . TetGen is a p ortable C++ program implementing the Delaunay triangulation algorithm of Edelsbrunner & Shah (1992). This algorithm is simple, fast, and TetGen's implementation is numerically robust due to the use of adaptive exact arithmetic code (Shewchuk, 1996). TetGen can b e compiled as a set of library functions which can then be integrated into other applications, in this case, SFIT. A technical difficulty arises in that SFIT is a Fortran 90 program, but TetGen is written in C++. Unfortunately, the Fortran 90 standard does not provide for calling functions written in other programming languages, and it has b een left up to the individual compiler implementors to include a solution. SFIT is currently based around the Intel Fortran compiler for Linux2 , and it is rel1 2

http://tetgen.b erlios.de http://www.intel.com/cd/software/products/asmo-na/eng/compilers/flin/

3.3 Tetrahedralisation: Interp olation and Indexing

65

Figure 3.3: In two dimensions, the Delaunay triangulation guarantees that no other points lie in the circumcircle of any simplex.

atively straightforward to call out to C/C++ functions using the mechanisms provided by this compiler. To simplify the process of calling the TetGen library from Fortran, a small "glue" function was written in C. This function accepts a flattened array of three-dimensional model grid p oints, copies the data into the data structure used by TetGen, calls TetGen to p erform the tetrahedralisation, then returns a flattened array of vertices for the generated tetrahedra, and a flattened array denoting the neighb ouring tetrahedra for each generated tetrahedron. This process of calling TetGen to construct the new model grid indexing scheme fits in with SFIT's normal grid generation procedure, as outlined in algorithm 1. As noted in the pseudo-code, the parameters of the models are rescaled b efore b eing On the Automatic Analysis of Stellar Sp ectra

66

Chapter 3 - On the Automatic Analysis of Stellar Sp ectra

Algorithm 1 Generating a Tetrahedralisation of a Model Grid for all models in the grid file list do read model parameters from header record each parameter in corresp onding grid axis array write model fluxes to direct access grid file app end model parameters and corresp onding direct access file record numb ers to a linked list end for rescale parameters in linked list to yield more optimal tetrahedra flatten the parameters in linked list to an array of 3D p oints pass array of p oints into TetGen {TetGen returns two arrays: a list of tetrahedra vertices, and a list of tetrahedra neighb ours} wr wr wr wr ite ite ite ite grid axis arrays to b eginning of index file linked list of model data to index file list of tetrahedra vertices to index file list of tetrahedra neighb ours to index file

passed to TetGen. This is to allow the generation of a mesh which is composed, more optimally, of "fat" tetrahedra, avoiding degenerate tetrahedra or "slivers" which would cause numerical problems for the simplex interpolation scheme and the point location algorithm outlined in the next section. Such degenerate tetrahedra would arise because of a scale disparity between the model grid axes. For instance, the $T_{\mathrm{eff}}$ axis contains effective temperatures measured in Kelvin and rescaled in magnitude by a division by 100. On the other hand, the $n_{\mathrm{atm}}$ axis typically contains fractional values $0 \leq n_{\mathrm{atm}} \leq 1$. This disparity means that model grids are very compact in the $n_{\mathrm{atm}}$ dimension, and comparatively widely spaced in the $T_{\mathrm{eff}}$ dimension.

Given the model grid axis arrays accumulated during the model grid creation process (which typically correspond to the dimensions of $T_{\mathrm{eff}}$, $\log g$, and $n_{\mathrm{atm}}$), each axis is rescaled in the following manner.


Let $A_i$ be the $i$th model grid axis comprising the list of $m$ monotonically increasing points $\{a_{i1}, a_{i2}, \ldots, a_{im}\}$. $A_i$ is rescaled according to the mapping function $f : A_i \to \mathbb{R}$ such that $f(a)$ for every $a \in A_i$ is defined as

$$f(a) = \frac{a - a_{i1}}{a_{i2} - a_{i1}} \times 100. \tag{3.9}$$

This simple function translates $A_i$ to the origin, and rescales the points onto a more widely spaced grid. Assuming a constant distance between all points $a_{ij}$, this mapping yields a list of $m$ monotonically increasing points $R_i$, $\{0, 100, 200, \ldots, (m - 1) \times 100\}$.
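The rescaling of Equation 3.9 can be illustrated with a short Python sketch. It is not taken from SFIT, which is written in Fortran; the function name `rescale_axis` and the `spacing` argument are assumptions made for illustration, and the sample axes are the $T_{\mathrm{eff}}$ and $\log g$ values listed later in Table 3.1.

```python
def rescale_axis(axis, spacing=100.0):
    """Apply the mapping of Equation 3.9 to one monotonically increasing axis.

    Each point is shifted so the first point sits at the origin, then
    divided by the first grid step (a_i2 - a_i1) and multiplied by 100.
    """
    if len(axis) < 2:
        raise ValueError("need at least two grid points to define a step")
    origin = axis[0]
    step = axis[1] - axis[0]
    if step <= 0:
        raise ValueError("axis must be strictly increasing")
    return [(a - origin) / step * spacing for a in axis]


# Two axes with very different native scales end up on the same footing.
teff_axis = [14000.0, 16000.0, 18000.0, 20000.0]
logg_axis = [2.0, 2.5, 3.0, 3.5]
print(rescale_axis(teff_axis))  # [0.0, 100.0, 200.0, 300.0]
print(rescale_axis(logg_axis))  # [0.0, 100.0, 200.0, 300.0]
```

For an evenly spaced axis the result is always 0, 100, 200, and so on, regardless of the original units, which is the property that removes the scale disparity between axes.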

3.3.3 Navigating the Index - Point Location

The algorithm for locating the tetrahedron which encloses any given interpolation point is based on a randomised jump-and-walk methodology, inspired by the work of Mücke et al. (1996). The basic idea is simple. A "good starting point" is established by randomly sampling the set of tetrahedra. The distances between each tetrahedron's centroid and the given interpolation point are calculated, and the tetrahedron closest to the interpolation point is selected. A line segment is then constructed using the chosen tetrahedron's centroid and the interpolation point. The tetrahedron containing the interpolation point is located by "walking through" the tetrahedra which intersect this line. Figure 3.4 illustrates the concept in two dimensions.

Figure 3.4: The line segment, $L$, is constructed using the centroid of the starting tetrahedron, $T$, and the interpolation point, $p$. The tetrahedra visited on the walk-through are coloured grey.

More formally, given the tetrahedralisation $D$ of a model grid containing $n$ tetrahedra, and an interpolation point $p$ (rescaled using Equation 3.9), the following procedure locates the tetrahedron of $D$, if any, which contains $p$:

1. Select $m$ tetrahedra $T_1, \ldots, T_m$ at random from $D$, where $m = 2n^{1/3}$.
2. Determine the index $j \in \{1, \ldots, m\}$ of the tetrahedron minimising the Euclidean distance $d(\mathrm{centroid}(T_j), p)$. Set $T = T_j$.
3. Locate the tetrahedron containing $p$ (if it exists) by traversing all tetrahedra intersected by the line segment $L = (\mathrm{centroid}(T), p)$.
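Steps 1 and 2 of the procedure can be sketched in a few lines of Python. This is an illustrative sketch rather than the SFIT implementation: the names `centroid` and `pick_start` are hypothetical, and rounding the sample size up and capping it at the number of tetrahedra are assumptions, since the text only gives $m = 2n^{1/3}$.

```python
import math
import random


def centroid(tet):
    """Centroid of a tetrahedron given as four (x, y, z) vertices."""
    return tuple(sum(v[k] for v in tet) / 4.0 for k in range(3))


def pick_start(tetrahedra, p, rng=random):
    """Return the index of the sampled tetrahedron whose centroid is nearest p.

    The sample size follows m = 2 * n**(1/3); rounding up and capping at n
    are assumptions of this sketch.
    """
    n = len(tetrahedra)
    m = min(n, math.ceil(2.0 * n ** (1.0 / 3.0)))
    sample = rng.sample(range(n), m)
    return min(sample, key=lambda j: math.dist(centroid(tetrahedra[j]), p))


# Two hypothetical tetrahedra: one at the origin, one shifted by 10 along x.
unit = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
shifted = [(x + 10.0, y, z) for (x, y, z) in unit]
print(pick_start([unit, shifted], (10.2, 0.2, 0.2)))  # 1
```

With only two tetrahedra the sample covers the whole mesh, so the result is deterministic; on a large mesh the sample is a small random subset and the returned tetrahedron is only a good guess from which the walk of step 3 begins.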

Step 3 is implemented in constant time per tetrahedron visited once the initial tetrahedron, intersected by $L$ and incident on starting point $T$, is determined. This is due to the fact that TetGen conveniently returns an array which describes, for every tetrahedron in the mesh, which tetrahedra are its neighbours.

The implementation of the walk-through mechanism is based on the fast ray-triangle intersection algorithm of Möller & Trumbore (1997). This algorithm is very straightforward. A ray $R(t)$ with origin $O$ and normalised direction $D$ is defined as

$$R(t) = O + tD, \tag{3.10}$$

and a triangle is defined by three vertices $V_0$, $V_1$, and $V_2$. A point, $T(u, v)$, on a triangle is given by

$$T(u, v) = (1 - u - v)V_0 + uV_1 + vV_2, \tag{3.11}$$

where $(u, v)$ are the barycentric coordinates which must fulfil $u, v \geq 0$, and $u + v \leq 1$. Computing the intersection between the ray, $R(t)$, and the triangle, $T(u, v)$, is equivalent to $R(t) = T(u, v)$, which yields

$$O + tD = (1 - u - v)V_0 + uV_1 + vV_2. \tag{3.12}$$

Rearranging the terms gives

$$\begin{bmatrix} -D, & V_1 - V_0, & V_2 - V_0 \end{bmatrix} \begin{bmatrix} t \\ u \\ v \end{bmatrix} = O - V_0. \tag{3.13}$$

The barycentric coordinates $(u, v)$ and the distance, $t$, from the ray origin to the intersection point can be found by solving the linear system of equations above. If the barycentric coordinates meet the requirements stipulated earlier, then the ray intersects the triangle.

From the starting point of the walk-through method, each triangular face of the tetrahedron is tested using this algorithm to determine if it is intersected by the line segment $L$. If an intersecting face is found, the walk-through moves to the tetrahedron opposite that face (in constant time). This new tetrahedron is first tested to see if it contains point $p$ by way of the simplex interpolation method discussed in section 3.3.1. If the tetrahedron does not contain $p$, the ray-triangle intersection test is performed, and the walk-through moves to the neighbouring tetrahedron on the other side of the face intersected by the ray. If the tetrahedron does contain $p$, then the walk-through procedure can terminate successfully by returning the interpolation weights (i.e., the barycentric coordinates) obtained from the point-in-simplex test.

It is possible that point $p$ could lie outside the convex hull of the tetrahedralisation. The walk-through algorithm recognises this eventuality when the line segment $L$ intersects the face of a tetrahedron which is a member of the convex hull and therefore has no neighbour listed in the array returned by TetGen. Rather than allowing the walk-through algorithm to spend time traversing the tetrahedralisation in order to discover that point $p$ lies outside the convex hull, it is possible to test for this case immediately after forming the line segment $L$. In addition to generating the Delaunay tetrahedralisation of a model grid, TetGen is also able to return a list of those tetrahedron faces which comprise the convex hull. After forming the line segment $L$, each of these faces could then be tested for intersection. However, it does not matter which method is used because, if point $p$ lies outside the convex hull, the simplex interpolation method dictates that SFIT can no longer proceed with a fitting run and must stop.

In summary, pseudo-code for the algorithms outlined in this section is given in algorithms 2, 3, and 4.
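The ray-triangle test used by the walk-through can be sketched in Python as follows. This is a minimal rendering of the Möller & Trumbore approach, solving the $3 \times 3$ system for $(t, u, v)$ by Cramer's rule with cross products; it is not the SFIT code, and the function names and the `eps` tolerance are assumptions for illustration.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-12):
    """Return (t, u, v) if the ray O + tD crosses triangle (v0, v1, v2), else None."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    pvec = cross(direction, e2)
    det = dot(e1, pvec)
    if abs(det) < eps:
        return None  # ray is parallel to the plane of the triangle
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, e1)
    v = dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, qvec) * inv_det
    return t, u, v


# A ray fired along +z from below the triangle in the z = 0 plane.
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(ray_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *tri))  # (1.0, 0.25, 0.25)
print(ray_triangle((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), *tri))    # None
```

For a line segment rather than an infinite ray, a caller would additionally check that $t$ lies between zero and the length of the segment.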

Algorithm 2 Locating a Point in a Tetrahedralisation
  rescale point, p, onto axes of rescaled model grid
  if no starting tetrahedron exists then
    find close starting tetrahedron by random selection
  end if
  walk through tetrahedralisation
  if enclosing tetrahedron found then
    return barycentric coordinates of point p within the tetrahedron
  else
    point lies outside the convex hull of the tetrahedralisation
    exit SFIT
  end if

Algorithm 3 Finding Walk-Through Starting Point
  select at random m = 2n^(1/3) tetrahedra from the tetrahedralisation, where n is the total number of tetrahedra in the tetrahedralisation
  compute the Euclidean distance from each selected tetrahedron's centroid to the interpolation point
  return the index of the closest tetrahedron

Algorithm 4 Walk-Through of Tetrahedralisation
  construct the line segment, L, from given starting tetrahedron's centroid to the interpolation point, p
  current tetrahedron = given starting tetrahedron
  loop
    if current tetrahedron contains the interpolation point then
      return the barycentric coordinates of its location
    else
      test each triangular face of the current tetrahedron for intersection with L
      current tetrahedron = neighbouring tetrahedron on other side of intersected face
      if current tetrahedron is null then
        interpolation point lies outside convex hull
        exit SFIT
      end if
    end if
  end loop
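The containment test that terminates the walk-through can be illustrated with a small Python sketch of barycentric coordinates in a tetrahedron. The formulation below (Cramer's rule on the edge vectors) is a standard one and is an assumption here, since the simplex interpolation of section 3.3.1 is not reproduced in this section; the function names and the scalar `values` are hypothetical, whereas SFIT would weight whole model flux arrays.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def barycentric(p, tet):
    """Barycentric coordinates (b0, b1, b2, b3) of p with respect to tet."""
    v0, v1, v2, v3 = tet
    e1, e2, e3 = sub(v1, v0), sub(v2, v0), sub(v3, v0)
    d = det3(e1, e2, e3)
    if d == 0.0:
        raise ValueError("degenerate (flat) tetrahedron")
    r = sub(p, v0)
    b1 = det3(r, e2, e3) / d
    b2 = det3(e1, r, e3) / d
    b3 = det3(e1, e2, r) / d
    return (1.0 - b1 - b2 - b3, b1, b2, b3)


def contains(weights, tol=1e-12):
    """A point is inside (or on the boundary) if no weight is negative."""
    return all(w >= -tol for w in weights)


def interpolate(weights, values):
    """Weighted sum of the values stored at the four vertices."""
    return sum(w * f for w, f in zip(weights, values))


tet = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
w = barycentric((0.25, 0.25, 0.25), tet)
print(w, contains(w))                        # (0.25, 0.25, 0.25, 0.25) True
print(interpolate(w, [1.0, 2.0, 3.0, 4.0]))  # 2.5
print(contains(barycentric((1.0, 1.0, 1.0), tet)))  # False
```

The same four weights serve two purposes: their signs decide whether the walk has reached the enclosing tetrahedron, and their values are the interpolation weights returned to the caller.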


3.4 Testing the Modifications

The new simplex interpolation and indexing scheme was tested against the previous SFIT grid storage and interpolation in tables method using two case studies. The first conducts an analysis of a spectrum from the extreme helium star BD+10 2179 (Klemola, 1961) to allow a comparison of each of the different optimisation routines offered by SFIT over the two interpolation schemes. The second uses a coarse grid of theoretical models to parameterise a large number of other models to give an indication of the accuracy of the two interpolation schemes whilst keeping the optimisation method constant.

Case Study 1: BD+10 2179

The observed high-resolution echelle spectrum of BD+10 2179 used in this study covers the wavelength range 3760–5230 Å at a dispersion of 0.1 Å pixel$^{-1}$. The spectrum has already been wavelength calibrated and normalised. Both versions of SFIT fit a window, 4054–4545 Å, of this spectrum to a grid of 48 theoretical models covering the parameter space as described in Table 3.1.

Parameter                 Values
$T_{\mathrm{eff}}$ (K)    14,000, 16,000, 18,000, 20,000
$\log g$                  2.00, 2.50, 3.00, 3.50
$n_{\mathrm{He}}$         0.9960, 0.9890, 0.9690

Table 3.1: Details of the model grid used in the comparison.

The grid also has a latent fourth parameter in carbon abundance. For each analysis, the same initial guesses for each parameter were used for the Amoeba and Levenberg-Marquardt optimisation methods. These have been chosen to be close to the expected final values of the parameters. They, and the step sizes given to the Amoeba routine, are listed in Table 3.2.

Parameter                 Initial Value    Amoeba Step Size
$T_{\mathrm{eff}}$ (kK)   17.0             2.0
$\log g$ (dex)            2.5              0.5
$n_{\mathrm{He}}$         0.989            0.01
$v \sin i$                27.5             10.0
$v_{\mathrm{rad}}$        137.4            10.0

Table 3.2: Initial parameters used for the Amoeba and Levenberg-Marquardt optimisation routines. The step sizes used for Amoeba are also given.

An analysis begins by fixing the helium abundance, and solving for $T_{\mathrm{eff}}$, $\log g$, $v \sin i$, and $v_{\mathrm{rad}}$. Then, the values for these parameters are fixed, and a solution is found for $n_{\mathrm{He}}$ which, with the latent $n_{\mathrm{C}}$ parameter, is effectively a first approximation for $n_{\mathrm{C}}$. Finally, the value of $n_{\mathrm{He}}$ is fixed again, and the solutions for $T_{\mathrm{eff}}$ and $\log g$ are checked.

The results obtained by each optimisation method available in SFIT (Nelder-Mead simplex (Amoeba), Levenberg-Marquardt (LM), and nearest neighbour (NN) fitting) are presented in Tables 3.3 and 3.4 for both the original SFIT and the modified SFIT. Listed in parentheses for each parameter are the standard errors generated by SFIT.

                          Unmodified SFIT
                          Amoeba             LM                 NN
$T_{\mathrm{eff}}$ (kK)   18.000 (±0.014)    18.087 (±0.016)    18.00 (±0.011)
$\log g$                  2.743 (±0.004)     2.747 (±0.004)     2.50 (±0.003)
$n_{\mathrm{He}}$         0.997 (±0.001)     0.994 (±0.001)     0.996 (±0.001)
$v \sin i$                33.44 (±0.13)      36.9 (±0.15)       27.50 (±0.11)
$v_{\mathrm{rad}}$        136.23 (±0.0)      137.4 (±0.0)       137.4 (±0.0)
$\chi^2$                  9.00               9.90               9.96
Fit Time (secs)           54.3               11.42              27.99

Table 3.3: Results of BD+10 2179 analysis with the unmodified version of SFIT.

                          Modified SFIT
                          Amoeba             LM                 NN
$T_{\mathrm{eff}}$ (kK)   18.150 (±0.005)    17.870 (±0.015)    18.00 (±0.012)
$\log g$                  2.836 (±0.004)     2.687 (±0.005)     2.50 (±0.003)
$n_{\mathrm{He}}$         0.993 (±0.001)     0.992 (±0.001)     0.996 (±0.001)
$v \sin i$                33.46 (±0.13)      36.90 (±0.14)      27.50 (±0.12)
$v_{\mathrm{rad}}$        136.22 (±0.0)      137.4 (±0.0)       137.4 (±0.0)
$\chi^2$                  8.88               9.770              10.20
Fit Time (secs)           20.649             14.01              4.71

Table 3.4: Results of BD+10 2179 analysis with the modified version of SFIT.

This is a satisfactory result which shows that the simplex interpolation and grid indexing scheme performs slightly better (in terms of the final $\chi^2$ value) than the original linear interpolation in tables method. There is also a small gain in terms of execution speed of the Amoeba method with the new simplex-based scheme.

It should be noted that the 6-fold increase in speed for nearest neighbour searching reported in Table 3.4 is due to a re-write of some SFIT internals to take advantage of the data structures used in the simplex interpolation scheme. The data structures allow a fast iteration over all the models in a grid, reading each in from disk as needed. This means that the $\chi^2$ computation is being performed directly with the model itself, in contrast with the linear interpolation in tables scheme which actually tries to interpolate to the grid point instead of accessing the model directly. This is another design flaw in SFIT that the simplex-based scheme corrects. The difference in methodology also accounts for the slightly different $\chi^2$ values for nearest neighbour fitting listed in Tables 3.3 and 3.4.

Case Study 2: Model-Based Analysis

The grid of theoretical models used in this case study is given in Table 3.5. It coarsely covers almost the entire parameter space of models available in the Armagh Observatory archives.

Parameter                 Values
$T_{\mathrm{eff}}$ (kK)   15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 50.0
$\log g$                  3.00, 4.00, 5.00, 6.00
$n_{\mathrm{He}}$         0.001, 0.1, 0.5, 0.9, 0.999

Table 3.5: The model grid used to obtain physical parameters of the set of test models.

The rationale of the experiment is to use this grid to parameterise a large set of models which fall within its boundaries, but aren't actually used in the grid. Keeping the optimisation method constant, the results of the parameterisations will give an indication of the relative accuracy of the two interpolation schemes.

1238 models were selected to be parameterised by each version of SFIT. Each model was convolved with a Gaussian of 1 Å FWHM to degrade its resolution slightly, and then resampled onto a wavelength grid of 4050–4950 Å. The optimisation method used was Nelder-Mead simplex, with initial parameters and step sizes as follows: $T_{\mathrm{eff}} = 30$ kK, $\Delta T_{\mathrm{eff}} = 5.0$ kK; $\log g = 4.5$, $\Delta \log g = 1.0$; $n_{\mathrm{He}} = 0.5$, $\Delta n_{\mathrm{He}} = 0.1$. Results are presented in Figures 3.5 to 3.7, and in Table 3.6.

Before discussing the results, the presence of some anomalies in the linear interpolation in tables parameterisations must be noted and dealt with. Figure 3.5 plots the parameterisation results for all of the 1238 models. At a $T_{\mathrm{eff}}$ of 50,000 K, the optimiser returns implausible values of $\log g$ for some models. Something also seems to be going wrong with the $T_{\mathrm{eff}}$ parameterisations at the 50,000 K grid boundary, as some models with a $\log g$ of 3.5 are assigned temperatures much larger than 50,000 K.

The implementation of the linear interpolation in tables method used in SFIT does not take any steps to limit the optimisation routines to the boundaries of the grid, and actually allows some extrapolation to occur at the edges of the grid. However, it is unclear whether the anomalous $T_{\mathrm{eff}}$ and $\log g$ values are due to the optimisation routine (in this case, Amoeba) extrapolating too far outside the grid space (i.e., there is a problem with the implementation of the interpolation routine), or if there is a problem with the models.

If Figure 3.5 is replotted with axes closer to the grid boundaries, as in Figure 3.6, the best performance of the interpolation method appears to occur below $T_{\mathrm{eff}} = 40{,}000$ K. Between 40,000 K and 50,000 K, the parameterisations are more randomly distributed, indicating a greater level of "confusion" from the interpolation routine. A cursory inspection of the models reveals no significant problems, so it could be hypothesised that the issue lies with the implementation. However, an inspection of Figure 3.7 shows a similar "confusion" from the simplex-based method.

Figure 3.5: Parameterisation results from the linear interpolation in tables method. Clearly visible are anomalous results arising from a suspected defect in the method's implementation. (Panels: log g against Teff (K), and log(nHe/nH) against Teff (K).)

Figure 3.6: Parameterisation results from the linear interpolation in tables method. Axes have been restricted to give a view of the grid boundaries described in Table 3.5. (Panels: log g against Teff (K), and log(nHe/nH) against Teff (K).)

Figure 3.7: Parameterisation results from the simplex-based interpolation scheme. In contrast with Figures 3.5 and 3.6, the simplex-based scheme clearly restricts the optimisers to the grid boundaries. (Panels: log g against Teff (K), and log(nHe/nH) against Teff (K).)


At $T_{\mathrm{eff}} \geq 40{,}000$ K, the helium-rich models are most likely confusing the optimiser because the He II lines manifest at wavelengths close to those of the neutral hydrogen lines. This problem requires further investigation, but, to work around the issue, a comparison of parameterisation results for those models with $T_{\mathrm{eff}} \leq 40{,}000$ K and $\log g \leq 6.0$ is also given in Table 3.6. These RMS metrics give a better indication of the relative performance of the two methods.

                  rms, all models                                        rms, $T_{\mathrm{eff}} \leq 40$ kK, $\log g \leq 6.0$
                  $T_{\mathrm{eff}}$ (K)   $\log g$   $n_{\mathrm{He}}$   $T_{\mathrm{eff}}$ (K)   $\log g$   $n_{\mathrm{He}}$
Simplex/Models    3592.74                  0.329      0.102               2666.79                  0.355      0.068
Linear/Models     4695.11                  3.362      0.149               1905.47                  0.349      0.056
Linear/Simplex    3455.88                  3.376      0.150               1928.02                  0.306      0.065

Table 3.6: RMS comparison of parameterisation results from each interpolation method with the original parameters of each model. Also given is the RMS difference between the methods, and a comparison between the results in the region of parameter space for which both schemes seem to give their best results (see Figures 3.6 and 3.7).

The linear interpolation in tables scheme yields slightly more accurate results than the simplex-based method for all three parameters. This is most likely due to the coarse grid spacing used in the experiment, with a finer-grained grid allowing the simplex interpolation method to achieve more accuracy. Using a finer-grained grid with SFIT is now possible because the simplex-based grid management scheme removes all the grid size, shape, and completeness restrictions imposed by the old linear interpolation in tables method.

The speed difference between the two methods should also be emphasised. To parameterise all the models, SFIT took approximately 10 minutes with the simplex-based scheme, but over 90 minutes with the old methodology. This significant gain in speed, along with the other advantages offered by the new simplex-based scheme, outweighs the possible slight loss of accuracy indicated in Table 3.6.
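For reference, an RMS metric of the kind reported in Table 3.6 can be computed as in the Python sketch below. The function name `rms_difference` is hypothetical, and the numbers are invented purely to exercise the function; they are not values from the experiment.

```python
import math


def rms_difference(fitted, reference):
    """Root-mean-square difference between two equal-length lists of values."""
    if len(fitted) != len(reference) or not fitted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((f - r) ** 2 for f, r in zip(fitted, reference))
    return math.sqrt(total / len(fitted))


# Illustrative values only (temperatures in kK); not data from Table 3.6.
fitted_teff = [30.5, 24.0, 41.0]
model_teff = [30.0, 25.0, 40.0]
print(round(rms_difference(fitted_teff, model_teff), 4))  # 0.866
```

The same function applies to any pair of columns: fitted against original model parameters for one scheme, or one scheme against the other.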


3.5 Summary

The $\chi^2$ fitting code, SFIT, has been modified and extended to handle arbitrarily large grids of theoretical model spectra. This paves the way to making SFIT more amenable to parameterising very large quantities of stellar spectra in an astronomical data mining application.

Two major problems were identified with the way SFIT manages grids of models. Grids were restricted in size due to hard-coded limits written into the program, and the interpolation scheme used to approximate the space between grid points could not handle irregular or incomplete grids. These problems were solved by developing a new grid management and interpolation scheme based on simplex interpolation and Delaunay triangulation.

This new scheme was tested against the old version of SFIT by parameterising a well-studied spectrum, and a large quantity of theoretical models. The new version of SFIT was found to perform much faster than the old version, with a more accurate fit being reported for the individual spectrum, and slightly (but not significantly) worse results in the parameterisation of the models. This slight loss of accuracy is outweighed by the increase in overall speed, and the removal of several severe restrictions on the size, shape, and completeness of SFIT model grids.

Chapter 4

Filtering - Principal Components Analysis
Modern astronomical data sets often contain observations of many different types of objects, and are rarely typologically homogeneous (Chapter 1). Searching for particular types of objects in such large databases requires computer assistance. Query parameters can be used to narrow down the data set to objects of a particular colour range, redshift, morphology, or some other parameter combination of significance. However, this reduced data set will invariably still contain objects that the astronomer would like to discard. Manual inspection of the data is not time-efficient unless quantities are small. It is more expedient to have an automated, or semi-automated, tool that can be used to assist in filtering through the data.

Filtering is essentially a coarse-grained classification problem. An unknown spectrum is compared with a collection of known, or template, spectra to determine if it belongs to that particular class of object. The well-known techniques of cross correlation (Tonry & Davis, 1979) and $\chi^2$ minimisation (Chapter 3) are immediately applicable. However, in a data mining context, speed is of importance, and these techniques are slow.

One way to construct a fast filtering method is to extract the defining features from a set of known spectra, and use them to summarise and represent that set. Instead of comparing an unknown spectrum with each template spectrum, it can then be weighed against the summarised form in a more computationally expedient manner.

Principal Components Analysis (PCA; Murtagh & Heck, 1987) can be used to construct such a summary. It is a multivariate statistical technique which seeks to summarise the variance of an $N$-dimensional data set in a handful of independent parameters. These parameters capture the main sources of linear variation in the data set, and can be used to construct a fast test to determine if an unknown spectrum is similar to a collection of known spectra. Another advantage to using PCA as a filter is that the independent parameters produced are unique to each data set. This means that a PCA-based filter is generalisable, and can be used to construct a filter for any type of astronomical object.

As a testament to its versatility, PCA has been applied on several occasions to the classification of astronomical spectra. Deeming (1964) applied it to the classification of G and K-type giants. Connolly et al. (1995) used PCA to classify galaxy spectra, and Francis et al. (1992) applied it to the classification of quasar spectra. Whereas these studies used PCA in an unsupervised manner, it is used here in a supervised fashion.

A Filter for Hot Subdwarfs

Chapter 1 outlines a general data mining toolkit for astronomical spectra, with a specific application to hot subdwarfs. As such, the apparatus of the PCA-based filter outlined here will be applied to the data set obtained from Drilling et al. (2006) to construct a filter for hot subdwarfs. This filter will then be applied to a set of real-world low-dispersion spectra obtained from the Sloan Digital Sky Survey in an attempt to data mine a collection of hot subdwarf candidates for further study.

Figure 4.1: Principal component analysis. $\mathbf{u}_1$ is the first principal component and the axis onto which the projected positions of the data have their maximum sum. $\mathbf{u}_2$ is the second principal component, and $\mathbf{u}_1 \cdot \mathbf{u}_2 = 0$. (Axes: X and Y, with the directions $\mathbf{u}_1$ and $\mathbf{u}_2$ overlaid.)

4.1 Constructing A PCA-Based Filter

Principal components analysis transforms an $N$-dimensional data set onto a new set of optimally defined axes. These axes represent the directions of maximum variance between variables in the data set, and are called the Principal Components (PCs). The technique basically amounts to a rotation from the original axes to the new ones, and is therefore a linear transformation of the data.

Figure 4.1 illustrates the concept with a two dimensional data set. The direction of maximum variance in the data is represented by $\mathbf{u}_1$. This new axis (the first PC of the data set) better describes the data than either $x_1$ or $x_2$. The remaining variance in the data, once they have been projected onto the first PC, is described by $\mathbf{u}_2$, the second PC. Thus, $\mathbf{u}_1$ and $\mathbf{u}_2$ are a more optimally aligned directional basis set for this particular data set.

The PCs are derived in decreasing order of importance, with the first PC describing most of the variance in the data, and subsequent PCs representing less and less information about the variance. In the case of a large $N$-dimensional data set, a successful derivation of the principal components means that the first few components can be used to give a compressed representation of the data without a significant loss of information. Lesser principal components will typically contain information on features in the data which are not very well correlated, such as noise or anomalies. By discarding these components, a compressed representation will preferentially remove undesired features, and features which do not vary over a sufficient fraction of the data set.
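The idea of retaining only the first few components can be made concrete with a small Python sketch. It assumes that each PC has an associated variance (the eigenvalue introduced in the next section); the function names and the sample eigenvalues are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cumulative_variance_percent(eigenvalues):
    """Cumulative percentage of total variance, largest eigenvalue first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    running = 0.0
    result = []
    for lam in ordered:
        running += lam
        result.append(100.0 * running / total)
    return result


def components_needed(eigenvalues, threshold_percent):
    """Smallest number of leading PCs whose variance reaches the threshold."""
    for k, pct in enumerate(cumulative_variance_percent(eigenvalues), start=1):
        if pct >= threshold_percent:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues, not those of any real data set.
eigs = [8.0, 1.0, 0.5, 0.5]
print(cumulative_variance_percent(eigs))  # [80.0, 90.0, 95.0, 100.0]
print(components_needed(eigs, 95.0))      # 3
```

A threshold on cumulative variance is only one possible selection rule; it says nothing about whether a low-variance component carries information that matters for a particular application.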

4.1.1 Mathematics of PCA

This presentation of PCA theory follows that of Bailer-Jones (1996) and Murtagh & Heck (1987). Let the vector $\mathbf{x} = (x_1, x_2, x_3, \ldots, x_N)$ be a stellar spectrum with $N$ flux bins. A spectrum can then be considered a point in $N$-dimensional space, with each axis representing each flux bin. $M$ such spectra can be described as the $(M \times N)$ matrix $\mathbf{X}^T = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M)$. The first principal component is the normalised vector, $\mathbf{u}$, which best fits the points in $\mathbf{X}^T$. The criterion of goodness of fit of this axis to the point set is defined as the squared deviation of the points from the axis. Minimising the sum of distances between the points and axis is equivalent to maximising the sum of squared projections onto the axis, i.e., maximising the variance of the points when projected onto this axis. The sum of squared projections of the points in $\mathbf{X}^T$ onto the new axis, $\mathbf{u}$, is

$$(\mathbf{X}\mathbf{u})^T(\mathbf{X}\mathbf{u}). \tag{4.1}$$

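In maximising this quadratic form, the constraint must be made that $\mathbf{u}^T\mathbf{u} = 1$, otherwise the projection can be maximised arbitrarily. Setting $\mathbf{S} = \mathbf{X}^T\mathbf{X}$, and introducing the Lagrange multiplier, $\lambda$, the maximum is obtained by differentiating

$$\mathbf{u}^T\mathbf{S}\mathbf{u} - \lambda(\mathbf{u}^T\mathbf{u} - 1), \tag{4.2}$$

which gives,

$$2\mathbf{S}\mathbf{u} - 2\lambda\mathbf{u}. \tag{4.3}$$

Setting this equal to zero, the optimal value of $\mathbf{u}$ is the solution of

$$\mathbf{S}\mathbf{u} = \lambda\mathbf{u}. \tag{4.4}$$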

This is a standard eigenvector problem. The eigenvector of $\mathbf{S}$, $\mathbf{u}$, is the line of best fit, and the corresponding eigenvalue, $\lambda$, indicates the amount of variance described by this line.

Calculating the remaining axes proceeds in a similar manner. The second axis is found by again maximising $\mathbf{u}_2^T\mathbf{S}\mathbf{u}_2$, but with the added constraint that the second axis be orthogonal to the first, i.e., $\mathbf{u}_2^T\mathbf{u}_1 = 0$. Introducing the Lagrange multipliers, $\lambda_2$ and $\mu$, the maximum is obtained by differentiating

$$\mathbf{u}_2^T\mathbf{S}\mathbf{u}_2 - \lambda_2(\mathbf{u}_2^T\mathbf{u}_2 - 1) - \mu(\mathbf{u}_2^T\mathbf{u}_1), \tag{4.5}$$

giving,

$$2\mathbf{S}\mathbf{u}_2 - 2\lambda_2\mathbf{u}_2 - \mu\mathbf{u}_1. \tag{4.6}$$

Setting this equal to zero, and multiplying through by $\mathbf{u}_1^T$ yields

$$\mu\mathbf{u}_1^T\mathbf{u}_1 = 0, \tag{4.7}$$

which implies that $\mu = 0$. Therefore, equation 4.6 is of the same form as equation 4.4, meaning $\lambda_2$ and $\mathbf{u}_2$ are the second largest eigenvalue and corresponding eigenvector of $\mathbf{S}$. Thus, the principal components of a set of $N$-dimensional points, $\mathbf{X}$, are the eigenvectors of the matrix of sums of squares and cross products, $\mathbf{S} = \mathbf{X}^T\mathbf{X}$. There are $N$ eigenvectors for an $N$-dimensional problem.

The principal components form a directional basis set, meaning that PCA is best applied to data that are centred. Geometrically speaking, centring is equivalent to a shift in the origin of the co-ordinate system, and is performed by calculating and subtracting the mean from the row vectors of $\mathbf{X}$. Let $\bar{x}_i$ be the average of element $x_i$ over all $M$ data points. Therefore, the $i$th element of the $p$th centred point, written here as $\tilde{x}_{i,p}$, is given by

$$\tilde{x}_{i,p} = x_{i,p} - \bar{x}_i. \tag{4.8}$$

$\mathbf{S}$ now becomes the covariance matrix of the data points. The result of equation 4.4 remains unchanged. Subtracting the mean also has the advantage that the dynamic range of $\mathbf{S}$ is reduced, increasing the numerical stability of the solution to the eigenvector problem.
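The chain from centring to the first principal component can be sketched in plain Python. This is an illustration under stated assumptions, not the thesis implementation: the function names are hypothetical, the three two-bin "spectra" are toy data, and power iteration is used as a simple stand-in for a full eigenvector solver.

```python
import math


def centre(spectra):
    """Subtract the mean spectrum from every spectrum (Equation 4.8)."""
    m, n = len(spectra), len(spectra[0])
    mean = [sum(s[i] for s in spectra) / m for i in range(n)]
    diffs = [[s[i] - mean[i] for i in range(n)] for s in spectra]
    return mean, diffs


def covariance(diffs):
    """Sums of squares and cross products of the centred data, S = X^T X."""
    n = len(diffs[0])
    return [[sum(x[i] * x[j] for x in diffs) for j in range(n)] for i in range(n)]


def first_pc(S, iterations=200):
    """Largest eigenvalue and its eigenvector by power iteration.

    Assumes the uniform start vector is not orthogonal to the leading
    eigenvector; a different start would be needed in that case.
    """
    n = len(S)
    u = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(S[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("S @ u vanished; try a different start vector")
        u = [c / norm for c in w]
    Su = [sum(S[i][j] * u[j] for j in range(n)) for i in range(n)]
    lam = sum(u[i] * Su[i] for i in range(n))  # Rayleigh quotient u^T S u
    return lam, u


# Toy data: three "spectra" with two flux bins, lying along the direction (1, 2).
spectra = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mean, diffs = centre(spectra)
S = covariance(diffs)
lam, u = first_pc(S)
print(mean)  # [2.0, 4.0]
print(S)     # [[2.0, 4.0], [4.0, 8.0]]
print(round(lam, 6), [round(c, 4) for c in u])  # 10.0 [0.4472, 0.8944]
```

The recovered vector is the unit vector along (1, 2), as expected for data lying on that line. For real spectra with thousands of flux bins, a library eigenvalue or singular value decomposition routine would be the practical choice.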

4.1.2 Building A Hot Subdwarf Filter

By retaining only the most significant principal components of an $N$-dimensional data set, a quick test can determine if a new data point is in a similar region of $N$-dimensional space as the original data set. This is the principle upon which a filter can be built to help search for astronomical objects of a particular type from a large collection of unknown spectra. As described at the start of the chapter, such a filter will now be developed using the collection of 177 standard hot subdwarf spectra obtained from Drilling et al. (2006) (see also Chapter 2).

The first step is to construct the mean spectrum and subtract it from each spectrum in the set, thereby forming the matrix of difference spectra using equation 4.8. The mean spectrum is plotted in Figure 4.2.

Figure 4.2: Mean spectrum of the Drilling et al. (2006) sample. (Axes: normalised flux against wavelength in Angstroms.)

The elements of the covariance matrix, $\mathbf{S}$, are then calculated from

$$s_{i,j} = \sum_{p=1}^{M} \tilde{x}_{i,p}\,\tilde{x}_{j,p}, \tag{4.9}$$

where $\tilde{x}_{i,p}$ denotes the mean-subtracted flux of equation 4.8.

On the Automatic Analysis of Stellar Sp ectra

88

The use of the covariance matrix in the formulation of PCA assumes that the data do not need to be standardised, i.e., that all the variables are on the same scale. This assumption is valid here because the Drilling et al. (2006) spectra have all been continuum normalised, and the filter will be applied to normalised spectra. If the variables were on different scales, e.g., if the Drilling et al. (2006) spectra were unnormalised and half had flux scales several orders of magnitude greater than the others, then the large differences between the variances of the variables would cause the weaker variables to be ignored. Likewise, PCA can be sensitive to outliers in the data set, which can contribute greatly to the variance. Scale dependences must be removed if PCA is to generate useful components. Common approaches to normalisation include standardising the variables to have unit variance, compressing them onto the scale 0-1, or taking logarithms. The results of the PCA will depend on the normalisation method used.

In this application of PCA to stellar spectra, the covariance matrix, S, will always be real and symmetric. As such, equation 4.4 does not need to be solved as is, because any real symmetric matrix is diagonalised by the matrix of its eigenvectors (see Golub & Van Loan 1989). Any real and symmetric matrix can be reliably diagonalised using a technique such as Jacobi's method. Here, a QR-based singular value decomposition routine (see Press et al. 1986) has been used to calculate the eigenvectors.

The results of the PCA are presented in Figures 4.3 and 4.4, wherein the first ten principal components of the Drilling et al. (2006) spectra have been plotted. The PCs are rotations in the data space of the original axes; they therefore resemble spectra and have the same number of elements as the original spectra. It can be clearly seen that the first PC differentiates between hydrogen and helium lines. This reification makes sense, as it is these features which vary most across the Drilling et al. (2006) data set. The second PC also clearly differentiates between the HeI and HeII line series. For the

Figure 4.3: First five PCs of the Drilling et al. (2006) sample. [Plot: five panels, PC 0 to PC 4, amplitude from -0.15 to 0.15 against wavelength from 4100 to 4900 Å.]

Figure 4.4: Second five PCs of the Drilling et al. (2006) sample. [Plot: five panels, PC 5 to PC 9, amplitude from -0.15 to 0.15 against wavelength from 4100 to 4900 Å.]

Figure 4.5: Cumulative variance of the first ten PCs of the Drilling et al. (2006) sample. [Plot: cumulative percentage of total variance, from 94 to 100, against principal component, from 0 to 9.]

remaining PCs, it becomes harder to attach any meaningful interpretation. The question remains as to how many principal components should be retained in order to form an adequate representation of the Drilling et al. (2006) standard stars. Figure 4.5 shows the cumulative percentage variance accounted for by the first ten principal components. The first principal component alone accounts for 94.66% of the total variance, which is not surprising given the reification outlined previously. All ten PCs account for 99.83% of the variance; however, 99.30% is described by the first four PCs, making them adequate to give a compressed representation of the Drilling et al. (2006) hot standards. It should be noted that this selection criterion of maximal variance may unwisely discard the less significant PCs. Lahav et al. (1996) point out that, in the role of


classification of galaxy spectra, the fractional variance on its own was not sufficient to determine how many PCs were needed for classification. This may be due to non-linearity in the data (a spectrum is not a linear combination of line features, and the lines do not separate into different principal components), the effect of noise on the deduction of the PCs, or the fact that classification requires more information than that given simply by the maximal variance. In the application of PCA here to the filtering of stellar spectra, only an adequate representation of a data set is sought through PCA, and not an adequate discrimination between classes within a data set. As such, the criterion of maximal variance remains valid.

Now, let the matrix $E^T = (u_1, u_2, u_3, u_4)$ contain the first four principal components of the Drilling et al. (2006) hot standards. To determine the similarity of some unknown spectrum $y = (y_1, y_2, y_3, \ldots, y_N)$ to the Drilling et al. (2006) standards, first, the vector $p$ is constructed, which holds the magnitudes of the projection of $y$ onto each of the four principal components in $E$,

$$p = y' \cdot E, \qquad (4.10)$$

where $y'$ is the mean-subtracted difference spectrum of $y$ (i.e., $y' = y - \bar{x}$, where $\bar{x}$ is the mean spectrum of the Drilling et al. (2006) standards). The reduced reconstruction of $y$, $y_r$, is then given by

$$y_r = \bar{x} + p \cdot E^T. \qquad (4.11)$$
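The steps leading to equations 4.10 and 4.11 can be sketched in a few lines of Python. This is a minimal illustration and not the routine used in this work: it assumes NumPy, it takes the SVD of the mean-subtracted data matrix (which yields the same eigenvectors as diagonalising the covariance matrix), and the function names and toy two-line "spectra" are invented for the example.

```python
import numpy as np

def fit_pca(spectra, n_components=4):
    """Return (mean spectrum, E, variance fractions) for a training set.

    spectra: array of shape (n_spectra, n_bins), assumed to be continuum
    normalised and on a common wavelength grid. E has shape
    (n_bins, n_components); its columns are the leading PCs.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    # Rows of vt are eigenvectors of the covariance matrix; s**2 is
    # proportional to the corresponding eigenvalues (variances).
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    return mean, vt[:n_components].T, variance_fraction

def reconstruct(y, mean, E):
    """Project y onto the PCs (eq. 4.10) and rebuild it (eq. 4.11)."""
    p = (np.asarray(y, dtype=float) - mean) @ E   # p = y' . E
    return mean + p @ E.T                          # y_r = x_bar + p . E^T

# Toy training set (hypothetical): two Gaussian absorption lines of varying depth.
rng = np.random.default_rng(0)
wave = np.linspace(4050.0, 4950.0, 200)
line_a = np.exp(-0.5 * ((wave - 4340.0) / 15.0) ** 2)
line_b = np.exp(-0.5 * ((wave - 4471.0) / 10.0) ** 2)
depths = rng.uniform(0.1, 0.6, size=(30, 2))
training = 1.0 - depths[:, :1] * line_a - depths[:, 1:] * line_b

mean, E, frac = fit_pca(training, n_components=2)
y_r = reconstruct(training[0], mean, E)
cumulative = np.cumsum(frac)   # analogue of the curve in Figure 4.5
```

Because the toy spectra vary in only two ways, two components reproduce them to within rounding error; real spectra need as many components as the cumulative variance curve suggests.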

Figure 4.6 shows the results of projecting two hot subdwarf spectra onto the first four principal components of the Drilling et al. (2006) hot standards. At the top, spectrum A is a relatively good S/N observation of a cooler subdwarf. Spectrum B shows a hotter subdwarf with a lower S/N observation. In both cases, the original spectrum is plotted in red, with the reduced reconstruction in blue. Spectrum A compares well with its reduced reconstruction, the latter showing very little difference from the original. Spectrum B is noisier, but its reduced reconstruction matches the spectrum well apart from the noise (here, the noise-filtering capabilities of PCA can be observed). Certainly, spectrum A, if encountered in a large set of unknown spectra, would be desirable to the astronomer, whereas spectrum B could be considered too noisy for any further analysis. Thus, when filtering through a large set of unknown spectra, those spectra which compare well with their reduced reconstructions will be of most interest to the hot subdwarf astronomer.

Figure 4.6: Illustration of projecting hot subdwarf spectra onto the first four PCs of the Drilling et al. (2006) standards. The original spectrum is plotted in red, and its reduced reconstruction in blue. [Plot: two panels of normalised flux against wavelength from 4100 to 4900 Å; panel A is labelled with reconstruction error 1.89970 and panel B with 6.22063.]

A suitable quantitative measure for this comparison is the reconstruction error

$$R = 100 \times \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - y_{r,i})^2}, \qquad (4.12)$$

where $y_i$ is the ith flux bin of the original spectrum, $y$, and $y_{r,i}$ is the ith flux bin of the reduced reconstruction of $y$, $y_r$. This error metric gives the RMS difference per flux bin between the original spectrum and its reconstruction. The factor of 100 is simply a scaling factor to make the final error values easier to work with (it is anticipated that the majority of values for R will lie in the range 0 ≤ R ≤ 1). The reconstruction errors for each spectrum in Figure 4.6 are shown in the top left region of each plot.

How "well" a spectrum should compare in this manner with its reduced reconstruction is a subjective measure dependent on the type of object an astronomer is filtering for, and what further analysis they have in mind. In the hot subdwarf case, for classification purposes, a spectrum such as B in Figure 4.6 may mark the upper limit of the reconstruction errors that are to be accepted. However, if the derivation of physical parameters is the goal, then reconstruction errors close to that of spectrum A, and well below that of B, may be desired.

As mentioned in the introduction to this chapter, PCA is a data-driven tool, with the principal components derived for one data set being unique to those data. As such, if, say, a galaxy spectrum is reconstructed using the PCs of the Drilling et al. (2006) standards, its reconstruction error will be very high, as it will have few (if any) features in common with hot subdwarfs. The same is true for noisy or incomplete spectra, making them easy to filter out.
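As a hedged illustration of the reconstruction error described above, the following sketch computes 100 times the RMS difference between a spectrum and its reduced reconstruction, and applies a user-chosen acceptance threshold. The function names and the threshold argument are assumptions made for the example; only the standard library is used.

```python
import math

def reconstruction_error(y, y_r):
    """R = 100 x RMS difference between a spectrum and its reconstruction."""
    if len(y) != len(y_r) or len(y) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    mean_square = sum((a - b) ** 2 for a, b in zip(y, y_r)) / len(y)
    return 100.0 * math.sqrt(mean_square)

def passes_filter(y, y_r, threshold):
    """True if the spectrum is reconstructed well enough to be kept.

    The threshold is a subjective choice made by the user (hypothetical
    parameter; the text does not prescribe a single value).
    """
    return reconstruction_error(y, y_r) <= threshold
```

A uniform residual of 0.02 in normalised flux, for example, gives R = 2, which is why the factor of 100 makes the values convenient to read.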


4.2

Searching the SDSS for Hot Subdwarfs

The PCA hot subdwarf filter was applied to a sample of 4610 spectra obtained from the Sloan Digital Sky Survey, Data Release 3 database. The selection criteria used to obtain the sample from the SDSS are outlined in the following SQL query,

    SELECT s.plate, s.mjd, s.fiberid
    FROM BESTDR3..SpecPhotoAll as s
    WHERE s.specClass = dbo.fSpecClass('STAR')
      AND (s.primTarget & (dbo.fPrimTarget('TARGET_STAR_BHB')
           + dbo.fPrimTarget('TARGET_STAR_SUB_DWARF')) > 0)
      AND (s.objType = 2)

The criteria naively rely upon the classifications automatically assigned by the SDSS spectrophotometric pipeline. The SDSS supplies spectra in FITS format, with each FITS file including a calibrated spectrum, a normalised spectrum, and all measured parameters (redshift, line fits, line indices, per-pixel resolution, etc.) stored in the FITS header. For convenience, the normalised spectra were extracted from the FITS files, and subsequently velocity corrected using the redshift stored in each FITS header. The spectra were then rebinned onto the common wavelength grid of 4050-4950 Å at a dispersion of 1 Å pixel^-1 to match the Drilling et al. (2006) spectra.

The PCA filter was applied using equations 4.10 and 4.11, outlined in the previous section, to construct the set of reduced reconstructions. The reconstruction errors were then calculated as per equation 4.12. The distribution of the reconstruction errors is displayed in Figure 4.7. The histogram shows that most of the spectra in the SDSS sample are concentrated around R ≈ 4.0. The contents of the first three error bins (R ≤ 1.8) are shown in Figures 4.8 and 4.9. Clearly, these eight spectra are of a good S/N, strong

Figure 4.7: Histogram of reconstruction errors from the SDSS data sample. [Plot: number of spectra, from 0 to 300, against reconstruction error R, binned from <1.00 to >15.00.]

subdwarf candidates, and well-suited to further analysis. As the reconstruction error increases, the S/N of the spectra starts to decrease. Figure 4.10 shows four spectra sampled from the most populated error bin, at R ≈ 3.0. They are slightly noisier than those in Figures 4.8 and 4.9, yet the reconstructions are still a close match, meaning they could still be suitable for further analysis. By around R ≈ 4.5, the reconstruction quality is becoming noticeably poorer, as demonstrated in Figure 4.11. Here, the S/N is becoming progressively lower, and objects with spectra quite unlike those of subdwarfs, such as white dwarfs, begin to make an appearance in the succeeding error bins. One interesting feature of note is the final error bin, which contains all the SDSS spectra with reconstruction errors R > 15.0. It contains a large number of spectra in

Figure 4.8: Spectra in first three reconstruction error histogram bins (R ≤ 3.0). [Plot: four panels, A to D, of normalised flux against wavelength from 4100 to 4900 Å, with reconstruction errors 1.52992 (A), 1.66001 (B), 1.62708 (C), and 1.74365 (D). Objects shown: J234137.25+000123.2, J171531.67+271545.5, J155612.59+022152.9, J153701.88-011307.9.]

Figure 4.9: Spectra in first three reconstruction error histogram bins (R ≤ 3.0). [Plot: four panels, A to D, of normalised flux against wavelength from 4100 to 4900 Å, with reconstruction errors 1.73210 (A), 1.79120 (B), 1.71810 (C), and 1.76950 (D). Objects shown: J152357.12+354009.4, J151722.09+603546.3, J125244.60-002512.9, J112015.43+650003.2.]

comparison to the preceding error bins. Such high reconstruction errors are indicative of spectra with features poorly matched to typical subdwarf spectra. Figure 4.13 shows a sample of four spectra from this error bin. The first three spectra are dominated by noise, with spectrum B exhibiting an anomalous gap in the data at around 4110 Å. Spectrum D is incomplete, hence the large reconstruction error.

The PCA filter is effective at separating out the very low S/N exemplars and incomplete spectra, as shown in Figure 4.13. However, it does not magically separate out subdwarf candidates. Invariably, they will be mixed in with stars that are spectroscopically very similar to subdwarfs. An example of high S/N spectra that are not subdwarfs, and which are filtered out, is shown in Figure 4.12. The SDSS sample used here was predominantly composed of cooler BHB and main sequence stars, with some white dwarfs. Thus, any subdwarf candidates were difficult for the filter to extract from amidst the spectroscopically similar cooler stars. This problem was due to the search criteria used in the initial SQL query, but it can be rectified by altering the database query to select by photometric colour, which would exclude most of the cooler stars and subdwarf-main sequence binaries.

The reconstruction error calculation described in equation 4.12 provides a description of the mean difference between an original spectrum and its reconstruction. As such, it served to rank the SDSS spectra mostly according to noise content. This meant that objects such as white dwarfs started to be found ranked alongside lower S/N subdwarf candidates and BHB stars with reconstruction errors of around R ≈ 7.0. This is not necessarily a problem per se because, by about R ≈ 5.0, any subdwarfs to be found are going to be dominated by noise levels that may not be conducive to useful further analysis.

Practically speaking, the PCA filter allows a value of R to be established beyond which any spectra can be safely discarded on the grounds that they are not of sufficient

Figure 4.10: Sample of spectra from the eighth error bin (R ≈ 3.0). [Plot: four panels, A to D, of normalised flux against wavelength from 4100 to 4900 Å, with reconstruction errors 2.97285 (A), 3.00098 (B), 2.89146 (C), and 2.88735 (D). Objects shown: J023624.84-072238.1, J001832.61+155540.1, J224640.34-090631.8, J145418.66-022346.1.]

Figure 4.11: Sample of spectra from the fourteenth error bin (R ≈ 4.5). [Plot: four panels, A to D, of normalised flux against wavelength from 4100 to 4900 Å, with reconstruction errors 4.66907 (A), 4.50669 (B), 4.54534 (C), and 4.50072 (D). Objects shown: J001146.72+152147.5, J165401.98+294801.7, J074623.09+205546.7, J113044.42+612111.7.]

Figure 4.12: Sample of high S/N DA white dwarfs from the 22nd to 24th error bins (R ≈ 6.4 to 7.1). [Plot: four panels, A to D, of normalised flux against wavelength from 4100 to 4900 Å, with reconstruction errors 6.48477 (A), 6.94160 (B), 6.99275 (C), and 7.03080 (D). Objects shown: J085128.17+060551.2, J110651.79+625024.0, J092252.13+524446.4, J080051.56+223558.5.]

Figure 4.13: Sample of spectra from the fifty-third error bin (R > 15.0). [Plot: four panels, A to D, of normalised flux against wavelength from 4100 to 4900 Å, with reconstruction errors 15.15518 (A), 20.11230 (B), 38.54495 (C), and 66.91276 (D). Objects shown: J075647.73+232913.6, J140804.49+011320.1, J141509.80-021147.2, J145616.92+024549.6.]

S/N for whatever further analysis the astronomer has in mind. This will also safely discard objects whose spectra are not sufficiently similar to the objects of interest. For the spectra that remain, a visual inspection is still necessary to separate candidates of interest from objects for which the reconstruction error calculation is not sensitive enough to mark for removal. In the obtained SDSS sample, spectra with a reconstruction error of R < 5.0 were generally suitable for classification or parameterisation; however, as mentioned previously, any real subdwarf candidates in that sub-sample were mixed in with cooler BHB and main-sequence stars.

4.3

Summary

The concept of the PCA-based filtering tool presented here is certainly sound from the point of view of necessity. In the construction of a filter for hot subdwarfs, and its application to search for such stars in the SDSS, it was discovered that the SDSS-assigned spectral classifications are not a useful criterion to include in an initial search. The data set obtained was composed largely of blue horizontal branch stars. As they are spectroscopically very similar to hot subdwarfs, this made it difficult for the filter to provide a robust discrimination between the two object types. This point highlights the need to use appropriate and specific search criteria when extracting data from a very large survey database. In the case of hot subdwarfs and the SDSS, a photometric colour-based search would allow cooler BHB stars to be avoided.

Still, the PCA filter is not completely automated, and cannot be treated as a black box. A user must be aware of the correct manner of operation:

1. The set of training data from which a filter is to be constructed must be pre-processed into a homogeneous form.

2. Application data must be pre-processed to have the same properties as the training set (i.e., wavelength range, dispersion, etc.).

3. An acceptable reconstruction error threshold is a subjective decision that the user must make. It can only be determined through examination of the filtering results, and prior experience.

4. A visual inspection of data below the acceptable error threshold is still required to ensure the correct extraction of candidate objects from undesired but spectroscopically similar objects.

The diversity of real-world data makes decisive filtering a very hard problem, but the PCA filter presented here is able to reduce the search space by at least an order of magnitude, making the job of visual inspection a lot more tractable.

Chapter 5

Application I - SDSS Hot Subdwarfs
Having established a set of tools in Chapters 2 to 4 for data mining large sets of astronomical spectra, they are now applied in unison to extract and analyse hot subdwarf candidates from the Sloan Digital Sky Survey. Firstly, a set of search criteria based on SDSS photometric colours is devised to obtain a data set which excludes most of the horizontal branch stars encountered in the previous chapter. This data set is then filtered with the aid of the PCA filter, and pre-processed before being fed into the analysis pipeline for classification and parameterisation.

5.1

Search Criteria And Data Sets

Following the work of Harris et al. (2003) and Kleinman et al. (2004) (based on the photometric simulations of Fan 1999), a search was made of the SDSS Data Release 3 database using the following selection criteria on SDSS ugriz point spread function magnitudes and colours,

    SELECT s.plate, s.mjd, s.fiberid
    FROM BESTDR3..SpecPhotoAll as s
    WHERE s.psfMag_u < 21
      AND (s.psfMag_u - s.psfMag_g) < 0.7
      AND (s.psfMag_g - s.psfMag_r) < -0.1
      AND s.specClass <> dbo.fSpecClass('QSO')

For completeness, the spectra chosen by the SDSS as their hot standards were also retrieved using a separate query,

    SELECT s.plate, s.mjd, s.fiberid
    FROM BESTDR3..SpecObj as s
    WHERE s.objType = dbo.fObjType('HOT_STD')

The total data quantities retrieved by these two queries are summarised in Table 5.1.

    Data Set         Spectra Retrieved
    Colour-Colour    6539
    Hot Standards    1411
    Total            7950 (6764 Unique)

Table 5.1: Summary of data quantities obtained from the SDSS DR3.

5.2

PCA Filtering

The PCA filter from Chapter 4 was applied to the 6764 unique spectra obtained from the SDSS. The SDSS-normalised spectrum was extracted from each of the downloaded FITS files, and velocity corrected using the SDSS-derived redshift stored in each file's FITS header. The histogram of reconstruction errors is plotted in Figure 5.1. The large quantity of spectra located at the error bin R ≈ 2.46 are blank: the normalised flux level is constant at 1.0 for all wavelengths. This is due to the rebinning

Figure 5.1: Histogram of reconstruction errors for the colour-colour selected SDSS sample. [Plot: number of spectra, from 0 to 500, against reconstruction error R, binned from <1.00 to >35.00.]

routine's default behaviour of assigning a flux value of 1.0 to those wavelengths where no flux information is available for interpolation. In this case, the spectra in question seem originally to cover a lower wavelength range than the chosen 4050-4950 Å range.

Otherwise, visual examination of the error bins reveals that all of the hot subdwarf candidates of reasonable S/N are located below a reconstruction error level of R ≈ 6.4, and are mixed in with many white dwarf and blue horizontal branch spectra. These are hard to separate out because they often show almost no spectral features which would allow the PCA filter to distinguish them clearly from hot subdwarf candidates. At R > 6.4, the error bins are composed almost entirely of various types of white dwarfs, with only a few very low S/N hot subdwarf candidates.

Selecting all those spectra with reconstruction errors R ≤ 6.4 yields 817 samples, approximately 400 of which are the "blank" spectra discussed previously. Removing them left a final set of 400 spectra which were manually processed to select the hot subdwarf candidates from amidst the white dwarfs. This proceeded quickly, as white dwarf spectra are quite distinct. A final data set of 282 hot subdwarf candidates was obtained.
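The origin of the "blank" spectra described in this section can be illustrated with a small rebinning sketch. It is not the routine used in this work: linear interpolation, the function names, and the tolerance are assumptions for the example, and only the fill value of 1.0 for uncovered wavelengths is taken from the text. Both the input wavelengths and the target grid are assumed to be in ascending order.

```python
def rebin(wave, flux, grid, fill=1.0):
    """Linearly interpolate (wave, flux) onto grid.

    Grid points outside the wavelength coverage of the input spectrum
    receive `fill`, mimicking the default behaviour described in the text.
    """
    out = []
    j = 0
    for w in grid:
        if w < wave[0] or w > wave[-1]:
            out.append(fill)          # no flux information available here
            continue
        while wave[j + 1] < w:        # advance to the bracketing pixel pair
            j += 1
        t = (w - wave[j]) / (wave[j + 1] - wave[j])
        out.append(flux[j] + t * (flux[j + 1] - flux[j]))
    return out

def is_blank(flux, fill=1.0, tol=1e-12):
    """True if every bin equals the fill value, i.e. nothing was interpolated."""
    return all(abs(f - fill) <= tol for f in flux)

# Common grid of 4050-4950 Angstroms at 1 Angstrom per pixel.
grid = [4050.0 + i for i in range(901)]
# Hypothetical spectrum lying entirely blueward of the grid.
blank = rebin([3800.0, 4000.0], [0.5, 0.7], grid)
```

Screening with a check such as `is_blank` before computing reconstruction errors would remove these spectra without the need for visual inspection.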

5.3

Analysis

The SDSS-normalised spectra are created by fitting a pseudo-continuum using a median/mean filter. A sliding window of length 300 pixels is used for stars, and a set of reference lines is used to mask out major absorption features by excluding pixels closer than 8 pixels to any reference line. The remaining pixels are ordered, and the values between the 40th and 60th percentiles are averaged to give the pseudo-continuum. However, this pseudo-continuum tends to underfit the real continuum for the higher-order Balmer lines, with blending between the broad wings pulling the pseudo-continuum down.

Although the SDSS-normalised spectra are sufficient for the coarse filtering performed by the PCA filter, the underfitting associated with the pseudo-continuum makes them unsuitable for use in classification or parameterisation. Instead, the SDSS-calibrated spectra of the 282 hot subdwarf candidates were renormalised using an automated method based on cubic spline fitting, after having been velocity corrected, again using the SDSS redshifts. Each spectrum was then resampled onto the common wavelength grid of 4050-4950 Å at a sampling of 1 Å pixel^-1, ready for analysis by the classification neural network and SFIT. Physical parameters in Teff, log g, and log(nHe/nH) were derived by fitting each spectrum to a large grid of 2426 LTE model spectra generated using STERNE and SPECTRUM. Details of the grid are summarised in Table 5.2.
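The median/mean filter described above can be sketched as follows. This is an illustrative reading of the description, not the SDSS pipeline code: the window length, mask radius, and percentile range come from the text, while the function name, the edge handling (the window is simply truncated), and the NaN returned for fully masked windows are assumptions.

```python
def pseudo_continuum(flux, line_pixels, window=300, mask_radius=8):
    """Estimate a pseudo-continuum with a masked sliding-window filter.

    flux: list of flux values, one per pixel.
    line_pixels: pixel positions of the reference lines to mask.
    Pixels closer than `mask_radius` to any reference line are excluded;
    the remaining values in each window are sorted and those between the
    40th and 60th percentiles are averaged.
    """
    n = len(flux)
    masked = [any(abs(i - c) < mask_radius for c in line_pixels) for i in range(n)]
    half = window // 2
    continuum = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half)
        values = sorted(flux[k] for k in range(lo, hi) if not masked[k])
        if not values:
            continuum.append(float("nan"))   # whole window masked (assumed behaviour)
            continue
        a = int(0.4 * len(values))
        b = max(a + 1, int(0.6 * len(values)))
        middle = values[a:b]
        continuum.append(sum(middle) / len(middle))
    return continuum
```

Where broad line wings overlap, as for the higher-order Balmer lines, few unmasked pixels in a window sit at the true continuum level, so the averaged percentile band is pulled down, which is the underfitting noted in the text.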

    Parameter    Values
    Teff (kK)    8.0, 9.0, 10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 20.0, 22.0, 24.0,
                 25.0, 26.0, 28.0, 30.0, 32.0, 34.0, 35.0, 36.0, 38.0, 40.0, 45.0, 50.0
    log g        2.50, 3.00, 3.50, 4.00, 4.50, 5.00, 5.50, 6.00
    nHe          0.001, 0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99, 0.999

Table 5.2: The model grid used to obtain physical parameters from the SDSS hot subdwarf candidates.

5.4

Results

The results of both classification and parameterisation are presented in Figures 5.2-5.8, and tabulated in Appendix B.

5.4.1

Parameterisation

A number of interesting features are present in the diagrams of Figure 5.2. Most prominent in the log g-Teff plot is the low density region centred at Teff ≈ 22,500 K. Figure 5.4 overlays Figure 5.2 with density estimate contours which better illustrate the presence of the gap. This low density region appears to separate the blue horizontal branch stars from the extended horizontal branch. However, it occurs at the same position as the zero-age main sequence, so could it be the result of selection effects? The answer is probably no, because an early B-type main sequence star with an apparent magnitude of m_V = 15, similar to the stars in the hot subdwarf sample, and an absolute magnitude of M_V = -2.4, would be located 30 kpc away, out of the plane of the galaxy. The existence of such a star at this position is unlikely. The same low density region was also observed by Green et al. (2006) and Saffer et al. (1994), and corresponds with the second gap identified in observations of blue halo stars by Newell (1973).
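The 30 kpc figure quoted above follows from the standard distance modulus. The short working below assumes that interstellar extinction is neglected, which the text does not state explicitly.

```latex
m_V - M_V = 5 \log_{10}\!\left(\frac{d}{10\,\mathrm{pc}}\right)
\quad\Longrightarrow\quad
d = 10^{(m_V - M_V + 5)/5}\,\mathrm{pc}
  = 10^{(15 + 2.4 + 5)/5}\,\mathrm{pc}
  = 10^{4.48}\,\mathrm{pc}
  \approx 3.0 \times 10^{4}\,\mathrm{pc}
  \approx 30\,\mathrm{kpc}.
```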

Figure 5.2: Parameterisation results of the 282 SDSS hot subdwarf candidates. The helium main sequence of Paczynski (1971), and post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted. [Plot: two panels against effective temperature, from 50000 K to 10000 K; the upper panel shows log g, from 2 to 7, with the ZAMS, ZAHB, and He-MS marked, and the lower panel shows log(nHe/nH), from -4 to 3.]

Figure 5.3: Four example fits from the 282 SDSS hot subdwarfs. The classification and physical parameters (Teff (K), log g, log(nHe/nH)) obtained for each star are printed in the lower corners of each plot. [Plot: four panels of normalised flux against wavelength from 4000 to 5000 Å, labelled sdO4VII:He26 (50654, 6.001, -0.913), sdB1VI:He29 (34502, 5.581, -0.568), sdB3VI:He2 (25219, 5.303, -2.769), and sdB7III:He2 (12653, 3.342, -3.004).]

Heber et al. (1984) and Newell (1973) propose evolutionary explanations for this gap based on variations in hydrogen envelope mass along the horizontal branch, but this was before the discovery that possibly 2/3 of the sdB stars blueward of the gap are short-period binaries (Maxted et al., 2001), and therefore products of the common envelope binary evolutionary channel. Monte Carlo simulations of single star evolution on the extended horizontal branch, carried out at St. Andrews (Jeffery & Jardine 1984, unpublished), did not reveal the existence of such a gap. It is therefore our hypothesis that the second gap of

Figure 5.4: The results of applying a kernel density estimate analysis to the data from Figure 5.2. The low-density region at Teff ≈ 22,500 K is prominent, along with another possible low-density region at Teff ≈ 41,000 K. [Plot: same two panels and axes as Figure 5.2, with density contours overlaid.]

Newell (1973) reflects differing evolutionary scenarios for blue horizontal branch stars and extended horizontal branch stars, primarily that subdwarf B stars result from common-envelope binary evolution.

In the single star evolution hypothesis, a strong stellar wind is believed to occur on the RGB, but it fails to remove the entire outer hydrogen envelope before the helium core flash takes place. After the helium flash, a star evolves to the horizontal branch. The distribution of stellar masses along the horizontal branch must be continuous, because evolutionary models do not predict gaps if the factors affecting mass loss in single stars (e.g., metallicity, rotation rate, magnetic field strength, etc.) are not discrete. In the binary star evolution scenario, most of the hydrogen-rich envelope is removed (either by Roche lobe overflow, or by a common envelope phase) at the tip of the RGB, meaning that evolution proceeds to the blue end of the horizontal branch. The distribution of post-common envelope binaries is not continuous, because a partial removal of the hydrogen envelope does not occur.

The second feature of interest in Figure 5.2 is the cluster of stars at Teff ≈ 44,000 K, log g = 5.7. The clump is also noticeable in the log(nHe/nH)-Teff plot in Figure 5.2 as the group of extremely helium-rich stars at log(nHe/nH) ≈ 1.2. Heber et al. (2006), in a spectral analysis of sdO stars selected from the Supernova Ia Progenitor Survey, the Hamburg Quasar Survey, and the SDSS, show a similar clustering at the same location on their log g-Teff diagram.

The log(nHe/nH)-Teff diagram in Figure 5.2 shows that the majority of the stars in the sample have helium-deficient atmospheres (less than 0.5 times the solar abundance). This has been attributed to diffusion and gravitational settling processes at work in the extended horizontal branch stars (Wesemael et al., 1982).

For 28,000 K ≤ Teff ≤ 40,000 K, a correlation between helium abundance and Teff can be seen, with the helium abundance increasing with temperature. The same phenomenon was reported by Edelmann et al. (2003) in their analysis of sdBs from the Hamburg Quasar Survey, and by Saffer et al. (1994) in a study of 92 field sdBs drawn largely from the PG catalogue. Both studies also report the existence of two sequences in the correlation, with a smaller fraction of stars having lower helium abundances at the same temperatures than the bulk of the sdBs. There is evidence to suggest the existence of these two sequences in Figure 5.2. Heber et al. (2006) also expand on this phenomenon by showing that the "cooler" sdO stars in their sample adhere to two distinct sequences, and extend the trend to higher Teff. The band of stars evident at log(nHe/nH) = -3 corresponds to the boundary of the model grid used in the analysis.

5.4.2

Classification

The neural network classification results of the 282 hot subdwarf candidates are shown in Figure 5.5. Although the neural network gives real-valued outputs for each classification parameter, these have been rounded to their closest value on the discrete Drilling et al. (2006) system to reflect how a human classifier would use the system. A correlation can be seen between luminosity class and spectral type, with luminosity decreasing as spectral type progresses from O to A. As the physical analogues to luminosity and spectral type are log g and Teff respectively, this trend mirrors that found in the log g-Teff plot of Figure 5.2. From the plot of helium class against spectral type, it can be seen that the stars in the sample are either helium poor or helium rich. There is a group of early-type sdBs showing a higher helium class than the bulk of such stars at the same spectral type. These are most likely the interesting subset of hot subdwarf stars known as He-sdBs (Jeffery et al., 1996; Ahmad, 2004). Figure 5.6 gives a comparison of the neural network classification results with the distribution of stars originally classified by Drilling et al. (2006) in their paper. The

Figure 5.5: Classification results of the 282 SDSS hot subdwarf candidates. Points have been given small random offsets in each axis for clarity. [Plot: two panels against spectral type, from O to A; the upper panel shows luminosity class, from 0 to IX, and the lower panel shows helium class, from 0 to 40.]

Figure 5.6: A comparison of the ANN classifications of the 282 SDSS hot subdwarf candidates (left-most plots) with all the stars classified by Drilling et al. (2006) (right-most plots). Points have been given small random offsets in each axis for clarity. [Plot: four panels against spectral type, from O to A, showing luminosity class, from 0 to IX, and helium class, from 0 to 40, for each sample.]

Figure 5.7: A calibration of the ANN classifications onto the Drilling et al. (2006) system using the 282 SDSS hot subdwarf candidates. [Plot: three panels showing effective temperature, from 10000 K to 50000 K, against spectral type, from O to A; log g, from 2 to 7, against luminosity class, from 0 to IX; and log(nHe/nH), from -3 to 3, against helium class, from 0 to 40.]

trends in the two distributions are similar if one takes into account the differing sample sizes. One feature of interest in the luminosity class-spectral type plot of the Drilling et al. (2006) data is the group of high-luminosity B-type giant stars. These correspond with a group of MK stars used by Drilling et al. (2006) to interface their hot subdwarf classification system with the MK system. In the corresponding plot for the 282 SDSS hot subdwarfs studied here, no such low luminosity class (i.e., high-luminosity) B-type stars are contained in the sample.

A third-order calibration of the Drilling et al. (2006) classification system is shown in Figure 5.7 (i.e., the Drilling et al. (2006) parameters are being correlated with their corresponding physical parameters using a sample of spectra that is not composed of the original standard stars, and has not been classified by Drilling et al. or any other human trained to use the Drilling et al. (2006) scale). Although a linear correlation can be discerned between Teff and spectral type, and between log(nHe/nH) and helium class, the correlations are quite poor. This could be due to systematic noise introduced during the renormalisation of the SDSS data, and may also signify that the neural network is having difficulty interpolating in regions not well represented by the original Drilling et al. (2006) training data (Figure 2.1 shows two low-density regions around spectral types O5 and B5, which is where the most "confusion" is seen in the correlation of Figure 5.7). Despite the noise, the log(nHe/nH) vs. helium class plot still follows the trend of Figure 14 of Drilling et al. (2006).

Between log g and luminosity class, no significant correlation can be seen. This is due to the majority of subdwarfs residing in the luminosity classes VI and VII, and between log g values of 5.0 and 6.0. The seemingly bimodal distribution of this plot corresponds to the separation between the lower-Teff, lower-log g BHB stars in the SDSS sample, and the higher-Teff, higher-log g subdwarfs. It is impossible to constrain any

Figure 5.8: The distribution of SDSS-derived redshifts of the 282 hot subdwarf candidates. [Histogram of stars per bin against redshift (km s⁻¹), spanning -600 to +600 km s⁻¹.]

linear fit to the distribution due to the under-representation of the lower-log g, higher luminosity class region. The concentration of points in luminosity classes VI and VII reflects a similar pattern observed in Figure 15 of Drilling et al. (2006).

5.4.3 Radial Velocities

As an interesting aside, the radial velocities of the 282 hot subdwarf candidates, as measured by the SDSS, are plotted in Figure 5.8. The errors in the radial velocities are of the order of 30 km s⁻¹. Several studies of the kinematical behaviour of hot subdwarfs have been conducted in the past, e.g., Altmann et al. (2004), Maxted et al. (2001), de Boer et al. (1997), Colin et al. (1994). Altmann et al. (2004) point out that short-period sdB binaries could exhibit orbital velocities in excess of 200 km s⁻¹, although most are of the order of 50 km s⁻¹ or less.


Based on the parameterisation and classification results of the hot subdwarf sample studied here, it is clear that the majority of the sample are sdBs and, consequently, possibly short-period binaries (see also Maxted et al. 2001). As the SDSS observes out of the Galactic plane, most of the hot subdwarf candidates will be either thick disk or halo objects, with greater radial velocities because their orbits do not conform to the local standard of rest (see Altmann et al. 2004). There are a few objects in the hot subdwarf sample with velocities |cz| > 400 km s⁻¹. Although these velocities are unverified and could be anomalous, they are greater than can be accounted for by the previously outlined mechanisms. As such, they are of interest for further study (e.g., Hirsch et al. 2005).

5.5 Sources of Error

The results of this chapter are affected by a number of error sources. The issues of primary concern are systematic errors arising from the internal accuracy of the tools themselves, whether the training data for the tools are representative of the application domain, the assumptions used in generating the model spectra, and random errors in the application spectra along with systematic errors introduced during the observation and reduction stage.

In terms of the physical parameters derived using SFIT, SFIT produces standard errors for each parameter it fits based on the curvature of the χ² function in the region of parameter space about the located minimum. These errors give an indication of the internal accuracy of the fitting method, with the χ² function giving an indication of the goodness-of-fit. At the boundaries of the grid, where the curvature is difficult to estimate, or in regions of low curvature, the standard errors may not be as useful a measure of SFIT's internal uncertainty.

A major error source is the grid of theoretical models to which observations are fit. Here, models have been used which assume a stellar atmosphere that is plane-parallel, and in local thermal, radiative and hydrostatic equilibrium. Opacities are modelled using opacity distribution functions, which differs fundamentally from the methods used in stellar atmospheres that do not make the LTE assumption. It is known that the LTE approximation is good up to 40,000 K, after which NLTE effects become more significant. There is also the question of whether or not the inclusion of physical effects, such as magnetic fields, is an important issue.

Within SFIT itself, the assumption is made that changes in the physical parameters of a model have a corresponding linear effect on the flux distribution. It is known from theory that changes in the physical parameters have a nonlinear effect on the flux distribution, but a trade-off must be made between accuracy and efficiency, especially in a data mining context.

Other sources of error, such as from the SDSS observation and reduction pipeline or the hot subdwarf classification standards obtained from Drilling et al. (2006), are difficult to quantify. For the same reason, discussion of errors arising from models is a complicated topic and beyond the scope of this thesis. However, see, for example, Behara & Jeffery (2006) for an investigation of the influence of improving the opacities used in the models. Nevertheless, the issue of the robustness of the results presented in this chapter (and also the conclusions which are drawn from the results) is very important, but quantifying the influence of all the possible error sources requires further investigation.
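The curvature-based standard errors described earlier in this section can be illustrated with a minimal one-parameter sketch. This is not SFIT's code: the function names, the finite-difference step, and the toy data are all assumptions made for illustration. It relies only on the standard result that, for a single fitted parameter, a rise of 1 in χ² corresponds to one standard error, so that the error is sqrt(2 / curvature).

```python
import math


def chi_squared(observed, model, sigma):
    """Sum of squared, error-weighted residuals between data and model."""
    return sum(((o - m) / s) ** 2 for o, m, s in zip(observed, model, sigma))


def curvature_standard_error(chi2_of_p, p_best, step):
    """One-sigma error on a single parameter from the chi-squared curvature.

    A central finite difference estimates the second derivative at the
    minimum; a chi-squared rise of 1 then gives sigma = sqrt(2 / curvature).
    """
    curvature = (
        chi2_of_p(p_best + step) - 2.0 * chi2_of_p(p_best) + chi2_of_p(p_best - step)
    ) / step ** 2
    if curvature <= 0.0:
        # Flat or ill-defined minimum: the error estimate is not meaningful.
        return math.inf
    return math.sqrt(2.0 / curvature)


# Toy problem (hypothetical data): the model flux is p * x, the "observed"
# flux was generated with p = 2, and every point has an error of 0.1.
x = [1.0, 2.0, 3.0, 4.0]
observed = [2.0 * xi for xi in x]
sigma = [0.1] * len(x)


def toy_chi2(p):
    return chi_squared(observed, [p * xi for xi in x], sigma)


error = curvature_standard_error(toy_chi2, p_best=2.0, step=0.01)
print(f"standard error on p: {error:.4f}")
```

The guard for non-positive curvature mirrors the caveat in the text: where the χ² surface is flat, or at a grid boundary where the curvature cannot be estimated, the returned value carries little information.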

5.6 Analysis of PCA Filter Efficiency

Figure 5.9 gives some examples of the BHB and white dwarf contaminants mentioned earlier in the chapter. In cases B, C, and D, the differences between the original spectrum and its reconstruction are not sufficient to produce a reconstruction error greater than the chosen threshold of 6.4. In case A, the BHB star, the reconstruction matches the original spectrum very closely, except for a slight difference in H. Physical parameters obtained for this star using SFIT show that it is too cool to be a subdwarf (Teff = 12,000 K, log g = 3.42, nHe = 0.004).

The simple RMS error calculation of Equation 4.12 yields the scaled RMS difference between each flux point of the original spectrum and its PCA reconstruction. Clearly, then, for such small differences this error metric is not sensitive enough to filter out the BHB and white dwarf contaminants. This limitation of the PCA filter could be diminished by further developing the reconstruction error calculation to include a weighting scheme that gives more significance to the spectral lines and features commonly found in the objects under investigation. A disadvantage of this approach is that the weighting scheme must be crafted and optimised manually to suit the quirks of the PCA filter and the spectral features of the target objects. A more robust error metric that does not require user input is a topic for future work.
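The weighting idea suggested above can be sketched as follows. Equation 4.12 is not reproduced in this section, so the `scale` factor, the weighting function, and the toy fluxes below are assumptions rather than the thesis's definitions. The only physical input is the rest wavelength of Hβ (about 4861 Å), used here as an example of a line that might be given extra weight.

```python
import math


def rms_error(original, reconstruction, weights=None, scale=1.0):
    """Scaled (optionally weighted) RMS difference between two spectra."""
    if weights is None:
        weights = [1.0] * len(original)
    weighted_squares = sum(
        w * (f - r) ** 2 for w, f, r in zip(weights, original, reconstruction)
    )
    return scale * math.sqrt(weighted_squares / sum(weights))


def line_weights(wavelengths, line_centres, half_width, boost):
    """Weight of `boost` within `half_width` of any listed line, 1 elsewhere."""
    return [
        boost if any(abs(w - c) <= half_width for c in line_centres) else 1.0
        for w in wavelengths
    ]


# Hypothetical five-pixel window around H-beta: the reconstruction misses
# the line core by 0.1 in normalised flux and matches everywhere else.
wavelengths = [4855.0, 4858.0, 4861.0, 4864.0, 4867.0]
original = [1.0, 0.9, 0.6, 0.9, 1.0]
reconstruction = [1.0, 0.9, 0.7, 0.9, 1.0]

plain = rms_error(original, reconstruction)
weights = line_weights(wavelengths, line_centres=[4861.0], half_width=1.5, boost=10.0)
weighted = rms_error(original, reconstruction, weights)
print(f"plain RMS = {plain:.4f}, line-weighted RMS = {weighted:.4f}")
```

With the extra weight, a mismatch confined to a line core contributes more to the error, which is the behaviour needed to push contaminants such as case A above the threshold. As the text notes, the choice of lines and weights would still have to be tuned by hand.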

Quantitative Estimation of Filter Efficiency

To give an estimate of the success (and failure) of the PCA filter as deployed in this chapter, the word "success" needs to be more clearly defined. Based on the results plotted in Figure 5.2, the assumption can be made that most subdwarfs in the SDSS sample lie, with good probability, in a region Teff ≳ 23,000 K, log g ≳ 4.7, as demonstrated in Figure 5.10. For any chosen value of R for the reconstruction error threshold, stars with a reconstruction error within R and parameters inside this region will be assumed to be true positives, i.e., actual subdwarfs that the filter has successfully separated out. False positives are those stars which are within the value of R but lie outside this region, i.e., stars which the filter should have excluded but did not. True negatives lie both outside the shaded region and beyond the threshold of R. Finally, false negatives lie within the shaded region but are outside of the filter's error threshold.

Figure 5.9: Examples of white dwarf and BHB contaminants. A - BHB star with deep Balmer lines. B - DA white dwarf with strong, broad Balmer lines due to high surface gravity. C - DB white dwarf. D - Uncertain (some evidence of weak carbon absorption, so possibly a DQ white dwarf). [Four panels, A to D, each plotting flux against wavelength from 4100 to 4900 Å. Objects shown: J220403.45+122507.3, J213301.41+122831.1, J135532.42+001124.0 and J101805.04+011123.5. Values printed in the panels: 3.34386, 2.30372, 2.78036 and 3.90771.]

Figure 5.10: The gray-shaded region of the log g-Teff plane represents an area of good probability that the stars within it are subdwarfs. [Axes: log g (2 to 7) against effective temperature (K), from 50000 down to 10000; the ZAMS, ZAHB and He-MS are marked.]

Using these definitions, the PCA filter's efficiency can be quantitatively stated for any value of R. Of course, the assumption is that every star passing through the filter has values for Teff and log g. Estimates of these parameters for the SDSS sample were obtained by applying SFIT to the whole data set. The quantitative measures used are the percentage rate of true positives (which measures how successful the PCA filter is, according to the aforementioned definition of "success"),

\[ \mathrm{TP\ Rate} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \times 100\% \qquad (5.1) \]

where TP is the number of true positives and FN the number of false negatives, and also the rate of false positives (which measures how often the filter fails),

Figure 5.11: TP rates (red) and FP rates (blue) of the PCA filter as a function of the reconstruction error threshold, R. The green curve is the difference between the TP and FP rates. [Axes: percentage (0 to 100%) against reconstruction error R (0 to 50).]

\[ \mathrm{FP\ Rate} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \times 100\% \qquad (5.2) \]

where FP is the number of false positives, and TN is the number of true negatives. Figure 5.11 shows how the TP and FP rates vary as a function of R in the application to the SDSS data set. The rate of true positives increases rapidly until R ≈ 10, after which it begins to level off. The percentage of false positives increases slowly until R ≈ 5.5. From this point until R ≈ 13 the filter begins to produce false positives at the maximum rate before starting to level off. At R ≈ 28, the rate of false positives surpasses that of true positives, meaning that the filter now fails more than it succeeds. An idea of the optimum value for R can be determined by plotting the difference between the rates of true positives and false positives for each R. This is the green curve in Figure 5.11. There is a noticeable and very definite peak. Figure 5.12 shows a close

Figure 5.12: A closer examination of the TP and FP rates. The peak in the green TP-FP curve occurs at R ≈ 7.0 and signifies the optimum value for R in the SDSS sample. [Axes: percentage (0 to 100%) against reconstruction error R (0 to 10).]

up view of the region of this peak, which occurs at R ≈ 7.0. At this error threshold, the PCA filter is producing the maximum number of true positives compared to false positives. In other words, this is the optimum value of R for this particular application. This compares favourably with the chosen reconstruction error threshold of R = 6.4 reported in section 5.2. It should be pointed out that there does not seem to be a reliable method for determining the optimal threshold value of R for a filter and data set, a priori, without first establishing at least a rough estimate of physical or classification parameters. If the PCA filter (which is fast in its operation) were paired with a parameterising neural network or a fast nearest-neighbour χ² fitting, then an estimate of the optimal PCA error threshold could be obtained using the same method as above.
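The procedure of Equations 5.1 and 5.2, followed by the search for the peak of the TP - FP curve, can be sketched as below. The rectangular region test is a simplification of the shaded area in Figure 5.10, and the function names, default limits and six example stars are hypothetical; none of the numbers are taken from the SDSS sample.

```python
def in_subdwarf_region(teff, logg, teff_min=23000.0, logg_min=4.7):
    """Simplified stand-in for the shaded region of the log g-Teff plane."""
    return teff >= teff_min and logg >= logg_min


def rates(stars, threshold):
    """Return (TP rate, FP rate) in percent for one error threshold R.

    Each star is a tuple (reconstruction_error, teff, logg).
    """
    tp = fp = tn = fn = 0
    for error, teff, logg in stars:
        passed = error <= threshold
        inside = in_subdwarf_region(teff, logg)
        if passed and inside:
            tp += 1
        elif passed:
            fp += 1
        elif inside:
            fn += 1
        else:
            tn += 1
    tp_rate = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    fp_rate = 100.0 * fp / (fp + tn) if fp + tn else 0.0
    return tp_rate, fp_rate


def best_threshold(stars, thresholds):
    """Threshold that maximises the difference TP rate - FP rate."""
    def margin(r):
        tp_rate, fp_rate = rates(stars, r)
        return tp_rate - fp_rate
    return max(thresholds, key=margin)


# Hypothetical sample: (reconstruction error, Teff in K, log g).
stars = [
    (2.0, 30000.0, 5.5),
    (3.0, 28000.0, 5.8),
    (5.0, 35000.0, 5.6),
    (6.0, 12000.0, 3.4),   # cool, low-gravity contaminant
    (9.0, 15000.0, 8.0),   # white-dwarf-like contaminant
    (12.0, 26000.0, 5.2),
]
optimum = best_threshold(stars, [2.5, 5.5, 8.0, 10.0, 13.0])
print(optimum, rates(stars, optimum))
```

As the text stresses, this only works once Teff and log g estimates exist for every star, so it is a way of assessing a threshold after the fact rather than of choosing one a priori.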

5.7 Summary

The tools developed in Chapters 2 to 4 have been deployed on a real-world data set with some interesting outcomes. The hot subdwarf candidates extracted from the SDSS represent a completely homogeneous set, and their analysis evidences several unexplained phenomena:

1. Existence of the second horizontal branch gap of Newell (1973) at Teff ≈ 22,500 K.

2. Two sdB nHe-Teff sequences, also observed by Edelmann et al. (2003).

3. A clustering of hot, helium-rich sdO stars at Teff ≈ 44,000 K, log g = 5.7, also observed by Heber et al. (2006).

These results reiterate the challenge to provide evolutionary explanations for the variety of stars present on the extended horizontal branch, and the subsequent importance of continuing research into hot subdwarfs.


Chapter 6

Application II - Other Data Sets

The work presented in this chapter details the application of the analysis pipeline to three smaller data sets obtained in collaboration with others in the field. This reflects the situation described in Chapter 1 regarding the heterogeneous data sets amassed by various ground-based observatories. When data from these observatories are made available, robust tools will be needed to process them into a homogeneous form, and provide fast analyses.

6.1 2MASS-Selected Sample

A preliminary analysis of the 282 SDSS hot subdwarf candidates in the previous chapter was presented at the Second Meeting on Hot Subdwarfs and Related Objects in La Palma, June 2005. As a result of this conference, E. M. Green provided the author with a sample of high S/N, low-resolution spectra selected from 2MASS¹ photometry (see Green et al. 2006) to be classified and parameterised with the tools developed in this thesis. 83 2MASS-selected spectra were made available with an average S/N of about 133, but varying as high as 273 and as low as 70. The wavelength range covered is 3615-6900 Å at a resolution of R ≈ 922. Spectra for two known stars, Balloon 090900004 and BD+48 2721, were also supplied along with physical parameters (Teff, log g, log(nHe/nH)) obtained using NLTE model atmospheres (H+He, zero metal). The purpose of these stars is to provide a temperature calibration for the hot and cool ends of the sdOB sequence, so that the parameterisation results obtained with SFIT (and LTE model atmospheres) can be compared with those derived from other model atmospheres.

All of the spectra were previously flux and wavelength calibrated. Normalisation was carried out using a cubic spline fitting routine, and the spectra were then resampled onto a common wavelength grid of 4050-4950 Å at a sampling of 1 Å pixel⁻¹. Radial velocities were corrected for by cross-correlating each spectrum with a grid of 101 theoretical models coarsely varying over Teff, log g, and log(nHe/nH). During this pre-processing stage, it was discovered that two of the stars in the sample were white dwarfs, so they were excluded from any further analysis. Application of the PCA filter of Chapter 4 was deemed unnecessary given the small sample size.

¹ http://www.ipac.caltech.edu/2mass
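The radial velocity correction by cross-correlation can be illustrated with the sketch below. It is deliberately simplified and is not the routine used in the thesis: it uses a single template rather than a grid of 101 models, an unnormalised correlation of line depths, a brute-force scan over trial velocities, and a synthetic Gaussian line at the Hγ rest wavelength (4340.47 Å). All function names and the synthetic data are assumptions.

```python
import bisect
import math

C_KMS = 299792.458  # speed of light in km/s


def gaussian_line(wavelength, centre, depth=0.5, width=5.0):
    """Normalised flux of a single Gaussian absorption line."""
    return 1.0 - depth * math.exp(-0.5 * ((wavelength - centre) / width) ** 2)


def interp_flux(x, grid, flux):
    """Linear interpolation; outside the grid the continuum (1.0) is returned."""
    if x <= grid[0] or x >= grid[-1]:
        return 1.0
    hi = bisect.bisect_right(grid, x)
    lo = hi - 1
    t = (x - grid[lo]) / (grid[hi] - grid[lo])
    return flux[lo] + t * (flux[hi] - flux[lo])


def estimate_velocity(wave, flux, template_wave, template_flux, trial_velocities):
    """Trial velocity whose Doppler-shifted template best matches the data."""
    def score(v):
        factor = 1.0 + v / C_KMS
        # The template shifted by v, evaluated at an observed wavelength w,
        # is the rest-frame template at w / factor.
        return sum(
            (1.0 - f) * (1.0 - interp_flux(w / factor, template_wave, template_flux))
            for w, f in zip(wave, flux)
        )
    return max(trial_velocities, key=score)


# Synthetic test case: finely sampled rest-frame template, and an
# "observation" on a 1 Angstrom grid redshifted by 150 km/s.
REST = 4340.47
template_wave = [4050.0 + 0.1 * i for i in range(9001)]
template_flux = [gaussian_line(w, REST) for w in template_wave]
wave = [4050.0 + i for i in range(901)]
flux = [gaussian_line(w, REST * (1.0 + 150.0 / C_KMS)) for w in wave]

velocity = estimate_velocity(wave, flux, template_wave, template_flux,
                             range(-600, 601, 10))
rest_wave = [w / (1.0 + velocity / C_KMS) for w in wave]  # corrected wavelengths
print(f"estimated radial velocity: {velocity} km/s")
```

A finely sampled template is used because linear interpolation of a coarsely sampled line slightly smooths it, which can bias the location of the correlation peak by an amount comparable to the velocity step.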

Analysis And Results

Classification and parameterisation on the final 83 stars was carried out using the classification neural network of Chapter 2, and SFIT using the same grid of models as in Chapter 5 (Table 5.2). Results are plotted in Figures 6.1 and 6.2, and tabulated in Appendix C. The parameterisation results of the two calibration stars, Balloon 090900004 and BD+48 2721, are given in Table 6.1. Small differences exist between the parameters for both stars, with the hotter star, Balloon 090900004, showing a temperature difference of 9700 K. This is not unexpected considering the inherent differences between the LTE and NLTE approaches.

Identifier          Parameter      NLTE              LTE
BD+48 2721          Teff (K)       23017 (248)       22979 (240)
                    log g          5.035 (0.028)     5.267 (0.032)
                    log(nHe/nH)    -2.135 (0.022)    -1.629 (0.018)
Balloon 090900004   Teff (K)       40897 (248)       31147 (278)
                    log g          5.369 (0.022)     4.757 (0.054)
                    log(nHe/nH)    -2.842 (0.046)    -1.811 (0.056)

Table 6.1: Parameters of the two calibration stars as obtained by χ²-fitting to NLTE (Green et al., 2006) and LTE (Armagh) model atmospheres. Formal errors are given in parentheses.

The parameterisation results of Figure 6.1 show distributions with some similarity to those of the SDSS hot subdwarf candidates in Figure 5.2. The second gap of Newell (1973) seems to be present at Teff ≈ 23,000 K (however, it is unclear whether Green's sample suffers from any selection effects). Some main sequence late-type B and A stars appear to be present in the sample. The log(nHe/nH)-Teff results in Figure 6.1 show the atmospheric helium deficiency of the sdB stars, and the cluster of blue horizontal branch stars with normal helium abundances. The main sequence stars present in the sample can be seen again as the low temperature, hydrogen-rich data points. Not enough sdB stars are present in the sample to confirm any correlation between helium abundance and Teff, although such a correlation appears to be suggested by the results.

The distribution of classifications in Figure 6.2 again shows some similarity to that of the SDSS hot subdwarf candidates in Figure 5.5. Not plotted in Figure 6.2 are the late-A and early-F spectral classifications assigned to some stars by the neural network. The parameterisation results suggest the existence of such stars in the sample, but it is of interest that the neural network would distinguish and assign them (unreliable) classes for which no samples were present in the training data. Figure 6.3 plots these stars. The deep and broad hydrogen Balmer lines correspond with the late-A and early-F spectral types. This would seem to demonstrate that the neural network has very good generalisation properties.

Figure 6.1: SFIT physical parameters for 2MASS-selected sample. The helium main sequence of Paczyński (1971), and post-EHB evolutionary tracks of Dorman et al. (1993) are also plotted. [Upper panel: log g (2 to 7) against effective temperature (K), from 50000 down to 10000, with the ZAMS, ZAHB and He-MS marked. Lower panel: log(nHe/nH) (-4 to 2) against effective temperature (K).]

Figure 6.2: ANN classification for 2MASS-selected sample. Points have been given small random offsets in each axis for clarity. [Upper panel: luminosity class (0 to IX) against spectral type (O to A). Lower panel: helium class (0 to 40) against spectral type (O to A).]

Figure 6.3: The stars assigned late-A and early-F spectral types by the neural network. [Six spectra plotted as flux (continuum = 1) plus a constant offset against wavelength (Angstroms), 4100 to 4900. From top to bottom: J143155.30+172404.9 (sdA7V:He3), J095854.23+360314.3 (sdF8VI:He2), J114454.50+031550.2 (sdA7V:He4), J112832.64+603859.3 (sdF5V:He3), J111819.13+093144.4 (sdA2V:He5), J083127.37+422201.7 (sdA5VI:He2).]

6.2 SDSS sdB-He Stars of Harris et al. (2003)

In collaboration with Ahmad (Ahmad et al., 2006), the classification neural network was used to classify a small set of "helium-rich" sdB-He stars obtained from the SDSS by Harris et al. (2003). Results of this analysis, along with helium abundances derived by Ahmad using SFIT and a grid of LTE model atmospheres, are presented in Table 6.2.

SDSS Identifier       nHe     ANN Class
J094044.08+004759     0.16    sdB0VIII:He23
J113840.69-003531     0.01    sdB3V:He1
J124346.38+002534     0.05    sdB1V:He23
J125410.86-010408     0.01    sdB3III:He5
J131745.80+010450     0.01    sdB0VI:He3
J134545.24-000641     0.15    sdO9VII:He21
J134635.68-001804     0.09    sdA2IV:He0
J135707.35+010454     0.36    sdO6VII:He30
J141556.68-005814     0.21    sdB8VI:He14
J143917.64+010251     0.01    sdB6V:He3
J144514.93+000249     0.02    sdB1VII:He11
J152708.31+003308     0.45    sdO9VIII:He35
J152905.62+002137     0.06    sdO9VII:He10
J154238.43-003758     0.07    sdA2III:He2

Table 6.2: Classification results for the sdB-He stars of Harris et al. (2003).

The aim of this work was to determine if the sdB-He stars of Harris et al. (2003) are similar to He-sdB stars (see Ahmad 2004), as this would increase the number of known helium-rich subdwarfs for further study. However, it is clear from the classification and parameterisation results obtained that most of the sdB-He stars show very little helium enrichment, with half of the stars in the sample having surface gravities too low to be subdwarfs (Ahmad, private communication). Out of the remaining subdwarfs, only a handful are helium rich (i.e., having nHe ≥ 0.10, or He class > 20).

6.3 Ahmad & Jeffery (2003) He-sdBs

Ahmad & Jeffery (2003) undertook the first systematic study of a set of helium-rich subdwarf B stars, obtaining observations and physical parameters for 17 targets. These stars have been previously classified by Drilling et al. (2006) using observations from different sources. As such, the re-classification of these stars by the neural network in Chapter 2, using the new observations of Ahmad & Jeffery (2003), presents an opportunity to verify the neural network's performance.

Ahmad & Jeffery (2003) observed the targets over a variety of wavelength ranges between 3900 and 5000 Å, with the spectra being bias corrected, flat-fielded, sky subtracted, and wavelength calibrated using standard procedures. All spectra were normalised by defining a smooth polynomial continuum from sections of local continuum, with care being taken to avoid the wings of broad absorption lines.

Before passing the spectra to the neural network, they were rebinned onto the common wavelength grid of 4050-4950 Å at a sampling of 1 Å pixel⁻¹. Any wavelength bins in this grid for which no flux data were available in the original observations (i.e., in the case of a short spectrum) were automatically assigned a flux value of 1.0.

The results are presented in Table 6.3, with a graphical comparison between the neural network classifications and those of Drilling et al. (2006) plotted in Figure 6.4. Although the sample is limited in distribution in the classification parameter space, a good agreement can be seen between the neural network and Drilling et al. (2006), providing confirmation of the work presented in Chapter 2.
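The rebinning step, including the continuum padding for short spectra, can be sketched as follows. Linear interpolation is used here as a stand-in, since the text does not specify the rebinning scheme (a flux-conserving rebin would differ in detail), and the function name and example spectrum are hypothetical.

```python
import bisect


def rebin_to_common_grid(wave, flux, start=4050.0, stop=4950.0, step=1.0, fill=1.0):
    """Resample a normalised spectrum onto a fixed wavelength grid.

    Grid points outside the observed range receive `fill` (the continuum
    level, 1.0), as described for short spectra in the text.
    """
    n_points = int(round((stop - start) / step)) + 1
    grid = [start + i * step for i in range(n_points)]
    resampled = []
    for g in grid:
        if g < wave[0] or g > wave[-1]:
            resampled.append(fill)
            continue
        hi = bisect.bisect_left(wave, g)
        if wave[hi] == g:
            resampled.append(flux[hi])
        else:
            lo = hi - 1
            t = (g - wave[lo]) / (wave[hi] - wave[lo])
            resampled.append(flux[lo] + t * (flux[hi] - flux[lo]))
    return grid, resampled


# Hypothetical short spectrum covering only 4100-4104 Angstroms.
short_wave = [4100.0, 4102.0, 4104.0]
short_flux = [0.8, 0.6, 0.8]
grid, resampled = rebin_to_common_grid(short_wave, short_flux)
print(len(grid), resampled[49:56])
```

Padding with 1.0 gives the neural network an input vector of fixed length, at the cost of presenting featureless continuum where no data exist, which is worth bearing in mind when interpreting classifications of short spectra.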

6.4 Summary

The application of the analytical tools developed in previous chapters to a collection of small data sets from different sources highlights their versatility and usefulness.

Figure 6.4: Comparison of ANN classifications with those of Drilling et al. (2006) for the 17 He-sdBs of Ahmad & Jeffery (2003). Points have been given small random offsets in each axis for clarity. Also plotted is the best fit least squares regression line with error bars showing the RMS of the residuals. [Three panels: ANN helium class against Drilling helium class (10 to 40); ANN luminosity class against Drilling luminosity class (IV to IX); ANN spectral type against Drilling spectral type (O5 to B5).]

Identifier     Drilling Class       ANN Class
HS1000+471     sdBC0.2VII:He28      sdB0VII:He29
HS1844+637     sdB1VII:He39         sdB2VII:He37
LSIV-14 116    sdB0.2VII:He17       sdB0VIII:He20
PG0229+064     sdB3V:He13           sdB4V:He18
PG0240+046     sdBC0.2VII:He24      sdB2VII:He28
PG0902+057     sdB0VII:He38         sdO9VII:He35
PG1127+019     sdOC9VII:He40        sdO8VI:He41
PG1415+492     sdBC1VI:He39         sdB0VI:He38
PG1544+488     sdBC1VIII:He39       sdB0VII:He37
PG1554+408     sdB0.2VII:He39       sdB0VII:He36
PG1600+171     sdOC8.5VII:He39      sdO8VI:He37
PG1615+413     sdB1VII:He37         sdB2VII:He34
PG1658+273     sdOC9.5VII:He39      sdO8VII:He40
PG1715+273     sdB1VII:He37         sdO5VIII:He36
PG2258+155     sdB0.2VII:He39       sdB1VII:He35
PG2321+214     sdB0VII:He37         sdB2VII:He37
TON107         sdBC0.5VII:He28      sdB1VII:He27

Table 6.3: Classification results for the Ahmad & Jeffery (2003) He-sdBs.

The results of the 2MASS-selected sample appear to confirm the findings of Green et al. (2006), and lend support to the results described in the previous chapter. Before the evolutionary details causing the observed distributions can be understood, additional data, e.g., stellar masses, need to be gathered. The application of the classification neural network to the helium-rich subdwarf B stars of Harris et al. (2003) highlights the need for a homogeneous classification scheme for hot subdwarfs.

Chapter 7

Conclusions And Future Work
This project set out to examine the problem of analysing large sets of astronomical spectra. Specifically, the intention was to establish a set of tools that can automatically extract and analyse the spectra of any type of object from a large database of unknown observations, and then apply these tools to a real survey database.

Analysing large sets of astronomical spectra consists of three core problems: classification, physical parameterisation, and the extraction of particular types of objects from an unknown data set. In this project, classification was tackled by the highly versatile statistical machine learning method of artificial neural networks, which has seen widespread use in astronomy. Chapter 2 studied the use of ANNs to classify hot subdwarf spectra onto the system defined by Drilling et al. (2006). Global errors (rms) on the classifications of 2 subtypes for spectral type, 1 subclass for luminosity class, and 4 subclasses for the helium class were achieved. These errors are in line with the accuracies achieved by human classifiers.

Physical parameters were obtained by fitting observations to grids of theoretical models using a χ² minimisation procedure. SFIT, the χ² minimisation code used at the Armagh Observatory, has been improved in Chapter 3 using concepts from the domain of computational geometry to provide a new methodology for storing and accessing arbitrarily large, three-dimensional grids of models, paving the way to extending the code to operate in distributed parallel computing environments.

Locating the spectra of a particular type of object in a large set of unknown observations was accomplished using the multivariate statistical technique, Principal Components Analysis. Chapter 4 outlined the mechanics of the filter, and demonstrated how it was used to extract hot subdwarf spectra from a data set obtained from the SDSS. This solution provides a means to reduce unknown data sets to quantities suitable for closer visual inspection.

Collectively, these tools were applied to the archives of the SDSS to extract and analyse the spectra of hot subdwarf stars. The PCA filter was able to reduce a set of almost 7000 unknown spectra to a collection of approximately 400 samples from which 282 hot subdwarf candidates were quickly extracted by visual inspection. The classification ANN successfully assigned classes to these stars based on the Drilling et al. (2006) system, and physical parameters were derived using SFIT and a grid of LTE model atmospheres. The results revealed several unexplained phenomena of extended horizontal branch stars, namely,

1. Existence of the second horizontal branch gap of Newell (1973) at Teff ≈ 22,500 K.

2. Two sdB nHe-Teff sequences, also observed by Edelmann et al. (2003).

3. A clustering of hot, helium-rich sdO stars at Teff ≈ 44,000 K, log g = 5.7, also observed by Heber et al. (2006).

These findings pose important questions for stellar evolution theory, and represent a successful demonstration of what this project set out to achieve.


Future Directions

Working with the data from the SDSS highlighted a number of improvements that could be made to the tools themselves, but several important problems concerning spectral analysis and its large-scale application were also made apparent.

Continuum Normalisation

One of the most troubling was the normalisation of stellar continua. As noted in Chapter 5, the SDSS uses a method based on median/mean filtering which tends to underfit the continuum in regions where the blending of lines becomes very strong. An automatic renormalisation method based on cubic spline fitting was employed in Chapter 5 in an attempt to gain a more precise fit to the continuum. This method used several sets of pre-programmed wavelength locations as control points for the cubic spline fit. The control points in each set were chosen manually by iterative refinement, and the different sets essentially conformed to a coarse temperature-abundance classification system because different control points were needed for hot, helium-rich stars and cooler, helium-poor stars. Once the sets of control points were established, the method gave good results for the final set of hot subdwarf candidates.

Obviously, this particular methodology is poorly suited to a general data mining application because it is tied to one particular type of object. A more robust and general automatic algorithm is required. However, this is an extremely difficult problem because such an algorithm must take into account many factors: noise, regions where the spectral flux changes rapidly, cosmic spikes and other anomalies, and troublesome regions like that of the higher-order Balmer lines where the actual continuum runs above the flux information present. An acceptable solution will be very hard to come by.
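The control-point approach to renormalisation described above can be sketched in a few dozen lines. This is an illustrative reconstruction under stated assumptions, not the routine used in Chapter 5: a natural cubic spline is assumed, the continuum level at each control point is taken as the median flux in a small window, and the control wavelengths and synthetic spectrum are hypothetical.

```python
import bisect
import math
import statistics


def spline_second_derivatives(x, y):
    """Second derivatives of the natural cubic spline through (x, y)."""
    n = len(x)
    sub, diag, sup, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        h0, h1 = x[i] - x[i - 1], x[i + 1] - x[i]
        sub[i], diag[i], sup[i] = h0, 2.0 * (h0 + h1), h1
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h1 - (y[i] - y[i - 1]) / h0)
    for i in range(1, n):  # forward elimination of the tridiagonal system
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * n
    m[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]
    return m


def spline_value(x, y, m, xq):
    """Evaluate the spline at xq, extrapolating with the end segments."""
    i = min(max(bisect.bisect_right(x, xq) - 1, 0), len(x) - 2)
    h = x[i + 1] - x[i]
    a, b = x[i + 1] - xq, xq - x[i]
    return (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h) \
        + (y[i] / h - m[i] * h / 6.0) * a \
        + (y[i + 1] / h - m[i + 1] * h / 6.0) * b


def normalise(wave, flux, control_waves, window=5.0):
    """Divide a spectrum by a spline continuum through the control points."""
    levels = [
        statistics.median([f for w, f in zip(wave, flux) if abs(w - c) <= window])
        for c in control_waves
    ]
    m = spline_second_derivatives(control_waves, levels)
    return [f / spline_value(control_waves, levels, m, w) for w, f in zip(wave, flux)]


# Synthetic spectrum: sloping continuum with one absorption line at 4340 A.
wave = [4050.0 + i for i in range(901)]
flux = [
    (2.0 + 0.001 * (w - 4050.0))
    * (1.0 - 0.5 * math.exp(-0.5 * ((w - 4340.0) / 5.0) ** 2))
    for w in wave
]
control_waves = [4100.0, 4200.0, 4500.0, 4700.0, 4900.0]  # hypothetical set
normalised = normalise(wave, flux, control_waves)
print(f"line core: {normalised[290]:.3f}, continuum: {normalised[550]:.3f}")
```

The sketch makes the limitation discussed in the text concrete: the result depends entirely on `control_waves` falling on true continuum, which is why a different hand-picked set was needed for each broad class of object.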


Data Management

Another major problem encountered was the management of large data sets. The two main issues are storing sets of spectra in a meaningful and easily accessible manner, and keeping track of the changes to each spectrum over the course of time. Almost 7000 unique spectra were extracted from the SDSS in Chapter 5. Over the course of the analysis, the spectra were converted from FITS files to ASCII format, filtered, renormalised, velocity corrected, resampled, and collected together into the specific formats required by the classification and parameterisation codes. Eventually, this trail of data became cumbersome to manage and keep track of as it was replicated into different folders and different files across the computer's file system. There was also an unfortunate incident where a badly typed command accidentally deleted several very important folders of data. When the analysis of the 282 hot subdwarfs was complete, the results were stored in several ASCII-format files which had to be processed manually in order to correlate the classifications of the ANN with the parameters found by SFIT. This led to several such files in different folders with no attached information to say when the results were obtained, from what data set, using which models, and which ANN.

Both of these issues highlight the need for a centralised database which can keep track of the changes made to the data as an analysis proceeds. Such an idea is already widely used in tools to help manage computer software projects (e.g., CVS¹). These tools record all the changes made to each individual source file, allowing the changes to be rolled back to any previous version should something go awry. Auditing analyses of astronomical spectra in this manner would bring with it not only data integrity, but a trail of operations conducted on the data which could be analysed in detail later should an erroneous methodology need to be verified.

A centralised database would also allow structured metadata to be recorded concerning the dates and times of analyses, the tools used and their version numbers, the theoretical models used, the date they were generated, the codes and atomic data used to generate them, and so on. Such metadata would prove invaluable if, for example, an analysis is revised at a later date. Finally, storing results alongside the data in a homogeneous database would greatly simplify tasks such as producing plots for publication, applying clustering algorithms to automatically look for patterns in the results, and cross-correlating the database with other databases accessible over the internet.

¹ http://www.nongnu.org/cvs/
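One way such an audit trail might look is sketched below using Python's built-in sqlite3 module. The table layout, column names, tool names and version strings are all hypothetical and chosen only to mirror the metadata listed above; this is an illustration of the idea, not a design proposed in the thesis.

```python
import sqlite3


def open_audit_db(path=":memory:"):
    """Open (or create) a database holding one row per operation performed."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS operations (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               spectrum_id TEXT NOT NULL,
               operation TEXT NOT NULL,
               tool TEXT NOT NULL,
               tool_version TEXT NOT NULL,
               model_grid TEXT,
               performed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return db


def record(db, spectrum_id, operation, tool, tool_version, model_grid=None):
    """Append one processing step to the audit trail of a spectrum."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO operations "
            "(spectrum_id, operation, tool, tool_version, model_grid) "
            "VALUES (?, ?, ?, ?, ?)",
            (spectrum_id, operation, tool, tool_version, model_grid),
        )


def history(db, spectrum_id):
    """Operations applied to one spectrum, in the order they were recorded."""
    rows = db.execute(
        "SELECT operation, tool, tool_version FROM operations "
        "WHERE spectrum_id = ? ORDER BY id",
        (spectrum_id,),
    )
    return rows.fetchall()


# Hypothetical usage: tool names and versions are placeholders.
db = open_audit_db()
record(db, "J220403.45+122507.3", "renormalised", "spline_norm", "0.1")
record(db, "J220403.45+122507.3", "velocity corrected", "xcorr", "0.1")
record(db, "J220403.45+122507.3", "parameterised", "SFIT", "unknown",
       model_grid="LTE grid (placeholder name)")
trail = history(db, "J220403.45+122507.3")
print(trail)
```

Because every step is a row with a timestamp, tool version and model grid, the question of when a result was obtained, from what data and with which models becomes a query rather than a search through folders.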

Data Visualisation

When dealing with large quantities of data, one extremely useful tool is interactive visualisation. Being able to represent data graphically in useful ways, and to manipulate them through that representation, facilitates the process of discovery and understanding. When analysing the SDSS data in Chapter 5, the final hot subdwarf sample was manually selected from the PCA filtering results. This stage would have proceeded much more quickly if a good visualisation tool had been in place. In this project, extensive use was made of Gnuplot [2] to visualise spectra. Although Gnuplot is an excellent plotting tool, it is not designed for interactive investigation of the data being plotted. As such, to visualise the SDSS data, Gnuplot was invoked from a script to produce thousands of plots that were subsequently displayed in a series of static web pages. Clearly, this is awkward, and it adds another layer of data management that compounds the problems mentioned previously. A better solution is badly needed.

[2] http://www.gnuplot.info/


Algorithm Development

Working with the main analytical tools used in this project showed that they could be improved in several ways.

The errors obtained for the classifications produced by the neural network in Chapter 2 are global estimates based on the leave-one-out cross-validation that was carried out. It would be far more useful if proper confidence intervals were available for each individual result produced by the ANN. Such confidence intervals can be obtained through the bootstrap statistical technique (e.g., Willemsen et al., 2005) or through Bayesian methods (see Bishop, 1995, sect. 10.2).

The SFIT model grid indexing and searching methodology in Chapter 3 works well for two- and three-dimensional grids. Although it was argued in that chapter that higher-dimensional grids are unlikely to be used owing to the curse of dimensionality, four- or possibly five-dimensional grids may not be out of the question as computer technology continues to improve. In theory, the Delaunay triangulation methodology in Chapter 3 could be extended to higher-dimensional geometries, but a different approach (perhaps the k-D tree-based algorithm discussed in the chapter) may be more flexible and less complicated.

As it stands, SFIT, with the modifications of Chapter 3, is a flexible and robust tool for spectral parameterisation. The next step forward is to introduce parallel programming constructs to allow its use in a distributed computing environment, such as the computing cluster at the Armagh Observatory (see Appendix D), or the Grid. Programmatically speaking, this is not a very difficult task, but it does require some planning.

The Principal Components Analysis filtering tool of Chapter 4 worked well for the application to hot subdwarf spectra. A visual selection process is still required on the final filtered data set because precise filtering is a hard problem.
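As an illustration of the bootstrap idea mentioned above for per-result confidence intervals, the following is a minimal sketch rather than the procedure of Willemsen et al. (2005) or anything implemented in this thesis. A least-squares straight line stands in for the neural network (an assumption made purely to keep the example short), and the names `fit_line` and `bootstrap_interval` are hypothetical. The training set is resampled with replacement, the model is refitted to each resample, and the spread of the resulting predictions for a single input gives a percentile interval for that one result.

```python
import random


def fit_line(xs, ys):
    """Least-squares straight line; a stand-in for training the real model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        # Degenerate resample (all x equal): fall back to a constant model.
        return lambda x: mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def bootstrap_interval(xs, ys, x_new, fit=fit_line, n_boot=1000,
                       level=0.95, seed=0):
    """Percentile bootstrap interval for the prediction at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_boot):
        # Resample the training set with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(model(x_new))
    predictions.sort()
    tail = (1.0 - level) / 2.0
    lower = predictions[int(tail * (n_boot - 1))]
    upper = predictions[int((1.0 - tail) * (n_boot - 1))]
    return lower, upper
```

With a neural network in place of `fit_line`, each resample requires a full retraining, so the cost scales with `n_boot`; that trade-off between cost and the quality of the interval is the main design decision in practice.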
Future work could nevertheless help improve the efficiency of the PCA filter, perhaps by devising a new reconstruction error calculation that is more sensitive to the finer details of astronomical spectra. In the application of the hot subdwarf filter to the data sets obtained from the SDSS in Chapters 4 and 5, the filter could have worked better if more weight had been given to differences in the cores and wings of spectral lines. This would burden the user with supplying some sort of line list giving the wavelengths and perhaps equivalent widths of the spectral lines to which the error calculation should pay attention, but a little effort spent in preparation could save a lot of time at the visual inspection stage.

The tools used in this project were chosen on the basis of their previous successful applications to the analysis of astronomical spectra, but many other machine learning techniques have the potential to be employed (see Russell & Norvig, 2003). Algorithms such as the Kohonen self-organising map (Kohonen, 1990; Kohonen et al., 1996), and Bayesian probabilistic methods like those embodied in the AutoClass program [3], can take an unknown data set and automatically derive classes for that set based on the information present in the data. This makes them of particular interest for filtering and classification problems, and it would be a worthwhile project to investigate their ability in this regard.
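The line-weighted reconstruction error suggested above for the PCA filter could take a form like the following sketch. This is an assumed form, not the calculation used in Chapter 4: pixels that fall within a user-supplied half-width of a listed line centre receive a larger weight in a weighted mean squared error. The function names, the `(centre, half_width)` line-list format, and the default weights are all hypothetical.

```python
def line_weights(wavelengths, line_list, base=1.0, boost=5.0):
    """Return one weight per pixel.

    line_list holds (centre, half_width) pairs in the same units as
    wavelengths. Pixels within half_width of any centre get `boost`,
    all other pixels get `base`.
    """
    weights = []
    for w in wavelengths:
        near_line = any(abs(w - centre) <= half_width
                        for centre, half_width in line_list)
        weights.append(boost if near_line else base)
    return weights


def weighted_reconstruction_error(flux, reconstructed, weights):
    """Weighted mean squared difference between a spectrum and its
    PCA reconstruction."""
    if not (len(flux) == len(reconstructed) == len(weights)):
        raise ValueError("flux, reconstructed and weights must have equal length")
    total_weight = sum(weights)
    squared = sum(w * (f - r) ** 2
                  for f, r, w in zip(flux, reconstructed, weights))
    return squared / total_weight
```

For example, with a single illustrative line near 4101.7 and a half-width of 2.0, a reconstruction that misses the line core is penalised more heavily than it would be under uniform weights, which is the behaviour the text asks for. Equivalent widths could be used to scale `half_width` or `boost` per line.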

[3] http://ic.arc.nasa.gov/ic/projects/bayes-group/autoclass/

Afterword
As noted in Chapter 1, improvements in observational and information technology mean that the amount of data being gathered in astronomy is always increasing. The specific result of this thesis is a set of tools which can be used to analyse the very large databases that will be generated by new survey projects such as SDSS-II and the GAIA space mission. The ultimate future goal of the work presented in this thesis is, however, to continue the development of the computational framework of Jeffery (2003). This framework incorporates the tool set developed here into a much wider system to analyse and manage astronomical data, making use of distributed computing initiatives such as the Grid (see Figure 7.1). This system will help us set sail on the seas of astronomical data, charting our way into the unknown mysteries of the universe.

Figure 7.1: Schematic diagram showing how the work of this thesis fits in with the wider system envisaged by Jeffery (2003).

[Figure 7.1 is a block diagram. Its labelled components are: Unknown Data Set (e.g. SDSS); Pre-Processing; Training Data; PCA Filtering; ANN Classification; Manual Selection; χ² Model Fitting; Theoretical Models Database; Model Generation; Request New Data; Results Database; Parameter Space Exploration; Results Exploration & Visualisation; Distributed Computing Resources; Third-Party Codes; Remote Atomic Database; R-Matrix II Calculation; Remote Astronomical Databases (e.g. Simbad).]

Bibliography
Ahmad, A. 2004, PhD thesis, The Queen's University of Belfast
Ahmad, A. & Jeffery, C. S. 2003, A&A, 402, 335
Ahmad, A., Winter, C., & Jeffery, C. S. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2, The Proceedings of the 2nd Meeting on Hot Subdwarfs and Related Objects, ed. R. H. Østensen, 159-162
Allende Prieto, C., Rebolo, R., López, R. J. G., Serra-Ricart, M., Beers, T. C., Rossi, S., Bonifacio, P., & Molaro, P. 2000, AJ, 120, 1516
Altmann, M., Edelmann, H., & de Boer, K. S. 2004, A&A, 414, 181
Bailer-Jones, C. A. L. 1996, PhD thesis, University of Cambridge
--. 1997, PASP, 109, 932
Bailer-Jones, C. A. L., Irwin, M., Gilmore, G., & von Hippel, T. 1997, MNRAS, 292, 157
Bailer-Jones, C. A. L., Irwin, M., & von Hippel, T. 1998, MNRAS, 298, 361
Behara, N. T. & Jeffery, C. S. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2, The Proceedings of the 2nd Meeting on Hot Subdwarfs and Related Objects, ed. R. H. Østensen, 115-122
Bishop, C. M. 1995, Neural Networks for Pattern Recognition (Oxford: Oxford University Press)
Brown, T. M., Bowers, C. W., Kimble, R. A., & Ferguson, H. C. 2000, ApJ, 529, L89
Brown, T. M., Ferguson, H. C., Davidsen, A. F., & Dorman, B. 1997, ApJ, 482, 685
Caloi, V. 1976, A&A, 50, 471
--. 1989, A&A, 221, 27
Colin, J., de Boer, K. S., Dauphole, B., Ducourant, C., Dulou, M. R., Geffert, M., Le Campion, J.-F., Moehler, S., Odenkirchen, M., Schmidt, J. H. K., & Theissen, A. 1994, A&A, 287, 38

Colless, M., Dalton, G., Maddox, S., Sutherland, W., Norberg, P., Cole, S., Bland-Hawthorn, J., Bridges, T., Cannon, R., Collins, C., Couch, W., Cross, N., Deeley, K., De Propris, R., Driver, S. P., Efstathiou, G., Ellis, R. S., Frenk, C. S., Glazebrook, K., Jackson, C., Lahav, O., Lewis, I., Lumsden, S., Madgwick, D., Peacock, J. A., Peterson, B. A., Price, I., Seaborne, M., & Taylor, K. 2001, MNRAS, 328, 1039
Connolly, A. J. & Szalay, A. S. 1999, AJ, 117, 2052
Connolly, A. J., Szalay, A. S., Bershady, M. A., Kinney, A. L., & Calzetti, D. 1995, AJ, 110, 1071
D'Cruz, N. L., Dorman, B., Rood, R. T., & O'Connell, R. W. 1996, ApJ, 466, 359
de Boer, K. S., Aguilar Sanchez, Y., Altmann, M., Geffert, M., Odenkirchen, M., Schmidt, J. H. K., & Colin, J. 1997, A&A, 327, 577
Deeming, T. J. 1964, MNRAS, 127, 493
Djorgovski, S. G., Gal, R. R., Odewahn, S. C., de Carvalho, R. R., Brunner, R., Longo, G., & Scaramella, R. 1998, in Wide Field Surveys in Cosmology, 14th IAP meeting held May 26-30, 1998, Paris. Publisher: Editions Frontieres. ISBN: 2-86332-241-9, p. 89, ed. S. Colombi, Y. Mellier, & B. Raban, 89+
Dorman, B., Rood, R. T., & O'Connell, R. W. 1993, ApJ, 419, 596
Dreizler, S., Heber, U., Werner, K., Moehler, S., & de Boer, K. S. 1990, A&A, 235, 234
Drilling, J. S. 1996, in ASP Conf. Ser. 96: Hydrogen Deficient Stars, 461
Drilling, J. S., Jeffery, C. S., Moehler, S., Heber, U., & Napiwotzki, R. 2006, in preparation
Dudley, R. E. 1992, PhD thesis, The University of St. Andrews
Edelmann, H., Heber, U., Hagen, H.-J., Lemke, M., Dreizler, S., Napiwotzki, R., & Engels, D. 2003, A&A, 400, 939
Edelsbrunner, H. & Shah, N. R. 1992, in SCG '92: Proceedings of the eighth annual symposium on Computational geometry (New York, NY, USA: ACM Press), 43-52
Fan, X. 1999, AJ, 117, 2528
Folkes, S. R., Lahav, O., & Maddox, S. J. 1996, MNRAS, 283, 651
Francis, P. J., Hewett, P. C., Foltz, C. B., & Chaffee, F. H. 1992, ApJ, 398, 476
Galaz, G. & de Lapparent, V. 1998, A&A, 332, 459
Glazebrook, K., Offer, A. R., & Deeley, K. 1998, ApJ, 492, 98
Golub, G. H. & Van Loan, C. F. 1989, Matrix Computations, 2nd edn. (Baltimore, Maryland 21218: The Johns Hopkins University Press)


Green, E. M., Fontaine, G., Hyde, E. A., Charpinet, S., & Chayer, P. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2, The Proceedings of the 2nd Meeting on Hot Subdwarfs and Related Objects, ed. R. H. Østensen, 167-174
Green, R. F., Schmidt, M., & Liebert, J. 1986, ApJS, 61, 305
Greenstein, J. L. & Sargent, A. I. 1974, ApJS, 28, 157
Gulati, R., Gupta, R., & Singh, H. 1997a, PASP, 109, 843
Gulati, R. K., Gupta, R., Gothoskar, P., & Khobragade, S. 1994a, ApJ, 426, 340
--. 1994b, Vistas in Astronomy, 38, 293
--. 1996, Bulletin of the Astronomical Society of India, 24, 21
Gulati, R. K., Gupta, R., & Rao, N. K. 1997b, A&A, 322, 933
Harris, H. C., Liebert, J., Kleinman, S. J., Nitta, A., Anderson, S. F., Knapp, G. R., Krzesiński, J., Schmidt, G., Strauss, M. A., Vanden Berk, D., Eisenstein, D., Hawley, S., Margon, B., Munn, J. A., Silvestri, N. M., Smith, J. A., Szkody, P., Collinge, M. J., Dahn, C. C., Fan, X., Hall, P. B., Schneider, D. P., Brinkmann, J., Burles, S., Gunn, J. E., Hennessy, G. S., Hindsley, R., Ivezić, Z., Kent, S., Lamb, D. Q., Lupton, R. H., Nichol, R. C., Pier, J. R., Schlegel, D. J., SubbaRao, M., Uomoto, A., Yanny, B., & York, D. G. 2003, AJ, 126, 1023
Heber, U. 1986, A&A, 155, 33
Heber, U. 1991, in IAU Symp. 145: Evolution of Stars: the Photospheric Abundance Connection, ed. G. Michaud & A. V. Tutukov, 363+
Heber, U., Hirsch, H., Ströer, A., O'Toole, S., Haas, S., & Dreizler, S. 2006, in Baltic Astronomy, Vol. 15, Nos. 1-2, The Proceedings of the 2nd Meeting on Hot Subdwarfs and Related Objects, ed. R. H. Østensen, 104-111
Heber, U. & Hunger, K. 1987, in IAU Colloq. 95: Second Conference on Faint Blue Stars, ed. A. G. D. Philip, D. S. Hayes, & J. W. Liebert, 599-602
Heber, U., Hunger, K., Jonas, G., & Kudritzki, R. P. 1984, A&A, 130, 119
Hirsch, H. A., Heber, U., O'Toole, S. J., & Bresolin, F. 2005, A&A, 444, L61
Husfeld, D., Butler, K., Heber, U., & Drilling, J. S. 1989, A&A, 222, 150
Hutchison, R. B. 1971, AJ, 76, 711
Iben, I., Kaler, J. B., Truran, J. W., & Renzini, A. 1983, ApJ, 264, 605
Iben, I. J. 1990, ApJ, 353, 215
Jeffery, C. S. 2003, in ASP Conf. Ser. 288: Stellar Atmosphere Modeling, ed. I. Hubeny, D. Mihalas, & K. Werner, 141+


Jeffery, C. S., Drilling, J. S., Harrison, P. M., Heber, U., & Moehler, S. 1997, A&AS, 125, 501
Jeffery, C. S., Heber, U., Hill, P. W., Dreizler, S., Drilling, J. S., Lawson, W. A., Leuenhagen, U., & Werner, K. 1996, in ASP Conf. Ser. 96: Hydrogen Deficient Stars, ed. C. S. Jeffery & U. Heber, 471+
Jeffery, C. S., Woolf, V. M., & Pollacco, D. L. 2001, A&A, 376, 497
Katz, D., Soubiran, C., Cayrel, R., Adda, M., & Cautain, R. 1998, A&A, 338, 151
Kleinman, S. J., Harris, H. C., Eisenstein, D. J., Liebert, J., Nitta, A., Krzesiński, J., Munn, J. A., Dahn, C. C., Hawley, S. L., Pier, J. R., Schmidt, G., Silvestri, N. M., Smith, J. A., Szkody, P., Strauss, M. A., Knapp, G. R., Collinge, M. J., Mukadam, A. S., Koester, D., Uomoto, A., Schlegel, D. J., Anderson, S. F., Brinkmann, J., Lamb, D. Q., Schneider, D. P., & York, D. G. 2004, ApJ, 607, 426
Klemola, A. R. 1961, ApJ, 134, 130
Kohonen, T. 1990, in New Concepts in Computer Science: Proc. Symp. in Honour of Jean-Claude Simon (Paris, France: AFCET), 181-190
Kohonen, T., Hynninen, J., Kangas, J., & Laaksonen, J. 1996, SOM_PAK: The Self-Organizing Map program package, Tech. Rep. A31, Laboratory of Computer and Information Science, Helsinki University of Technology
Kurtz, M. J. 1982, Ph.D. Thesis
Lahav, O., Naim, A., Sodré, L., & Storrie-Lombardi, M. C. 1996, MNRAS, 283, 207
Lamy, H. & Hutsemékers, D. 2004, A&A, 427, 107
Lasala, J. 1994, in ASP Conf. Ser. 60: The MK Process at 50 Years: A Powerful Tool for Astrophysical Insight, ed. C. J. Corbally, R. O. Gray, & R. F. Garrison, 312+
Levenberg, K. 1944, Questions of Applied Mathematics, 2, 164
Livny, M. & Raman, R. 1998, in The Grid: Blueprint for a New Computing Infrastructure, ed. I. Foster & C. Kesselman (Morgan Kaufmann)
Marquardt, D. W. 1963, Journal of the Society for Industrial and Applied Mathematics, 11, 431
Maxted, P. F. L., Heber, U., Marsh, T. R., & North, R. C. 2001, MNRAS, 326, 1391
Mengel, J. G., Norris, J., & Gross, P. G. 1976, ApJ, 204, 488
Moehler, S., de Boer, K. S., & Heber, U. 1990a, A&A, 239, 265
Moehler, S., Richtler, T., de Boer, K. S., Dettmar, R. J., & Heber, U. 1990b, A&AS, 86, 53
Möller, T. & Trumbore, B. 1997, Journal of Graphics Tools, 2, 21, see: http://www.acm.org/jgt/papers/MollerTrumbore97/


Moore, A. 1991, A tutorial on kd-trees, Extract from PhD Thesis, available from http://www.cs.cmu.edu/~awm/papers.html
Morgan, W. W., Abt, H. A., & Tapscott, J. W. 1978, Revised MK Spectral Atlas for stars earlier than the sun (Williams Bay: Yerkes Observatory, and Tucson: Kitt Peak National Observatory, 1978)
Morossi, C. & Crivellari, L. 1980, A&AS, 41, 299
Mücke, E. P., Saias, I., & Zhu, B. 1996, in SCG '96: Proceedings of the twelfth annual symposium on Computational geometry (New York, NY, USA: ACM Press), 274-283
Murtagh, F. & Heck, A. 1987, Multivariate Data Analysis (Dordrecht, Holland: D. Reidel Publishing Co.)
Napiwotzki, R., Karl, C. A., Lisker, T., Heber, U., Christlieb, N., Reimers, D., Nelemans, G., & Homeier, D. 2004, Ap&SS, 291, 321
Newell, E. B. 1973, ApJS, 26, 37
O'Rourke, J. 1998, Computational Geometry in C, 2nd edn. (Cambridge (UK) and New York: Cambridge University Press)
Paczyński, B. 1971, Acta Astronomica, 21, 1
Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. 1986, Numerical Recipes: The Art of Scientific Computing, 1st edn. (Cambridge (UK) and New York: Cambridge University Press)
Qin, D.-M., Guo, P., Hu, Z.-Y., & Zhao, Y.-H. 2003, Chinese Journal of Astronomy and Astrophysics, 3, 277
Reid, I. N., Brewer, C., Brucato, R. J., McKinley, W. R., Maury, A., Mendenhall, D., Mould, J. R., Mueller, J., Neugebauer, G., Phinney, J., Sargent, W. L. W., Schombert, J., & Thicksten, R. 1991, PASP, 103, 661
Renka, R. J. 1988, ACM Trans. Math. Softw., 14, 139
Rhee, J., Beers, T. C., & Irwin, M. J. 1999, Bulletin of the American Astronomical Society, 31, 971
Russell, S. & Norvig, P. 2003, Artificial Intelligence: A Modern Approach, 2nd edn. (Upper Saddle River, New Jersey 07458: Pearson Education Inc.)
Saffer, R. A., Bergeron, P., Koester, D., & Liebert, J. 1994, ApJ, 432, 351
Shepard, D. 1968, in Proceedings of the 1968 23rd ACM national conference (New York, NY, USA: ACM Press), 517-524
Shewchuk, J. R. 1996, in SCG '96: Proceedings of the twelfth annual symposium on Computational geometry (New York, NY, USA: ACM Press), 141-150
Simkin, S. M. 1974, A&A, 31, 129


Singh, H. P., Gulati, R. K., & Gupta, R. 1998, MNRAS, 295, 312
Skrutskie, M. F., Cutri, R. M., Stiening, R., Weinberg, M. D., Schneider, S., Carpenter, J. M., Beichman, C., Capps, R., Chester, T., Elias, J., Huchra, J., Liebert, J., Lonsdale, C., Monet, D. G., Price, S., Seitzer, P., Jarrett, T., Kirkpatrick, J. D., Gizis, J. E., Howard, E., Evans, T., Fowler, J., Fullmer, L., Hurt, R., Light, R., Kopan, E. L., Marsh, K. A., McCallon, H. L., Tam, R., Van Dyk, S., & Wheelock, S. 2006, AJ, 131, 1163
Snider, S., Allende Prieto, C., von Hippel, T., Beers, T. C., Sneden, C., Qu, Y., & Rossi, S. 2001, ApJ, 562, 528
Sodre, L. J., Cuevas, H., & Capelato, H. V. 1998, in Wide Field Surveys in Cosmology, 14th IAP meeting held May 26-30, 1998, Paris. Publisher: Editions Frontieres. ISBN: 2-86332-241-9, p. 424, ed. S. Colombi, Y. Mellier, & B. Raban, 424+
Storrie-Lombardi, M. C., Irwin, M. J., von Hippel, T., & Storrie-Lombardi, L. J. 1994, Vistas in Astronomy, 38, 331
Sweigart, A. V. 1997, ApJ, 474, L23+
Theissen, A., Moehler, S., Heber, U., & de Boer, K. S. 1993, A&A, 273, 524
Thejll, P., Bauer, F., Saffer, R., Liebert, J., Kunze, D., & Shipman, H. L. 1994, ApJ, 433, 819
Tonry, J. & Davis, M. 1979, AJ, 84, 1511
von Hippel, T., Storrie-Lombardi, L. J., Storrie-Lombardi, M. C., & Irwin, M. J. 1994, MNRAS, 269, 97
Weaver, W. B. 2000a, Bulletin of the American Astronomical Society, 32, 1430
--. 2000b, ApJ, 541, 298
Weaver, W. B. & Torres-Dodgen, A. V. 1995, ApJ, 446, 300
--. 1997, ApJ, 487, 847
Weir, N., Fayyad, U. M., Djorgovski, S. G., & Roden, J. 1995, PASP, 107, 1243
Wesemael, F., Winget, D. E., Cabot, W., van Horn, H. M., & Fontaine, G. 1982, ApJ, 254, 221
Whitney, C. A. 1983, A&AS, 51, 443
Willemsen, P. G., Hilker, M., Kayser, A., & Bailer-Jones, C. A. L. 2005, A&A, 436, 379
York, D. G., Adelman, J., Anderson, J. E., Anderson, S. F., Annis, J., Bahcall, N. A., Bakken, J. A., Barkhouser, R., Bastian, S., Berman, E., Boroski, W. N., Bracker, S., Briegel, C., Briggs, J. W., Brinkmann, J., Brunner, R., Burles, S., Carey, L., Carr, M. A., Castander, F. J., Chen, B., Colestock, P. L., Connolly, A. J., Crocker, J. H., Csabai, I., Czarapata, P. C., Davis, J. E., Doi, M., Dombeck, T., Eisenstein, D., Ellman, N., Elms, B. R., Evans, M. L., Fan, X., Federwitz, G. R., Fiscelli, L., Friedman, S., Frieman, J. A., Fukugita, M., Gillespie, B., Gunn, J. E., Gurbani, V. K., de Haas, E., Haldeman, M., Harris, F. H., Hayes, J., Heckman, T. M., Hennessy, G. S., Hindsley, R. B., Holm, S., Holmgren, D. J., Huang, C.-h., Hull, C., Husby, D., Ichikawa, S.-I., Ichikawa, T., Ivezić, Z., Kent, S., Kim, R. S. J., Kinney, E., Klaene, M., Kleinman, A. N., Kleinman, S., Knapp, G. R., Korienek, J., Kron, R. G., Kunszt, P. Z., Lamb, D. Q., Lee, B., Leger, R. F., Limmongkol, S., Lindenmeyer, C., Long, D. C., Loomis, C., Loveday, J., Lucinio, R., Lupton, R. H., MacKinnon, B., Mannery, E. J., Mantsch, P. M., Margon, B., McGehee, P., McKay, T. A., Meiksin, A., Merelli, A., Monet, D. G., Munn, J. A., Narayanan, V. K., Nash, T., Neilsen, E., Neswold, R., Newberg, H. J., Nichol, R. C., Nicinski, T., Nonino, M., Okada, N., Okamura, S., Ostriker, J. P., Owen, R., Pauls, A. G., Peoples, J., Peterson, R. L., Petravick, D., Pier, J. R., Pope, A., Pordes, R., Prosapio, A., Rechenmacher, R., Quinn, T. R., Richards, G. T., Richmond, M. W., Rivetta, C. H., Rockosi, C. M., Ruthmansdorfer, K., Sandford, D., Schlegel, D. J., Schneider, D. P., Sekiguchi, M., Sergey, G., Shimasaku, K., Siegmund, W. A., Smee, S., Smith, J. A., Snedden, S., Stone, R., Stoughton, C., Strauss, M. A., Stubbs, C., SubbaRao, M., Szalay, A. S., Szapudi, I., Szokoly, G. P., Thakar, A. R., Tremonti, C., Tucker, D. L., Uomoto, A., Vanden Berk, D., Vogeley, M. S., Waddell, P., Wang, S.-i., Watanabe, M., Weinberg, D. H., Yanny, B., & Yasuda, N. 2000, AJ, 120, 1579

Appendices

Appendix A

Results for 192 Drilling et al. (2006) Hot Subdwarfs
This table lists the parameterisation results for both the calibrated and uncalibrated stars obtained from Drilling et al. (2006). Results obtained from the parameterisation neural network and SFIT are given, with the internal errors of SFIT also listed.

Table A.1: Parameterisation Results for 192 Drilling et al. (2006) Hot Subdwarfs

               --------------------------- SFIT Results ---------------------------  ------------ ANN Results ------------
Identifier     Teff (K)    err  log g (cgs)     err  log(nHe/nH)     err         χ²     Teff (K)  log g (cgs)  log(nHe/nH)
BD-07 3477        27748   1362        5.420   0.120       -2.673   0.204   1.90E-01   26360.4620       5.4370      -2.4641
BD+25 3941        28478    447        4.645   0.058       -1.422   0.034   1.89E+00   30794.9018       4.7807      -2.2476
BD+28 4211        48135   1120        5.773   0.072       -1.121   0.040   3.72E-01   59518.9743       6.6219      -3.3647
BD+40 4032        27895    713        4.083   0.069       -1.079   0.036   1.02E+00   27197.0607       3.7571      -0.9165
Feige 110         40000    196        5.776   0.042       -2.020   0.136   8.78E-01   45638.2315       5.9658      -3.3709
Feige 15          12000    162        4.500   0.053       -2.201   0.276   2.48E+00   13763.3319       3.9102      -1.4225
Feige 38          29629    504        5.546   0.055       -2.483   0.132   2.71E-01   30014.7462       5.5768      -2.5487
Feige 56          15571    279        3.608   0.048       -1.765   0.101   2.80E-01   17780.0476       3.7579      -2.3262
Feige 98          11590    196        3.793   0.064       -2.541   0.302   7.42E-01   13083.8300       3.8179      -3.0230
FHB 18            10819    133        4.179   0.037       -1.602   0.052   5.28E-01   12901.1081       4.2528      -2.7523
FHB 23            11646    189        4.394   0.053       -2.586   0.334   4.27E-01   14271.8636       4.4317      -3.3414
HD 144941         22000    348        3.835   0.055        0.963   0.008   1.31E+00   21681.7225       3.6821       1.6263
HD 160641         30614    207        3.153   0.025        2.237   0.225   5.06E+00   31303.9914       2.8086       1.8212
HD 17520          34793    723        3.804   0.067       -0.661   0.030   1.46E+00   35330.3819       4.0278      -0.9810
HD 184279         25927    292        3.917   0.034       -0.400   0.016   2.06E+00   28409.0447       3.9086      -0.0132
HD 192281         11337    194        4.012   0.044       -1.856   0.405   6.63E-01   11143.6105       3.6825      -2.2308
HD 217086         37427    439        4.691   0.065       -1.237   0.045   8.02E-01   45007.2314       5.1943      -1.5241
Hiltner 600       29739    564        4.717   0.063       -1.112   0.034   6.06E-01   27869.8814       4.7985      -0.9717
HR 6092           22693    545        4.026   0.070       -1.477   0.039   4.36E+00   14259.7656       2.7584      -0.0691
HR 6588           23305    466        3.682   0.063       -0.938   0.033   3.60E+00   16634.1650       2.5556      -0.1572

Table A.1: continued

               --------------------------- SFIT Results ---------------------------  ------------ ANN Results ------------
Identifier     Teff (K)    err  log g (cgs)     err  log(nHe/nH)     err         χ²     Teff (K)  log g (cgs)  log(nHe/nH)
HR 6719           25813    814        3.500   0.056       -0.597   0.026   2.68E+00   19240.4897       2.3722      -0.0260
HR 7287           19088    391        3.598   0.058       -1.929   0.074   4.50E+00   12007.6482       2.1732       0.1142
HR 8622           28951   1186        2.919   0.063        0.107   0.006   3.06E+00   29216.7530       2.8812       0.2961
HS 0016+0044      29725    540        5.523   0.075       -2.939   0.377   6.74E-01   30591.2646       5.7893      -3.2113
HS 1000+471       40766    137        5.659   0.036        0.521   0.013   6.30E+00   35501.0360       4.3467      -0.6256
HS 1844+637       29045    177        3.633   0.026        0.976   0.008   8.46E+00   39133.7152       4.5950       3.4902
HS 2253+0900      13534    281        3.863   0.060       -1.156   0.062   5.37E-01   13688.8067       3.9192      -1.8474
HS 2301+0728      17658    528        4.378   0.065       -2.771   0.257   2.29E+00   10788.7216       2.7773      -3.8880
HZ 15             20434    584        3.000   0.065       -0.630   0.025   2.37E+00   24751.3086       3.0214      -1.0013
HZ 44             38507    224        5.381   0.040        0.088   0.003   1.58E+00   37595.6085       4.8985      -0.1747
LSIV-14           37999    240        5.648   0.045       -0.573   0.017   3.63E+00   37185.7816       5.3742      -0.8631
LSIV-6            28974    257        3.747   0.046        2.522   0.144   2.73E+00   26745.5011       2.7521       3.2416
LSS 5121          30511    245        3.216   0.037        2.273   0.163   3.95E+00   32165.1540       3.2400       1.0310
PG0001+275        35180    300        5.406   0.053       -3.000   0.434   4.31E+00   34205.9951       5.0684      -4.2459
PG0004+133        26205   1118        4.828   0.110       -2.037   0.094   3.67E+00   32994.6787       5.2114      -1.8659
PG0009+036        20214    629        4.496   0.084       -2.488   0.134   2.29E+00   23334.3874       4.8444      -2.5336
PG0039+049        28606    391        4.668   0.068       -2.995   0.430   2.26E+00   20927.6745       3.7393      -3.3758
PG0039+135        45029    212        5.408   0.081        0.514   0.030   3.51E+00   42789.5834       5.3039      -0.3035
PG0057+155        32203    375        5.500   0.063       -1.785   0.053   1.02E+00   33110.1795       5.5769      -2.0610
PG0101+039        27565   1344        5.357   0.114       -3.000   0.434   1.66E+00   27132.9594       5.1346      -2.8721
PG0133+114        35999     61        6.000   0.034       -2.995   0.430   1.54E+00   30996.2889       5.0018      -3.0707

Table A.1: continued

               --------------------------- SFIT Results ---------------------------  ------------ ANN Results ------------
Identifier     Teff (K)    err  log g (cgs)     err  log(nHe/nH)     err         χ²     Teff (K)  log g (cgs)  log(nHe/nH)
PG0135+242        25308    323        3.375   0.057        2.452   0.123   3.30E+00   22720.6398       2.5661       2.3728
PG0142+148        28738    565        5.022   0.080       -2.966   0.402   3.72E+00   33990.4595       5.0653      -3.3974
PG0208+016        44506    160        5.926   0.041        1.218   0.043   4.17E+00   42334.8969       5.7844       0.7455
PG0229+064        18305    192        3.991   0.056       -0.798   0.035   9.77E-01   23366.1468       4.5001      -1.1572
PG0232+095        35000    435        4.861   0.030       -1.392   0.053   5.88E+00   26042.8910       3.8106      -2.5768
PG0304+183        29953    710        5.182   0.077       -2.913   0.356   7.50E+00   25550.8542       4.7584      -3.9436
PG0314+146         8541     37        1.235   0.028       -2.004   0.088   7.27E+00    4603.7402       1.2050      -3.8688
PG0342+026        21878    819        4.731   0.073       -3.000   0.434   2.06E+00   30227.8532       5.3591      -3.3190
PG0838+133        40055    139        4.500   0.031        0.990   0.013   3.52E+00   46872.5237       4.9328       4.0786
PG0856+121        26869    620        5.600   0.067       -3.000   0.434   2.70E+00   27571.6464       5.7342      -3.3096
PG0902+058        42352    111        6.000   0.041        1.912   0.106   3.85E+00   42180.4296       5.7766       1.9591
PG0907+123        26482    641        5.075   0.071       -3.000   0.434   4.06E+00   24278.8101       4.9274      -3.3162
PG0909+164        31880    562        4.847   0.086       -3.000   0.434   1.57E+00   35575.3699       4.6247      -3.7231
PG0909+275        34009    427        4.795   0.057       -0.685   0.022   1.93E+00   36507.4485       5.1157      -1.2373
PG0918+029        31029    310        5.500   0.066       -2.649   0.193   2.41E+00   24129.8781       4.4794      -2.2902
PG0920+029        25992   1329        4.781   0.111       -3.000   0.434   1.48E+00   27712.3799       4.8930      -3.3238
PG0921+161        32868    292        5.329   0.060       -1.640   0.057   2.59E+00   35284.4531       5.1844      -3.0325
PG0921+311        42320    137        5.873   0.047        1.068   0.015   2.72E+00   41033.2928       5.5140       0.5536
PG0934+145        16681    212        4.031   0.037       -0.899   0.034   2.22E+00   13314.2430       3.7870      -0.7299
PG0954+049        13384    239        3.399   0.058       -1.526   0.087   1.31E+00   12465.6053       3.0812      -2.6841
PG1000+375        32896    237        5.814   0.053       -1.761   0.050   1.31E+00   20945.0047       4.8115      -1.6003

Table A.1: continued

               --------------------------- SFIT Results ---------------------------  ------------ ANN Results ------------
Identifier     Teff (K)    err  log g (cgs)     err  log(nHe/nH)     err         χ²     Teff (K)  log g (cgs)  log(nHe/nH)
PG1017+431        32192    454        4.816   0.074       -2.991   0.425   5.45E-01   44773.4090       5.8294      -3.6347
PG1018-047        30361    252        5.365   0.057       -3.000   0.434   2.23E+00   30145.1825       5.3885      -5.4213
PG1047+003        32977    318        5.459   0.066       -2.130   0.117   4.07E-01   34846.5851       5.5663      -2.5655
PG1049+013        32754    427        4.725   0.069       -2.610   0.177   8.38E-01   44195.8859       5.5828      -2.9385
PG1050-065        34509    236        5.591   0.047       -1.212   0.035   2.91E+00   36241.7231       5.3524      -1.2791
PG1118+061        28321    386        5.224   0.065       -3.000   0.434   3.84E-01   27695.5481       5.1031      -2.9965
PG1127+019        40812    131        4.965   0.070        1.940   0.265   1.81E+00   39675.7493       4.6297       2.1534
PG1136-003        30576    260        5.250   0.056       -3.000   0.434   4.32E-01   28147.6630       5.0249      -3.0350
PG1154-070        28000    963        5.430   0.092       -2.200   0.069   3.92E-01   26478.4624       5.2617      -2.1939
PG1220-056        49308    883        5.460   0.107        0.309   0.013   8.09E-01   52185.9827       6.0800      -1.0282
PG1230+067        38843    191        4.926   0.056        1.013   0.013   2.36E+00   40314.1570       5.0314       2.8373
PG1245-042        15232    266        3.803   0.056       -1.820   0.144   5.97E-01   15548.7313       3.8416      -1.4344
PG1246-122        32573    361        4.001   0.044       -1.012   0.035   3.11E+00   33475.4066       3.7321      -2.2059
PG1249+762        50000    276        5.623   0.087        0.368   0.028   1.06E+00   65439.9717       6.7602       0.0994
PG1255+547        32774    330        5.500   0.059       -1.547   0.046   1.20E+00   30935.1727       5.6242      -2.1342
PG1258-030        13075    278        3.656   0.048       -2.286   0.168   1.03E+00   12502.8404       3.3526      -2.0501
PG1300+279        48677    632        5.955   0.053       -0.041   0.002   1.56E+00   46441.3479       6.5777      -0.9764
PG1303-114        31245    354        5.502   0.068       -3.000   0.434   1.82E+00   30588.0996       5.4478      -5.0836
PG1325+054        44232    208        5.915   0.033        0.326   0.011   2.18E+00   45019.5994       6.0397       1.9269
PG1336-018        31271    245        5.567   0.049       -2.519   0.143   6.11E-01   37061.5891       5.9196      -3.4263
PG1343-102        29958    707        5.424   0.087       -2.910   0.353   6.71E-01   31263.5069       5.3800      -3.4850

Table A.1: continued

               --------------------------- SFIT Results ---------------------------  ------------ ANN Results ------------
Identifier     Teff (K)    err  log g (cgs)     err  log(nHe/nH)     err         χ²     Teff (K)  log g (cgs)  log(nHe/nH)
PG1343+578        15335    301        3.396   0.062       -1.835   0.089   7.67E-01   14789.0812       3.2773      -2.8106
PG1348+607        45000    223        5.386   0.076        0.000   0.000   2.17E+00   56671.0493       6.6681       2.0785
PG1352-023        47661   1197        6.000   0.091       -1.723   0.092   2.46E+00   54114.0712       6.2103      -3.7261
PG1355-064        49999    733        5.299   0.115        0.364   0.029   9.95E-01   55817.7736       5.5139      -1.3330
PG1401+289        47629    247        5.753   0.087        0.184   0.014   9.22E-01   43645.2377       5.6074       0.1812
PG1409-103        39399    768        5.008   0.089       -2.166   0.127   1.45E+00   50621.5726       5.7496      -4.4764
PG1413+114        43416    117        6.000   0.039        1.261   0.063   3.54E+00   44247.9760       5.7971       3.0220
PG1415+492        31467    283        4.135   0.047        2.512   0.141   2.26E+00   30378.3273       3.7367       2.7468
PG1426-067        34262    380        5.368   0.066       -3.000   0.434   1.17E+00   34904.3068       5.1176      -2.8789
PG1432+004        24561   1051        4.987   0.114       -2.308   0.088   9.80E-01   24852.4845       5.0079      -2.5775
PG1433+239        35306    378        5.345   0.061       -2.970   0.405   3.89E+00   39665.7300       5.3344      -3.5664
PG1441+407        49802    277        6.000   0.089        0.464   0.038   1.61E+00   46391.6786       5.5498       0.7923
PG1448-052        32489    304        5.189   0.058       -3.000   0.434   7.68E-01   50271.3965       6.1452      -3.8949
PG1449+652        30456    269        4.598   0.057       -3.000   0.434   1.23E+00   22392.8613       3.6074      -4.2284
PG1451+492        17996    482        3.999   0.067       -1.919   0.108   8.85E-01   21898.4813       4.1696      -2.0822
PG1453-081        16393    281        3.977   0.068       -1.196   0.095   1.02E+00   20030.3017       3.9852      -1.7400
PG1453-085        12264    169        3.175   0.044       -2.777   0.260   6.93E-01   14759.0551       3.3091      -3.1086
PG1458+423        29151    811        5.000   0.104       -2.995   0.430   6.94E-01   29104.1751       4.8454      -3.9124
PG1506-052        38002    479        5.227   0.071       -2.226   0.073   2.32E+00   57963.7346       6.0973      -6.7294
PG1510+635        12489    184        3.256   0.048       -2.906   0.350   2.53E+00   14137.8470       3.2197      -3.4824
PG1518-098        47134   1376        5.000   0.098        0.276   0.029   1.04E+00   63938.6641       5.7489       2.8699

Table A.1: continued

SFIT Results Identifier Teff (K) PG1519+640 PG1526+440 PG1532+523 PG1534-018 PG1536+690 PG1537-046 PG1538+401 PG1538+611 PG1543+629 PG1544+488 PG1544+601 PG1545+035 PG1549+006 PG1553-077 PG1554+408 PG1558-007 PG1559+048 PG1559+222 PG1559+533 PG1600+171 PG1602+013 28801 41199 30618 45010 50000 47494 32260 30528 40002 30992 30000 38820 31939 44904 34356 21419 36325 42434 29420 45883 39961 669 139 263 166 277 676 320 307 185 273 518 419 559 210 255 753 228 138 633 363 202 Teff log g (cgs) 5.000 5.826 5.257 5.656 5.647 5.249 5.446 5.416 5.536 4.202 5.543 5.000 5.288 5.707 4.320 4.922 5.600 6.000 5.500 5.999 5.592 0.099 0.037 0.056 0.043 0.087 0.108 0.064 0.063 0.043 0.046 0.054 0.083 0.070 0.045 0.056 0.081 0.049 0.047 0.090 0.092 0.042 -2.779 0.738 -2.935 1.224 0.364 0.174 -3.000 -3.000 -2.137 2.522 -2.579 -1.081 -2.074 0.256 2.522 -2.709 -0.938 2.129 -2.817 0.943 -1.995 0.261 0.023 0.374 0.087 0.028 0.008 0.434 0.434 0.119 0.144 0.165 0.036 0.103 0.014 0.289 0.222 0.033 0.117 0.285 0.026 0.129 8.55E-01 3.90E+00 1.03E+00 8.31E-01 9.38E-01 1.17E+00 8.57E-01 1.32E+00 1.07E+00 1.50E+00 4.27E-01 1.43E+00 1.08E+00 1.56E+00 3.41E+00 4.26E-01 1.05E+00 2.79E+00 1.52E+00 3.97E+00 1.49E+00 log g log(n
He

ANN Results log(n
He

/nH )

/nH )

2

Teff (K) 29147.1760 37868.3371 25389.6717 43114.3697 55913.0919 58302.0139 34944.9166 29648.9555 51462.0631 33677.6132 28454.0634 47852.7894 29109.5245 42387.5197 38902.7912 26843.3693 36003.1956 41856.4568 21995.1194 42817.5697 53840.6098

log g (cgs) 4.9799 4.9471 4.5495 5.3541 6.1882 6.0350 5.5433 5.0240 5.9720 4.4517 5.5260 5.5488 5.0910 5.3536 4.9593 5.3108 5.3491 5.5616 5.0622 5.4414 6.1985

log(nHe/nH) -3.1609 -0.1320 -3.1041 0.8758 -0.0670 2.8671 -2.9674 -4.0399 -3.9767 3.5425 -2.6698 -4.0558 -1.8754 1.0981 2.1864 -2.7541 -0.9926 1.0422 -1.9979 1.6103 -3.8367


Table A.1: continued

SFIT Results
Identifier PG1605+072 PG1607+174 PG1610+519 PG1613+467 PG1615+413 PG1618+563 PG1619+525 PG1624+085 PG1627+006 PG1627+017 PG1629+466 PG1640+645 PG1644+404 PG1645+610 PG1646+607 PG1648+315 PG1648+536 PG1653+633 PG1656+600 PG1658+273 PG1701+359
Teff (K) 30000 32181 31595 23590 28553 33309 31178 43897 23222 22959 35779 34458 28221 28377 47595 41835 30145 34667 30481 43059 32615
σ(Teff) 824 335 380 1075 189 341 318 205 593 522 273 262 457 522 668 130 630 268 252 113 88
log g (cgs) 4.779 4.674 4.537 4.612 3.759 5.481 5.500 5.752 5.193 5.206 5.000 5.591 5.060 5.411 6.000 6.000 5.001 5.790 6.000 6.000 5.918
σ(log g) 0.088 0.046 0.070 0.083 0.018 0.069 0.067 0.058 0.068 0.065 0.035 0.048 0.062 0.080 0.057 0.046 0.093 0.055 0.056 0.038 0.021
log(nHe/nH) -2.790 -0.268 -3.000 -2.541 0.990 -1.513 -2.371 0.623 -2.899 -2.700 0.767 -1.590 -3.000 -2.391 -0.032 1.040 -3.000 -1.693 -2.628 1.261 -2.604
σ(log(nHe/nH)) 0.268 0.009 0.434 0.151 0.008 0.042 0.102 0.045 0.344 0.218 0.030 0.051 0.434 0.107 0.001 0.014 0.434 0.043 0.184 0.063 0.175
χ² 2.97E-01 5.28E+00 5.74E-01 9.68E-01 3.70E+00 2.17E+00 4.17E-01 1.94E+00 1.01E+00 3.61E-01 1.21E+00 4.84E-01 7.48E-01 8.12E-01 2.31E+00 4.08E+00 7.59E-01 1.01E+00 1.51E+00 3.25E+00 5.87E-01

ANN Results

Teff (K) 31978.9966 31842.6743 38165.4762 29674.0969 36907.9108 36466.0770 30888.0423 41755.1142 20611.6950 22424.9747 37328.4040 35698.5308 32326.3238 17587.7044 46365.3539 36749.1785 31819.6176 38094.6200 33912.5833 40736.4516 32564.5119

log g (cgs) 4.9655 4.4473 4.8614 5.1593 4.9056 5.8061 5.4437 5.5719 4.8277 5.2100 5.1599 5.9373 5.4986 4.7845 6.0318 4.9233 5.0395 6.2811 6.0668 4.8451 5.5194

log(nHe/nH) -2.6540 -0.4410 -4.8178 -3.3097 2.1208 -2.4338 -2.4573 0.1620 -2.7701 -2.8087 0.6611 -2.2544 -3.2872 -2.2044 0.4509 0.0589 -4.0618 -2.4102 -3.3172 1.2766 -3.9213


Table A.1: continued

SFIT Results
Identifier PG1704+222 PG1705+537 PG1707+657 PG1708+142 PG1708+602 PG1710+490 PG1710+567 PG1715+273 PG1717+423 PG1722+286 PG1724+590 PG1738+505 PG1739+489 PG1743+477 PG2059+013 PG2111+023 PG2120+062 PG2148+095 PG2151+100 PG2158+082 PG2159+051
Teff (K) 15926 15025 36119 18154 42477 29497 33908 29074 23293 33802 28706 24113 22474 26873 33086 14681 32008 30001 35789 49999 13496
σ(Teff) 334 341 114 459 775 546 486 227 615 292 605 746 539 1206 233 254 512 806 66 743 252
log g (cgs) 2.561 3.437 5.993 3.480 5.256 5.102 5.110 3.797 4.654 5.795 5.000 4.922 4.569 4.904 5.583 4.000 4.133 4.555 5.941 5.500 3.244
σ(log g) 0.053 0.062 0.033 0.080 0.074 0.069 0.033 0.016 0.074 0.058 0.094 0.081 0.066 0.123 0.046 0.055 0.052 0.082 0.034 0.119 0.050
log(nHe/nH) -1.131 -1.356 -1.902 -1.245 -0.945 -2.705 -1.538 1.070 -2.991 -1.757 -2.420 -1.829 -2.744 -2.214 -1.607 -1.274 -0.950 -3.000 -2.677 0.364 -1.441
σ(log(nHe/nH)) 0.053 0.049 0.104 0.068 0.060 0.220 0.045 0.010 0.425 0.050 0.114 0.059 0.241 0.071 0.053 0.073 0.046 0.434 0.206 0.029 0.084
χ² 5.49E-01 1.23E+00 6.97E-01 2.06E+00 8.49E-01 1.06E+00 9.87E-01 4.32E+00 1.12E+00 1.33E+00 9.74E-01 9.28E-01 7.71E-01 1.82E+00 2.64E+00 3.69E+00 7.43E-01 5.46E-01 7.84E-01 1.28E+00 3.91E-01

ANN Results

Teff (K) 19339.7888 19121.6772 32424.6910 20510.9488 52356.2200 30354.3920 30094.2953 35385.5095 26991.4606 31508.9630 27423.4056 28143.7622 28940.6842 26957.5253 31942.9438 18712.6960 32664.6531 25977.6010 41296.8002 53960.7462 13921.4240

log g (cgs) 2.9039 3.7978 5.4695 3.6958 5.9487 5.2502 4.8112 4.5349 4.9110 5.7798 4.9714 5.3359 5.0868 4.8522 5.5146 4.0693 4.1185 3.9359 5.7262 5.8944 3.3088

log(nHe/nH) -1.5097 -1.7138 -1.9184 -1.5666 -3.2523 -3.0947 -2.7425 3.0662 -3.2374 -1.6469 -3.2446 -1.8641 -3.4167 -3.3545 -1.9470 -2.2958 -1.1104 -3.3697 -3.0819 -1.2564 -1.7585


Table A.1: continued

SFIT Results
Identifier PG2204+035 PG2205+023 PG2215+151 PG2218+020 PG2219+094 PG2229+099 PG2258+155 PG2259+134 PG2301+259 PG2314+076 PG2317+046 PG2318+239 PG2321+214 PG2331+038 PG2337+070 PG2339+199 PG2345+241 PG2349+002 PG2351+198 PG2352+181 PG2358+107
Teff (K) 31535 27156 45566 21062 19206 16940 34000 31323 18959 30140 33177 16940 38502 29017 29563 30763 17743 28383 14539 47309 23768
σ(Teff) 302 646 318 711 754 236 381 396 395 305 797 296 268 428 670 396 288 334 269 246 966
log g (cgs) 5.917 5.622 5.841 4.768 3.546 3.755 4.481 5.772 4.217 5.640 4.504 3.778 4.977 5.642 5.735 4.189 3.699 5.600 3.768 5.873 4.978
σ(log g) 0.059 0.069 0.068 0.070 0.068 0.053 0.046 0.057 0.061 0.050 0.104 0.036 0.063 0.054 0.060 0.043 0.044 0.053 0.046 0.089 0.105
log(nHe/nH) -2.506 -3.000 1.951 -2.707 -1.457 -0.958 0.945 -1.975 -1.904 -3.000 -3.000 -1.392 2.314 -2.401 -1.997 1.042 -0.954 -3.000 -1.380 0.264 -2.685
σ(log(nHe/nH)) 0.139 0.434 0.620 0.221 0.112 0.031 0.008 0.082 0.070 0.434 0.434 0.075 0.179 0.109 0.086 0.009 0.046 0.434 0.094 0.021 0.210
χ² 3.75E+00 1.06E+00 4.56E+00 1.34E+00 1.39E+00 1.41E+00 3.68E+00 1.35E+00 6.67E-01 3.38E+00 1.54E+00 1.47E+00 1.52E+00 3.59E+00 7.85E-01 4.17E+00 6.76E-01 2.00E+00 1.64E+00 2.81E+00 3.04E+00

ANN Results

Teff (K) 23039.5221 24413.7057 41433.7407 30254.5605 24813.0492 18337.4404 37528.2798 31934.3911 16716.7103 31564.2513 40658.1786 20890.7160 39171.8164 32370.5612 29133.8280 33495.0283 19581.9082 26081.2311 13733.0764 44348.4299 24626.4068

log g (cgs) 5.2470 5.4836 5.4859 5.4740 4.0618 3.8720 5.2630 5.8685 4.1106 5.1605 4.6877 4.2557 5.0203 5.8341 5.8738 4.4077 3.8835 5.2581 4.0071 5.7750 4.9117

log(nHe/nH) -2.1168 -3.0047 -0.1429 -2.9763 -1.7815 -1.3692 2.4797 -2.0224 -1.5974 -3.1646 -3.3510 -1.8445 2.4143 -2.7235 -1.4688 1.0705 -1.0702 -4.0008 -1.6889 -0.4701 -3.6593


Table A.1: continued

            SFIT Results                                                                     ANN Results
Identifier  Teff (K)  σ(Teff)  log g (cgs)  σ(log g)  log(nHe/nH)  σ(log(nHe/nH))  χ²        Teff (K)    log g (cgs)  log(nHe/nH)
PHL 1079    31862     389      5.589        0.055     -2.303       0.087           1.32E+00  31739.7364  5.6372       -2.0741
PHL 4       40197     685      5.000        0.093     -1.254       0.062           6.59E-01  50528.5221  5.6359       -2.8566
TON 107     39369     266      5.602        0.039     -0.076       0.003           4.42E+00  36793.9981  4.8072       -0.2664
VZ1128 M3   34893     388      4.500        0.058     -0.968       0.044           8.72E-01  35857.4448  4.3289       -2.0402


Appendix B

Results for 282 SDSS DR3 Hot Subdwarf Candidates
This table lists the classification and parameterisation results for the SDSS hot subdwarf candidates of Chapter 5. Also listed for each star are its position and redshift as obtained by the SDSS. The internal errors of SFIT are given, along with the value of χ² for the best fit.
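
The identifier and classification strings in Table B.1 follow regular patterns, so they can be unpacked mechanically. The following is a minimal Python sketch, not code from the thesis. The function names `parse_identifier` and `parse_classification` are hypothetical, and reading a label such as sdO7VII:He39 as spectral type, luminosity class, and helium class is an assumption based on the form of the labels rather than on a definition given in this appendix.

```python
import re

# Pattern for identifiers of the form JHHMMSS.ss+DDMMSS.s (as listed in Table B.1).
_ID = re.compile(r"^J(\d{2})(\d{2})(\d{2}\.\d+)([+-])(\d{2})(\d{2})(\d{2}\.\d+)$")

# Pattern for labels such as sdO7VII:He39 or sdB2IV:He-0.
# Assumption: letter + digit = spectral type, Roman numeral = luminosity class,
# number after "He" = helium class.
_CLASS = re.compile(r"^sd([OBAF])(\d)([IVX]+):He(-?\d+)$")


def parse_identifier(name):
    """Split an SDSS identifier into sexagesimal strings and decimal degrees."""
    m = _ID.match(name)
    if m is None:
        raise ValueError(f"unrecognised identifier: {name!r}")
    h, mi, s, sign, d, dm, ds = m.groups()
    ra = f"{h}:{mi}:{s}"
    dec = f"{sign}{d}:{dm}:{ds}"
    # Right ascension is given in hours, so multiply by 15 to get degrees.
    ra_deg = 15.0 * (int(h) + int(mi) / 60.0 + float(s) / 3600.0)
    # Take the sign from the string so that declinations like -00:29:53.3 stay negative.
    dec_deg = int(d) + int(dm) / 60.0 + float(ds) / 3600.0
    if sign == "-":
        dec_deg = -dec_deg
    return ra, dec, ra_deg, dec_deg


def parse_classification(label):
    """Split a classification label into its (assumed) components."""
    m = _CLASS.match(label)
    if m is None:
        raise ValueError(f"unrecognised classification: {label!r}")
    letter, digit, lum, he = m.groups()
    return {
        "spectral_type": f"sd{letter}{digit}",
        "luminosity_class": lum,
        "helium_class": int(he),
    }


if __name__ == "__main__":
    # First row of Table B.1.
    print(parse_identifier("J000607.88-010320.8"))
    print(parse_classification("sdO7VII:He39"))
```

The sexagesimal strings returned for the first identifier match the R.A. and Decl. columns of the same row, which gives a quick consistency check on the extracted table.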


Table B.1: Results for 282 SDSS Hot Subdwarf Candidates
Columns: SDSS Identifier, R.A., Decl., cz (km s⁻¹), σ(cz), Classification, Teff (K), σ(Teff), log g (cgs), σ(log g), log(nHe/nH), σ(log(nHe/nH)), χ²

σ(cz) 26.6456 44.9755 25.5077 34.6605 33.4529 31.5028 32.9952 29.9172 33.771 33.4439 33.6757 28.2265 31.5085 32.428 34.2372 31.8302 22.922 32.684 38.7134 33.6007 23.2616 29.7373

J000607.88-010320.8 J001651.42-011329.3 J001837.14+152150.0 J001930.36+135530.9 J002323.99-002953.3 J002852.26+135446.5 J004233.43+004717.6 J011506.17+140513.5 J013847.59+141532.1 J015026.10-094226.9 J021617.11-095513.1 J023032.65-081439.5 J031620.13+004222.9 J031854.14+004135.0 J033358.21+002007.5 J073712.28+264224.7 J073856.99+401942.1 J074001.91+240127.4 J074458.10+324259.9 J074534.16+372718.6 J074613.17+333307.6 J074720.59+384910.7

00:06:07.88 00:16:51.42 00:18:37.14 00:19:30.36 00:23:23.99 00:28:52.26 00:42:33.43 01:15:06.17 01:38:47.59 01:50:26.10 02:16:17.11 02:30:32.65 03:16:20.13 03:18:54.14 03:33:58.21 07:37:12.28 07:38:56.99 07:40:01.91 07:44:58.10 07:45:34.16 07:46:13.17 07:47:20.59

-01:03:20.8 -01:13:29.3 +15:21:50.0 +13:55:30.9 -00:29:53.3 +13:54:46.5 +00:47:17.6 +14:05:13.5 +14:15:32.1 -09:42:26.9 -09:55:13.1 -08:14:39.5 +00:42:22.9 +00:41:35.0 +00:20:07.5 +26:42:24.7 +40:19:42.1 +24:01:27.4 +32:42:59.9 +37:27:18.6 +33:33:07.6 +38:49:10.7

-279.634 115.816 -73.1296 -225.249 -5.64728 35.3602 24.2894 -162.802 -188.872 -18.51 -29.2216 -194.8 -23.1856 6.24576 117.167 8.71841 -202.597 -130.711 50.6134 29.853 -13.6868 25.1426

sdO7VII:He39 sdO2VII:He26 sdO9VII:He39 sdB0VI:He12 sdB3VI:He5 sdO9VI:He8 sdB4V:He1 sdB3IV:He7 sdO9VIII:He7 sdB8VI:He10 sdO8VI:He12 sdB6IV:He3 sdB0V:He8 sdB7IV:He2 sdB2VI:He13 sdO9VI:He7 sdO1VII:He34 sdA1IV:He0 sdO9VII:He24 sdB0VII:He4 sdO7VII:He39 sdB7IV:He3

42788 47737 38579 29686 30771 33187 29989 32000 27012 33657 35372 13683 32906 20554 34164 32799 50000 30156 38824 35000 44962 16696

155 3077 94 475 241 387 668 380 612 341 180 176 81 512 234 75 547 269 74 155 100 250

5.280 3.726 4.555 5.382 5.746 4.944 4.987 4.545 4.972 5.303 5.561 3.755 5.692 4.500 5.270 5.788 5.495 5.085 5.996 5.394 6.000 3.821

0.060 0.083 0.026 0.045 0.042 0.062 0.080 0.056 0.072 0.029 0.034 0.037 0.016 0.063 0.036 0.019 0.087 0.042 0.025 0.033 0.031 0.047

2.093 0.364 2.522 -1.120 -1.873 -2.692 -3.000 -2.991 -2.212 -1.402 -1.195 -1.399 -2.227 -2.047 -0.771 -2.278 0.644 -2.991 -0.512 -3.000 1.261 -1.588

0.108 0.044 0.433 0.023 0.032 0.214 0.434 0.425 0.071 0.022 0.027 0.065 0.073 0.097 0.017 0.082 0.047 0.425 0.010 0.434 0.047 0.101

6.17E+00 2.59E+00 6.46E+00 6.17E+00 6.68E-01 9.48E-01 2.73E+00 7.56E+00 7.17E+00 1.58E+00 9.74E-01 7.83E-01 1.67E+00 1.82E+00 6.59E+00 1.71E+00 2.18E+00 7.72E+00 8.41E+00 2.44E+00 1.60E+00 2.74E+00


Table B.1: continued
Columns: SDSS Identifier, R.A., Decl., cz (km s⁻¹), σ(cz), Classification, Teff (K), σ(Teff), log g (cgs), σ(log g), log(nHe/nH), σ(log(nHe/nH)), χ²

σ(cz) 34.0109 32.7577 34.6248 29.591 22.3976 32.5904 28.5258 32.9205 38.0934 31.9453 30.7086 30.9737 32.2109 29.9143 28.1785 32.1614 31.5714 31.7429 31.7126 31.7933 28.589 26.6296

J074806.15+342927.7 J074811.34+435239.6 J075236.78+441642.5 J075249.96+305935.2 J080259.80+411438.0 J080628.10+323059.4 J080726.80+303501.8 J081342.92+275034.8 J081540.66+430524.5 J081607.91+480349.7 J082751.06+410925.9 J082802.04+404009.0 J083006.17+475150.4 J083241.96+483445.1 J083456.98+422053.2 J083842.71+053309.5 J083935.91+030840.8 J084122.67+063029.6 J084413.77+023229.3 J084556.16+542357.6 J084727.88+024814.8 J085422.40+013651.0

07:48:06.15 07:48:11.34 07:52:36.78 07:52:49.96 08:02:59.80 08:06:28.10 08:07:26.80 08:13:42.92 08:15:40.66 08:16:07.91 08:27:51.06 08:28:02.04 08:30:06.17 08:32:41.96 08:34:56.98 08:38:42.71 08:39:35.91 08:41:22.67 08:44:13.77 08:45:56.16 08:47:27.88 08:54:22.40

+34:29:27.7 +43:52:39.6 +44:16:42.5 +30:59:35.2 +41:14:38.0 +32:30:59.4 +30:35:01.8 +27:50:34.8 +43:05:24.5 +48:03:49.7 +41:09:25.9 +40:40:09.0 +47:51:50.4 +48:34:45.1 +42:20:53.2 +05:33:09.5 +03:08:40.8 +06:30:29.6 +02:32:29.3 +54:23:57.6 +02:48:14.8 +01:36:51.0

-54.5131 26.4415 -102.636 144.115 -26.9931 -38.8078 -65.0792 -5.38343 131.458 -26.1981 -21.3762 -182.813 -6.3029 26.2838 13.8063 68.4768 21.5494 -11.2503 268.878 6.03341 136.219 -7.05121

sdB0VI:He6 sdB1VI:He8 sdB1V:He9 sdB4V:He34 sdO7VII:He39 sdB2V:He11 sdB5IV:He4 sdB3VI:He2 sdB4V:He32 sdB2VI:He5 sdO5VII:He36 sdO4VII:He11 sdB0VII:He3 sdB4V:He5 sdB6III:He5 sdO9VII:He5 sdB0VI:He10 sdB0VI:He6 sdB6IV:He0 sdB5IV:He11 sdB5IV:He2 sdB0VI:He25

33653 25154 34243 39999 44340 30983 13573 27634 37418 24992 48823 36831 27934 18422 10816 30554 35310 32357 13225 30069 13429 34383

269 617 241 182 95 240 172 681 195 545 522 344 1042 288 93 214 415 79 156 344 182 203

5.478 5.003 5.496 5.576 6.000 5.650 3.697 5.500 5.536 5.359 5.503 5.088 5.330 4.241 3.075 5.502 4.854 5.620 4.036 5.053 3.539 5.287

0.053 0.067 0.019 0.027 0.030 0.035 0.035 0.070 0.033 0.058 0.084 0.050 0.084 0.036 0.028 0.047 0.057 0.015 0.033 0.036 0.035 0.037

-1.697 -1.463 -1.059 1.134 1.261 -1.332 -1.552 -2.294 -0.307 -2.293 0.397 -2.117 -2.892 -1.521 -1.789 -2.281 -2.592 -2.141 -2.156 -1.170 -2.100 -0.784

0.043 0.025 0.020 0.012 0.047 0.019 0.062 0.085 0.008 0.085 0.023 0.114 0.339 0.072 0.187 0.083 0.170 0.060 0.124 0.026 0.164 0.018

1.38E+00 1.29E+00 2.63E+00 9.97E+00 1.68E+00 8.91E-01 8.53E-01 1.28E+00 1.52E+01 1.18E+00 1.30E+01 3.48E+00 7.32E-01 2.29E+00 3.02E+00 9.13E-01 7.74E-01 8.36E-01 3.87E+00 1.02E+01 1.47E+00 1.70E+00


Table B.1: continued
Columns: SDSS Identifier, R.A., Decl., cz (km s⁻¹), σ(cz), Classification, Teff (K), σ(Teff), log g (cgs), σ(log g), log(nHe/nH), σ(log(nHe/nH)), χ²

σ(cz) 19.8957 26.8266 33.5645 26.3361 32.7113 29.6024 29.0217 33.0974 26.6692 31.0954 33.0569 28.1481 33.7491 29.8445 34.2498 28.0576 25.7962 34.0528 29.1801 32.8147 27.6608 48.3655

J085650.28+401730.9 J085727.66+424215.4 J085900.33+023313.1 J090559.15+055442.1 J091225.13+421922.5 J091544.44+511338.8 J092436.41+040135.7 J092520.70+470330.6 J092634.88+473036.0 J092830.55+561811.8 J093059.63+025032.4 J093215.32-002108.5 J093245.91+081618.6 J093322.20+440322.7 J093549.72+544101.0 J094143.53+535833.4 J094346.62+531429.1 J094623.03+040456.1 J094900.45+025702.9 J095101.29+034757.0 J095847.23+602147.4 J100019.99-003413.3

08:56:50.28 08:57:27.66 08:59:00.33 09:05:59.15 09:12:25.13 09:15:44.44 09:24:36.41 09:25:20.70 09:26:34.88 09:28:30.55 09:30:59.63 09:32:15.32 09:32:45.91 09:33:22.20 09:35:49.72 09:41:43.53 09:43:46.62 09:46:23.03 09:49:00.45 09:51:01.29 09:58:47.23 10:00:19.99

+40:17:30.9 +42:42:15.4 +02:33:13.1 +05:54:42.1 +42:19:22.5 +51:13:38.8 +04:01:35.7 +47:03:30.6 +47:30:36.0 +56:18:11.8 +02:50:32.4 -00:21:08.5 +08:16:18.6 +44:03:22.7 +54:41:01.0 +53:58:33.4 +53:14:29.1 +04:04:56.1 +02:57:02.9 +03:47:57.0 +60:21:47.4 -00:34:13.3

-31.3088 192.101 19.9144 316.248 -82.0724 -19.1172 114.477 38.5098 -6.49944 -135.281 1.1058 178.664 83.619 -49.4753 -99.729 -41.6598 -150.267 55.3774 245.283 136.159 -298.171 90.0055

sdO4VI:He11 sdB3VI:He35 sdB0VI:He4 sdA0III:He3 sdB4V:He6 sdB2V:He6 sdB4IV:He4 sdB5VI:He4 sdB3V:He4 sdO6VI:He18 sdB1VI:He7 sdB4V:He5 sdB4VI:He4 sdB4V:He2 sdB0VI:He4 sdB5IV:He2 sdB1V:He26 sdB0VI:He9 sdB8IV:He0 sdB1VI:He3 sdB5IV:He2 sdO3VIII:He15

27006 38783 33042 12411 30073 34140 15224 29376 17256 41016 30308 14602 32561 16544 36231 12276 35458 37074 14000 30001 12650 45730

1513 151 250 140 349 265 225 480 215 436 171 166 239 229 258 163 147 124 212 560 125 659

3.878 5.406 5.451 3.377 5.150 5.386 3.614 5.196 4.128 5.000 5.537 3.723 5.360 4.143 5.497 3.500 5.043 6.000 3.251 5.417 3.631 5.151

0.098 0.025 0.050 0.044 0.044 0.051 0.032 0.054 0.038 0.067 0.037 0.036 0.042 0.026 0.054 0.041 0.030 0.032 0.040 0.056 0.031 0.072

-3.000 0.689 -1.685 -1.647 -2.647 -1.514 -1.715 -2.459 -1.783 -0.737 -2.122 -2.053 -1.662 -1.629 -1.463 -2.066 -0.573 -1.395 -2.259 -2.228 -1.726 -0.716

0.434 0.018 0.042 0.192 0.193 0.028 0.135 0.125 0.053 0.030 0.058 0.147 0.040 0.111 0.025 0.152 0.014 0.022 0.158 0.073 0.115 0.043

1.20E+00 9.33E+00 9.75E-01 6.02E+00 2.67E+00 1.92E+00 2.78E+00 3.43E+00 8.97E+00 1.50E+00 1.11E+00 2.31E+00 1.54E+00 2.44E+00 3.48E+00 1.12E+00 3.93E+00 1.01E+00 1.88E+01 9.20E-01 9.37E-01 4.93E+00


Table B.1: continued
Columns: SDSS Identifier, R.A., Decl., cz (km s⁻¹), σ(cz), Classification, Teff (K), σ(Teff), log g (cgs), σ(log g), log(nHe/nH), σ(log(nHe/nH)), χ²

σ(cz) 29.2731 33.5207 30.4631 36.4263 36.3286 31.632 31.8275 32.448 22.147 20.1206 28.0526 23.8682 30.5354 29.0912 29.6862 37.5145 27.2368 20.5426 31.4893 33.2725 29.6543 35.1207

J100317.05+025510.4 J100740.10+454252.5 J101025.64+045357.0 J101213.21+064030.7 J101218.95+004413.4 J101242.22+484937.4 J101640.84-010900.6 J102057.16+013751.4 J102120.45+444636.9 J102320.37+462026.8 J103022.08+020524.3 J103549.68+092551.9 J103854.02+525847.8 J104248.95+033355.4 J105608.43+034821.3 J110053.56+034622.8 J110215.46+024034.2 J110255.98+521858.2 J110256.32+010012.3 J110302.37-010338.7 J110445.01+092530.9 J111438.57-004024.1

10:03:17.05 10:07:40.10 10:10:25.64 10:12:13.21 10:12:18.95 10:12:42.22 10:16:40.84 10:20:57.16 10:21:20.45 10:23:20.37 10:30:22.08 10:35:49.68 10:38:54.02 10:42:48.95 10:56:08.43 11:00:53.56 11:02:15.46 11:02:55.98 11:02:56.32 11:03:02.37 11:04:45.01 11:14:38.57

+02:55:10.4 +45:42:52.5 +04:53:57.0 +06:40:30.7 +00:44:13.4 +48:49:37.4 -01:09:00.6 +01:37:51.4 +44:46:36.9 +46:20:26.8 +02:05:24.3 +09:25:51.9 +52:58:47.8 +03:33:55.4 +03:48:21.3 +03:46:22.8 +02:40:34.2 +52:18:58.2 +01:00:12.3 -01:03:38.7 +09:25:30.9 -00:40:24.1

203.522 2.52572 94.7419 96.1234 -16.2037 30.3123 7.0939 238.157 -95.3694 38.9026 62.5766 193.763 49.5557 370.055 6.83794 278.131 25.8182 -179.727 301.18 139.329 182.534 110.919

sdB3VI:He34 sdF0IV:He3 sdB7II:He5 sdO4VII:He26 sdB0VII:He9 sdB2VI:He4 sdO9VI:He5 sdB1VI:He3 sdO8VII:He40 sdO6VI:He35 sdB7IV:He3 sdO5VII:He39 sdB2VI:He2 sdB0VII:He9 sdA0III:He-0 sdO0VIII:He18 sdO3VII:He36 sdO6VII:He38 sdB4V:He9 sdO9VII:He4 sdO5VI:He8 sdO1VII:He21

38522 30057 15489 42824 35915 28000 30235 28380 45002 45164 13334 45000 27999 34770 14204 46166 50000 45355 15645 29649 30800 42076

164 388 177 249 227 423 296 289 100 147 236 147 411 212 151 513 208 380 228 500 333 569

5.908 5.239 4.171 5.928 5.373 5.109 5.000 5.066 5.828 5.814 3.487 5.745 5.289 5.105 3.805 5.616 5.804 5.549 3.701 5.096 4.500 5.272

0.032 0.051 0.034 0.056 0.042 0.042 0.054 0.045 0.030 0.034 0.045 0.048 0.045 0.041 0.037 0.056 0.066 0.045 0.037 0.052 0.059 0.055

0.045 -2.987 -2.103 -0.568 -1.194 -2.195 -2.659 -3.000 1.261 1.272 -2.140 1.473 -2.582 -2.269 -1.997 -1.181 0.364 0.588 -1.483 -2.579 -2.692 -0.866

0.001 0.421 0.110 0.010 0.027 0.068 0.198 0.434 0.047 0.057 0.120 0.232 0.166 0.081 0.129 0.039 0.020 0.030 0.053 0.165 0.214 0.038

8.16E+00 4.46E+00 8.80E+00 2.14E+00 3.83E+00 1.27E+00 2.13E+00 1.43E+00 4.02E+00 5.39E+00 1.92E+00 1.78E+00 9.41E-01 3.92E+00 4.82E+00 3.28E+00 3.51E+00 3.77E+00 1.35E+01 1.46E+00 1.38E+00 5.76E+00


Table B.1: continued
Columns: SDSS Identifier, R.A., Decl., cz (km s⁻¹), σ(cz), Classification, Teff (K), σ(Teff), log g (cgs), σ(log g), log(nHe/nH), σ(log(nHe/nH)), χ²

σ(cz) 27.5999 30.1321 33.3237 25.0661 34.4941 27.4911 32.8276 33.6331 35.8462 30.0023 35.4397 35.0436 28.6527 21.8454 26.4144 34.0894 31.3655 22.4985 39.0234 35.2988 31.4899 30.4367

J111633.30+052507.9 J112056.23+093641.8 J112242.70+613758.5 J112504.73+671658.3 J112719.00+660538.7 J113312.13+010824.9 J113840.69-003531.8 J113935.45+614954.0 J114352.74+660723.4 J114417.53-012914.1 J114821.30+033625.8 J115009.49+061042.1 J115101.04+541003.5 J115115.19-015255.2 J115654.09-032510.2 J115716.38+612410.8 J120311.26+045419.6 J120626.55+663352.5 J121123.37+611203.9 J121424.81+550226.3 J121625.83-014804.6 J121643.73+020835.9

11:16:33.30 11:20:56.23 11:22:42.70 11:25:04.73 11:27:19.00 11:33:12.13 11:38:40.69 11:39:35.45 11:43:52.74 11:44:17.53 11:48:21.30 11:50:09.49 11:51:01.04 11:51:15.19 11:56:54.09 11:57:16.38 12:03:11.26 12:06:26.55 12:11:23.37 12:14:24.81 12:16:25.83 12:16:43.73

+05:25:07.9 +09:36:41.8 +61:37:58.5 +67:16:58.3 +66:05:38.7 +01:08:24.9 -00:35:31.8 +61:49:54.0 +66:07:23.4 -01:29:14.1 +03:36:25.8 +06:10:42.1 +54:10:03.5 -01:52:55.2 -03:25:10.2 +61:24:10.8 +04:54:19.6 +66:33:52.5 +61:12:03.9 +55:02:26.3 -01:48:04.6 +02:08:35.9

-72.9026 143.332 -64.5492 -88.07 -8.00635 535.441 -81.2527 -3.84451 -131.878 286.516 -4.57768 44.1678 -210.473 108.73 48.0882 -160.892 317.09 -83.8421 -147.955 -69.0878 67.9746 65.6063

sdO9VI:He29 sdO5VII:He11 sdB3VI:He6 sdO9VI:He12 sdB5V:He6 sdB5IV:He4 sdF5VI:He3 sdB0VI:He3 sdB3VII:He2 sdB4V:He2 sdB5V:He5 sdO3V:He19 sdO6VI:He8 sdB6V:He12 sdO9V:He14 sdB3VII:He1 sdB8IV:He0 sdO7VII:He39 sdB7V:He9 sdO5VII:He12 sdB1V:He5 sdO8VI:He38

34332 35000 30001 34249 30552 11978 30867 29767 30474 13292 33168 40640 32000 16099 32698 29999 12883 43748 32971 41632 12307 40933

221 357 495 130 197 183 228 646 218 160 178 288 602 329 279 521 188 87 192 331 116 127

4.894 4.714 5.622 4.948 5.303 3.412 5.424 5.000 5.389 3.583 5.686 5.147 4.484 2.774 4.499 5.224 3.596 5.875 5.833 5.253 3.392 5.595

0.037 0.055 0.043 0.033 0.042 0.047 0.048 0.079 0.046 0.032 0.036 0.057 0.077 0.052 0.043 0.057 0.033 0.032 0.041 0.046 0.041 0.021

-0.496 -2.584 -2.053 -1.800 -2.661 -2.020 -2.209 -2.995 -2.096 -2.300 -1.602 -0.844 -2.241 -2.995 -1.126 -3.000 -2.212 2.007 -1.924 -1.631 -1.159 0.065

0.010 0.167 0.049 0.082 0.199 0.319 0.070 0.430 0.054 0.173 0.035 0.030 0.076 0.430 0.029 0.434 0.142 0.132 0.073 0.056 0.087 0.001

6.62E+00 1.68E+00 8.95E-01 1.91E+00 7.43E+00 6.60E+00 7.48E-01 2.80E+00 6.31E+00 3.84E+00 6.88E+00 8.63E+00 2.78E+00 1.35E+01 2.16E+00 2.15E+00 1.39E+01 2.07E+00 7.23E+00 2.43E+00 1.45E+01 1.07E+01


Table B.1: continued
Columns: SDSS Identifier, R.A., Decl., cz (km s⁻¹), σ(cz), Classification, Teff (K), σ(Teff), log g (cgs), σ(log g), log(nHe/nH), σ(log(nHe/nH)), χ²

σ(cz) 32.305 29.6831 31.7309 24.1427 33.3849 36.143 33.0009 26.8671 33.4044 29.8305 29.2904 23.7974 33.4203 31.1712 33.3195 28.2369 28.1469 30.4253 27.1188 24.6498 30.6874 30.5863

J122057.48-012642.4 J122444.98+583313.9 J122637.12+575927.6 J123808.66+053318.2 J123821.49-021211.5 J124706.79-003925.9 J124728.16+562958.3 J124819.08+035003.3 J125229.60-030129.6 J125248.84+521604.1 J125328.45+042044.0 J125408.32+014324.1 J125410.86-010408.4 J125941.88-003928.8 J130025.53+004530.2 J130059.21+005711.8 J131425.39+011153.4 J131452.97+023740.3 J131638.48+034818.5 J131658.35+641522.5 J131745.80+010450.4 J131916.15-011405.0

12:20:57.48 12:24:44.98 12:26:37.12 12:38:08.66 12:38:21.49 12:47:06.79 12:47:28.16 12:48:19.08 12:52:29.60 12:52:48.84 12:53:28.45 12:54:08.32 12:54:10.86 12:59:41.88 13:00:25.53 13:00:59.21 13:14:25.39 13:14:52.97 13:16:38.48 13:16:58.35 13:17:45.80 13:19:16.15

-01:26:42.4 +58:33:13.9 +57:59:27.6 +05:33:18.2 -02:12:11.5 -00:39:25.9 +56:29:58.3 +03:50:03.3 -03:01:29.6 +52:16:04.1 +04:20:44.0 +01:43:24.1 -01:04:08.4 -00:39:28.8 +00:45:30.2 +00:57:11.8 +01:11:53.4 +02:37:40.3 +03:48:18.5 +64:15:22.5 +01:04:50.4 -01:14:05.0

-4.416 -85.1992 -194.41 25.9705 83.2689 7.95478 -159.902 -56.2564 47.7554 -128.9 -1.7345 2.43989 88.5476 45.2168 69.5908 -30.0593 -116.094 -67.101 6.70663 -178.639 60.094 227.376

sdB9III:He1 sdB2IV:He-0 sdB0VII:He3 sdO4VII:He38 sdO1VIII:He29 sdO9VII:He4 sdO9VII:He1 sdO6VIII:He16 sdB0V:He6 sdB7III:He4 sdB4IV:He3 sdO6VII:He40 sdB3IV:He6 sdO5VIII:He15 sdO9VII:He16 sdB0VII:He25 sdB5IV:He2 sdB9IV:He2 sdB1V:He9 sdB1VI:He14 sdB1VI:He3 sdB4VI:He3

18548 11106 30382 44999 48017 36859 24936 49129 30736 13184 12250 45000 20858 34375 38249 37650 13063 17290 32528 34296 26694 16894

245 135 187 127 516 318 896 1057 221 155 168 100 320 213 102 234 134 316 59 190 850 205

4.613 3.124 5.350 5.352 5.158 5.551 4.876 5.500 5.454 3.605 3.500 5.999 4.632 5.149 5.756 5.212 3.663 4.000 5.250 5.544 4.929 4.132

0.038 0.037 0.042 0.041 0.081 0.042 0.085 0.069 0.047 0.032 0.043 0.031 0.039 0.025 0.025 0.034 0.032 0.047 0.031 0.038 0.089 0.035

-1.862 -2.291 -2.718 2.125 0.210 -3.000 -2.823 -1.226 -2.345 -1.715 -2.231 1.311 -1.689 -1.570 -0.845 -0.443 -1.696 -1.858 -1.376 -1.427 -2.159 -1.737

0.063 0.170 0.227 0.289 0.008 0.434 0.289 0.051 0.096 0.090 0.148 0.053 0.042 0.048 0.021 0.010 0.086 0.063 0.021 0.023 0.063 0.047

1.10E+01 1.74E+01 2.14E+00 3.96E+00 7.27E+00 9.50E-01 3.60E+00 2.62E+00 1.46E+00 8.25E+00 2.41E+00 2.30E+00 5.36E+00 2.75E+00 1.90E+00 2.90E+00 6.81E-01 5.24E+00 2.06E+00 2.10E+00 1.48E+00 1.39E+00


Table B.1: continued
Columns: SDSS Identifier, R.A., Decl., cz (km s⁻¹), σ(cz), Classification, Teff (K), σ(Teff), log g (cgs), σ(log g), log(nHe/nH), σ(log(nHe/nH)), χ²

σ(cz) 30.226 34.6956 32.8285 33.9539 27.9956 31.3781 30.4751 30.0884 37.7028 33.4071 31.2636 31.8161 27.8615 32.612 33.4862 32.0421 31.5385 28.4411 33.9581 30.1228 31.5639 33.792

J132503.17+043239.4 J132556.94-032329.6 J132619.95+035754.4 J133200.96+673325.8 J133449.26+041014.9 J133546.10+555429.8 J133757.40-005647.2 J134344.11+465825.3 J134545.24-000641.6 J134600.55+052034.3 J134948.30-024639.3 J135140.69+023429.2 J135707.35+010454.4 J135746.59+530758.7 J140118.74-012024.8 J140252.20+465918.5 J140545.26+014419.1 J140715.42+033147.6 J140839.10+653124.4 J141812.51-024427.0 J142226.93-023100.5 J142339.81+014947.3

13:25:03.17 13:25:56.94 13:26:19.95 13:32:00.96 13:34:49.26 13:35:46.10 13:37:57.40 13:43:44.11 13:45:45.24 13:46:00.55 13:49:48.30 13:51:40.69 13:57:07.35 13:57:46.59 14:01:18.74 14:02:52.20 14:05:45.26 14:07:15.42 14:08:39.10 14:18:12.51 14:22:26.93 14:23:39.81

+04:32:39.4 -03:23:29.6 +03:57:54.4 +67:33:25.8 +04:10:14.9 +55:54:29.8 -00:56:47.2 +46:58:25.3 -00:06:41.6 +05:20:34.3 -02:46:39.3 +02:34:29.2 +01:04:54.4 +53:07:58.7 -01:20:24.8 +46:59:18.5 +01:44:19.1 +03:31:47.6 +65:31:24.4 -02:44:27.0 -02:31:00.5 +01:49:47.3

-179.976 66.5581 -82.3389 -101.751 83.6733 -21.676 -159.448 3.03651 -3.70993 -39.3819 -149.397 20.1812 -164.908 -88.3881 -138.416 -178.233 -66.9086 31.4863 -170.614 -183.21 129.172 27.9633

sdB9V:He3 sdB5VI:He10 sdO9VI:He13 sdO9VII:He7 sdB0VII:He29 sdB4V:He3 sdB6III:He6 sdO5VIII:He10 sdO8VII:He28 sdB4V:He0 sdB4IV:He3 sdO9VI:He4 sdO7VIII:He29 sdB2VI:He2 sdO6VII:He7 sdB5VI:He6 sdO9VI:He6 sdO8VI:He35 sdB0VI:He4 sdO1VIII:He32 sdO6V:He2 sdB4V:He6

12173 41733 34266 34458 34610 16963 12195 29224 35903 30255 11972 33545 31999 29997 30768 29111 28704 49941 30105 50000 14553 30705

89 589 169 233 210 259 126 223 207 205 150 283 274 447 263 465 421 546 320 209 198 324

3.128 5.500 5.312 5.474 4.778 4.022 3.450 4.133 5.268 5.216 3.635 5.190 3.749 5.104 5.000 5.131 5.389 5.500 5.122 5.951 3.828 5.509

0.023 0.064 0.011 0.048 0.033 0.039 0.037 0.047 0.035 0.041 0.044 0.048 0.049 0.044 0.053 0.052 0.051 0.087 0.045 0.067 0.040 0.049

-0.750 -1.997 -1.206 -1.532 -0.136 -1.767 -2.019 -3.000 -0.269 -3.000 -2.284 -3.000 0.597 -2.552 -2.479 -2.315 -1.645 0.572 -3.000 0.364 -1.470 -1.753

0.047 0.129 0.028 0.030 0.004 0.051 0.181 0.434 0.006 0.434 0.167 0.434 0.027 0.155 0.131 0.090 0.038 0.039 0.434 0.020 0.090 0.049

1.16E+01 5.42E+00 1.41E+00 2.48E+00 5.32E+00 9.94E+00 5.28E+00 7.49E+00 3.79E+00 1.85E+00 7.21E+00 1.74E+00 4.46E+00 3.56E+00 2.19E+00 5.23E+00 1.07E+00 6.76E+00 2.97E+00 2.07E+00 1.70E+01 2.50E+00


Table B.1: continued
Columns: SDSS Identifier, R.A., Decl., cz (km s⁻¹), σ(cz), Classification, Teff (K), σ(Teff), log g (cgs), σ(log g), log(nHe/nH), σ(log(nHe/nH)), χ²

σ(cz) 27.5647 33.0024 32.3239 31.4605 21.9973 39.9306 30.2275 30.1996 36.2062 30.9137 29.5335 35.8834 25.936 33.0281 34.5196 31.7249 32.273 32.8387 24.9202 30.5689 34.0157 22.8201

J142416.88-014335.0 J142459.58+031943.4 J142551.30-013317.3 J142956.63+563144.0 J143006.23+510314.1 J143153.06-002824.3 J143917.64+010250.8 J143917.64+010251.1 J144024.72+022118.7 J144141.37+450651.4 J144301.69+514410.3 J144346.62+491733.7 J144514.93+000249.0 J144709.20+511639.8 J144737.76+020942.6 J145049.50+624940.9 J145426.67+472004.4 J145606.42+500155.3 J145657.73+495310.8 J145748.84+561323.5 J150829.03+494051.0 J151030.69-014345.9

14:24:16.88 14:24:59.58 14:25:51.30 14:29:56.63 14:30:06.23 14:31:53.06 14:39:17.64 14:39:17.64 14:40:24.72 14:41:41.37 14:43:01.69 14:43:46.62 14:45:14.93 14:47:09.20 14:47:37.76 14:50:49.50 14:54:26.67 14:56:06.42 14:56:57.73 14:57:48.84 15:08:29.03 15:10:30.69

-01:43:35.0 +03:19:43.4 -01:33:17.3 +56:31:44.0 +51:03:14.1 -00:28:24.3 +01:02:50.8 +01:02:51.1 +02:21:18.7 +45:06:51.4 +51:44:10.3 +49:17:33.7 +00:02:49.0 +51:16:39.8 +02:09:42.6 +62:49:40.9 +47:20:04.4 +50:01:55.3 +49:53:10.8 +56:13:23.5 +49:40:51.0 -01:43:45.9

91.6777 37.3083 -37.9939 -91.5749 -112.744 -31.9216 28.1612 24.1876 -35.8459 -154.864 -171.018 -80.1405 0.302733 -148.603 35.4762 -159.082 21.6808 -69.5387 -39.8157 -202.486 -136.073 -152.78

sdB3V:He3 sdO8VII:He6 sdO2VII:He13 sdB1VI:He3 sdO4VII:He37 sdO6VIII:He7 sdB1VI:He6 sdB5V:He3 sdO6VI:He11 sdB2VI:He6 sdB8IV:He4 sdO8VIII:He12 sdB1VII:He12 sdO7VI:He34 sdO5VII:He11 sdB0VI:He2 sdB2VI:He5 sdB5V:He8 sdB1VI:He9 sdB3VI:He2 sdB0VII:He1 sdO7VII:He39

14999 34325 41093 28557 45000 35581 19939 19464 34898 19982 19181 35572 34682 45000 32773 28051 29437 30764 33478 12512 27313 44820

189 274 307 299 99 232 368 338 91 484 308 224 78 100 127 771 522 300 223 109 913 94

3.634 5.321 5.375 5.068 5.767 5.499 4.134 4.116 5.332 4.500 4.358 5.421 5.019 5.908 5.563 4.739 5.478 5.343 5.348 3.669 5.012 5.982

0.034 0.038 0.046 0.046 0.030 0.043 0.043 0.041 0.021 0.059 0.050 0.048 0.033 0.031 0.016 0.071 0.056 0.044 0.046 0.029 0.087 0.033

-2.168 -1.414 -1.716 -2.598 1.261 -0.977 -1.676 -1.709 -1.067 -1.690 -1.592 -1.450 -1.399 1.260 -2.085 -3.000 -2.180 -1.598 -1.547 -2.057 -2.359 1.866

0.128 0.023 0.045 0.172 0.047 0.020 0.041 0.044 0.020 0.043 0.034 0.024 0.033 0.047 0.053 0.434 0.066 0.034 0.031 0.148 0.099 0.191

9.53E-01 1.74E+00 1.54E+00 1.03E+00 2.27E+00 5.72E+00 1.34E+00 2.06E+00 3.55E+00 2.50E+00 1.22E+00 1.91E+00 3.37E+00 1.74E+01 7.14E+00 6.07E+00 9.68E-01 1.78E+00 1.62E+00 8.49E+00 4.82E+00 2.06E+00


Table B.1: continued
Columns: SDSS Identifier, R.A., Decl., cz (km s⁻¹), σ(cz), Classification, Teff (K), σ(Teff), log g (cgs), σ(log g), log(nHe/nH), σ(log(nHe/nH)), χ²

σ(cz) 33.3369 33.1678 36.6202 35.0598 24.7826 32.9988 32.279 26.0169 29.707 34.981 31.4734 28.5694 21.3778 30.2446 31.2027 33.0638 26.1501 24.0808 32.9385 30.7923 35.4187 29.7408

J151042.06+040955.5 J151105.38+515956.4 J151231.29+005317.7 J151306.72+011439.1 J151415.66-012925.2 J151617.94+412948.4 J151743.47+514445.4 J151808.48+041043.7 J151847.69+551154.2 J152332.82+353237.0 J152607.88+001640.8 J152833.16+440009.7 J153056.33+024222.6 J153204.36+324152.7 J153217.24+454621.0 J153411.10+543345.3 J153508.52+032456.3 J154043.10+435950.1 J154338.69+001202.1 J154531.02+563944.7 J154809.97-004931.4 J154830.67+003656.7

15:10:42.06 15:11:05.38 15:12:31.29 15:13:06.72 15:14:15.66 15:16:17.94 15:17:43.47 15:18:08.48 15:18:47.69 15:23:32.82 15:26:07.88 15:28:33.16 15:30:56.33 15:32:04.36 15:32:17.24 15:34:11.10 15:35:08.52 15:40:43.10 15:43:38.69 15:45:31.02 15:48:09.97 15:48:30.67

+04:09:55.5 +51:59:56.4 +00:53:17.7 +01:14:39.1 -01:29:25.2 +41:29:48.4 +51:44:45.4 +04:10:43.7 +55:11:54.2 +35:32:37.0 +00:16:40.8 +44:00:09.7 +02:42:22.6 +32:41:52.7 +45:46:21.0 +54:33:45.3 +03:24:56.3 +43:59:50.1 +00:12:02.1 +56:39:44.7 -00:49:31.4 +00:36:56.7

-63.2658 -151.756 -47.1969 -93.1191 -121.987 -181.258 -79.9052 -38.9928 -34.4602 11.3484 87.1892 45.6215 -69.1411 -152.653 -56.1277 -89.259 11.5925 -70.6263 -29.8052 -104.244 -28.7556 -126.171

sdO9VI:He10 sdB3VI:He1 sdB1VI:He29 sdB3IV:He2 sdO5VII:He39 sdB3VI:He2 sdO8VII:He10 sdO9VI:He14 sdB5V:He3 sdB3VI:He24 sdO5VI:He34 sdB7III:He2 sdO6VII:He39 sdB0VI:He38 sdO8VII:He9 sdB0VI:He3 sdB4II:He10 sdO7VII:He38 sdB2VI:He2 sdB3VI:He5 sdO9VI:He4 sdB4IV:He4

34792 18406 36493 26002 45000 28962 34518 35439 17317 36703 43975 12000 44761 39980 33385 31907 13657 44843 29654 26509 32645 11429

231 281 164 436 164 432 236 221 318 193 320 184 109 117 349 321 282 123 499 389 255 135

5.512 4.286 5.723 5.037 5.687 5.175 5.299 5.426 4.000 5.500 5.050 3.413 6.000 5.783 5.063 5.106 3.012 5.904 5.247 4.933 5.499 3.277

0.046 0.047 0.031 0.045 0.044 0.052 0.046 0.048 0.047 0.035 0.042 0.049 0.029 0.018 0.024 0.048 0.047 0.033 0.057 0.053 0.049 0.035

-1.360 -1.793 -0.486 -2.301 0.560 -2.447 -1.494 -1.549 -1.605 -0.685 -0.072 -2.108 1.163 -0.194 -1.818 -2.394 -1.400 1.218 -2.589 -2.039 -2.978 -0.911

0.020 0.081 0.010 0.087 0.029 0.122 0.027 0.031 0.035 0.016 0.004 0.223 0.019 0.004 0.057 0.108 0.065 0.050 0.169 0.047 0.413 0.052

1.64E+00 1.09E+01 7.25E+00 3.36E+00 2.33E+00 3.69E+00 2.61E+00 1.38E+00 2.56E+00 2.50E+00 2.08E+00 2.66E+00 1.62E+00 1.69E+01 2.48E+00 1.78E+00 1.69E+01 2.40E+00 2.86E+00 1.55E+00 1.42E+00 9.04E-01

Table B.1: continued

SDSS Identifier   R.A.   Decl.   cz (km s^-1)   σ_cz   Classification   Teff (K)   σ_Teff   log g (cgs)   σ_log g   log(nHe/nH)   σ_log(nHe/nH)   χ²

J155628.35+011335.0 J155642.95+501537.5 J160241.13-001207.1 J160759.27+383746.4 J160810.18+425845.1 J161328.22+004703.2 J161418.97+261628.8 J161627.11-002933.0 J161631.29-003853.3 J162250.09+002631.9 J162256.66+473051.1 J162310.50+425831.2 J162359.61+375435.3 J162535.78+362039.3 J162616.71+380710.5 J162628.92+370448.6 J162711.81-000950.9 J163148.85+372617.2 J163306.58+003216.3 J163446.48-005345.6 J163509.13+000235.0 J163702.79-011351.7

15:56:28.35 15:56:42.95 16:02:41.13 16:07:59.27 16:08:10.18 16:13:28.22 16:14:18.97 16:16:27.11 16:16:31.29 16:22:50.09 16:22:56.66 16:23:10.50 16:23:59.61 16:25:35.78 16:26:16.71 16:26:28.92 16:27:11.81 16:31:48.85 16:33:06.58 16:34:46.48 16:35:09.13 16:37:02.79

+01:13:35.0 +50:15:37.5 -00:12:07.1 +38:37:46.4 +42:58:45.1 +00:47:03.2 +26:16:28.8 -00:29:33.0 -00:38:53.3 +00:26:31.9 +47:30:51.1 +42:58:31.2 +37:54:35.3 +36:20:39.3 +38:07:10.5 +37:04:48.6 -00:09:50.9 +37:26:17.2 +00:32:16.3 -00:53:45.6 +00:02:35.0 -01:13:51.7

67.4947 -173.848 -69.1264 -49.3069 -163.815 -149.526 106.961 -0.0543509 52.866 -15.4032 -68.6 -8.93843 -273.35 -218.657 -45.5658 -51.0648 34.3853 -92.9749 -61.1765 52.9502 41.7581 -20.6207

34.8931 29.5376 35.5392 34.2912 33.3498 32.9946 33.64 28.2373 36.9722 35.4912 32.3725 34.4132 27.2163 34.2576 21.99 27.457 29.4658 28.006 34.9432 34.1242 34.7076 26.3892

sdB3V:He2 sdO2VII:He33 sdO7VII:He2 sdO9VII:He14 sdB6V:He0 sdB7IV:He5 sdB5V:He5 sdO7VI:He39 sdB1VII:He6 sdB2VI:He5 sdB2VI:He5 sdB6IV:He0 sdB5V:He3 sdB5V:He3 sdO6VII:He40 sdB6IV:He2 sdB3VII:He34 sdB5III:He2 sdB2VI:He1 sdB4V:He1 sdB6VI:He1 sdO7VI:He39

30988 50000 31639 34081 31066 21059 28481 45000 33369 30731 29637 35771 12662 30223 44792 12281 38699 14660 31471 29977 28046 45001

255 208 283 201 258 530 296 128 346 222 557 59 107 254 98 171 179 186 274 519 733 99

5.425 5.852 5.111 5.542 5.316 4.379 5.192 5.573 5.202 5.410 5.500 5.768 3.617 5.481 6.000 3.500 5.927 3.539 5.390 5.293 5.097 5.664

0.049 0.066 0.046 0.035 0.047 0.062 0.048 0.046 0.027 0.047 0.059 0.025 0.032 0.048 0.031 0.043 0.027 0.035 0.050 0.060 0.059 0.030

-3.000 0.364 -2.624 -0.941 -3.000 -2.274 -2.471 2.083 -1.568 -2.238 -1.764 -2.504 -1.457 -2.371 1.556 -2.359 0.121 -1.962 -3.000 -3.000 -3.000 1.261

0.434 0.020 0.183 0.022 0.434 0.082 0.128 0.210 0.032 0.075 0.050 0.139 0.124 0.102 0.094 0.198 0.003 0.119 0.434 0.434 0.434 0.047

8.72E-01 1.69E+00 4.62E+00 1.90E+00 5.77E+00 5.34E+00 4.44E+00 6.30E+00 2.41E+00 1.79E+00 7.12E-01 6.83E+00 1.16E+00 8.39E+00 1.87E+00 1.25E+00 8.53E+00 2.44E+00 1.45E+00 2.42E+00 3.41E+00 2.51E+00


Table B.1: continued

SDSS Identifier   R.A.   Decl.   cz (km s^-1)   σ_cz   Classification   Teff (K)   σ_Teff   log g (cgs)   σ_log g   log(nHe/nH)   σ_log(nHe/nH)   χ²

J163800.17+010259.7 J163815.97-001919.2 J163913.62+384957.1 J163936.03+343230.6 J164042.91+311734.6 J164122.33+334452.1 J164204.38+440303.3 J164326.04+330113.2 J164419.45+452326.8 J164444.94+312345.4 J165022.05+312749.7 J165404.27+303701.8 J165422.26+631534.3 J165424.30+303941.3 J165841.83+413115.6 J170045.67+604308.5 J170356.68+341505.0 J170714.27+654025.6 J171424.17+614711.0 J171629.93+575121.2 J171722.10+580558.9 J171813.87+595355.2

16:38:00.17 16:38:15.97 16:39:13.62 16:39:36.03 16:40:42.91 16:41:22.33 16:42:04.38 16:43:26.04 16:44:19.45 16:44:44.94 16:50:22.05 16:54:04.27 16:54:22.26 16:54:24.30 16:58:41.83 17:00:45.67 17:03:56.68 17:07:14.27 17:14:24.17 17:16:29.93 17:17:22.10 17:18:13.87

+01:02:59.7 -00:19:19.2 +38:49:57.1 +34:32:30.6 +31:17:34.6 +33:44:52.1 +44:03:03.3 +33:01:13.2 +45:23:26.8 +31:23:45.4 +31:27:49.7 +30:37:01.8 +63:15:34.3 +30:39:41.3 +41:31:15.6 +60:43:08.5 +34:15:05.0 +65:40:25.6 +61:47:11.0 +57:51:21.2 +58:05:58.9 +59:53:55.2

-108.217 -136.585 -66.4781 -228.974 -369.398 -49.8183 -336.658 -66.2194 -356.849 -64.8034 8.82859 134.707 -20.424 -249.1 -40.2591 -270.047 -83.8331 -92.8553 -12.9255 -312.791 -110.822 -109.876

29.5044 31.9708 29.4706 26.0607 41.0764 33.4568 30.8894 33.4823 35.8627 31.1266 29.3108 31.3034 35.2667 28.7513 30.1816 28.8328 32.767 34.0747 33.6796 35.8141 33.3324 28.7311

sdB6V:He3 sdB4V:He2 sdB3V:He2 sdB2IV:He15 sdB1VII:He33 sdO9VI:He7 sdO9VII:He5 sdB2VI:He3 sdB1VI:He5 sdO8VII:He7 sdB2VI:He26 sdB1VI:He6 sdB2VI:He5 sdB7IV:He0 sdB2VI:He8 sdO4VII:He36 sdB3VI:He3 sdB4VI:He7 sdB0VII:He2 sdO5VIII:He21 sdO9VI:He9 sdB5IV:He2

17182 20734 16016 26684 30989 29235 30160 29898 32287 32067 35685 27306 34568 13840 32230 48271 28252 34656 32113 34186 34248 13053

323 493 225 424 270 394 380 554 245 275 174 638 196 313 294 357 453 250 394 229 187 128

4.000 4.617 3.671 4.531 3.876 5.530 4.958 5.500 5.499 5.483 5.204 5.502 5.672 3.500 5.038 5.869 5.000 5.350 4.999 5.465 5.153 3.685

0.047 0.045 0.037 0.046 0.033 0.043 0.058 0.059 0.047 0.051 0.032 0.067 0.037 0.055 0.037 0.071 0.069 0.049 0.065 0.038 0.021 0.032

-2.192 -2.172 -1.574 -1.121 0.945 -2.085 -2.545 -2.254 -2.080 -3.000 0.118 -2.177 -1.387 -3.000 -1.819 0.731 -2.653 -1.539 -3.000 -0.798 -1.578 -2.096

0.135 0.064 0.049 0.023 0.008 0.053 0.152 0.078 0.052 0.434 0.003 0.065 0.021 0.434 0.057 0.052 0.195 0.030 0.434 0.019 0.049 0.163

1.50E+00 4.45E+00 3.06E+00 1.44E+00 1.04E+01 1.54E+00 2.84E+00 1.22E+00 2.67E+00 2.03E+00 3.37E+00 1.04E+00 7.54E-01 9.77E+00 1.59E+00 5.12E+00 1.91E+00 1.04E+00 1.81E+00 6.08E+00 1.84E+00 2.32E+00

Table B.1: continued

SDSS Identifier   R.A.   Decl.   cz (km s^-1)   σ_cz   Classification   Teff (K)   σ_Teff   log g (cgs)   σ_log g   log(nHe/nH)   σ_log(nHe/nH)   χ²

J171929.52+273229.3 J171947.87+591604.3 J172037.66+534009.4 J172338.54+601444.1 J203729.93+001954.1 J203826.42+010953.5 J204546.82-054355.7 J204658.84-055100.1 J204726.94-060325.8 J205030.40-061957.9 J210454.89+110645.6 J211045.16+000142.1 J211104.97+091042.9 J211318.37+001738.4 J211338.31-000940.7 J211339.69+100640.4 J211425.02+005517.6 J211651.96-003328.5 J211921.36+005749.8 J213112.24+112936.2 J213718.87+123303.3 J213808.12+105741.8

17:19:29.52 17:19:47.87 17:20:37.66 17:23:38.54 20:37:29.93 20:38:26.42 20:45:46.82 20:46:58.84 20:47:26.94 20:50:30.40 21:04:54.89 21:10:45.16 21:11:04.97 21:13:18.37 21:13:38.31 21:13:39.69 21:14:25.02 21:16:51.96 21:19:21.36 21:31:12.24 21:37:18.87 21:38:08.12

+27:32:29.3 +59:16:04.3 +53:40:09.4 +60:14:44.1 +00:19:54.1 +01:09:53.5 -05:43:55.7 -05:51:00.1 -06:03:25.8 -06:19:57.9 +11:06:45.6 +00:01:42.1 +09:10:42.9 +00:17:38.4 -00:09:40.7 +10:06:40.4 +00:55:17.6 -00:33:28.5 +00:57:49.8 +11:29:36.2 +12:33:03.3 +10:57:41.8

-82.9454 89.6508 -72.6856 -41.3729 -79.0277 -113.742 -44.8684 -57.0742 -1.97117 -489.474 -41.5566 -103.57 157.456 -14.4957 -23.6116 -65.7901 14.6469 11.2966 -49.1174 3.08591 -106.859 26.7674

31.6473 29.0229 35.312 24.0531 25.4161 33.1624 28.8934 33.6601 33.4697 28.5065 31.6509 31.6623 32.7679 21.8039 37.4435 26.6768 36.5492 32.9385 20.1741 34.7669 28.2812 26.7784

sdB1VI:He4 sdB6IV:He2 sdO5VI:He7 sdO9VI:He12 sdO8VII:He9 sdB4V:He3 sdO9VI:He8 sdB2VI:He3 sdB2VI:He13 sdO5VI:He38 sdO5VIII:He8 sdO7VII:He10 sdB4V:He3 sdO7VII:He38 sdB0VII:He7 sdO6VII:He9 sdO9VII:He17 sdB4V:He5 sdO5VII:He38 sdO8VII:He9 sdB6IV:He2 sdB2VI:He10

30755 16126 29304 34424 33681 29406 32441 33102 35395 45663 35000 33998 29504 45000 36649 32140 36832 28210 44901 35450 14718 34133

350 217 477 79 230 458 54 231 177 558 210 369 485 100 409 432 199 447 114 179 176 180

4.961 3.854 5.200 5.235 5.173 5.398 5.167 5.450 5.648 5.545 5.086 4.976 5.321 5.905 5.500 4.916 5.805 5.495 6.000 5.595 3.601 5.095

0.061 0.048 0.054 0.024 0.046 0.063 0.031 0.047 0.037 0.047 0.037 0.061 0.059 0.031 0.053 0.061 0.038 0.064 0.039 0.036 0.032 0.017

-3.000 -1.194 -2.384 -1.353 -1.762 -2.442 -1.477 -1.545 -1.330 0.388 -3.000 -2.257 -2.820 1.475 -2.148 -2.154 -1.058 -2.409 1.816 -1.432 -1.630 -1.620

0.434 0.068 0.105 0.029 0.050 0.120 0.026 0.030 0.028 0.023 0.434 0.079 0.287 0.078 0.061 0.062 0.020 0.111 0.170 0.023 0.056 0.036

2.30E+00 2.05E+00 8.30E+00 2.21E+00 1.58E+00 2.16E+00 3.53E+00 1.77E+00 3.50E+00 8.00E+00 2.34E+00 1.96E+00 4.77E+00 1.70E+00 3.58E+00 2.72E+00 5.22E+00 6.76E+00 1.47E+00 1.28E+00 2.40E+00 3.81E+00


Table B.1: continued

SDSS Identifier   R.A.   Decl.   cz (km s^-1)   σ_cz   Classification   Teff (K)   σ_Teff   log g (cgs)   σ_log g   log(nHe/nH)   σ_log(nHe/nH)   χ²

J215049.19+010338.4 J215053.84+131650.6 J215227.25+115726.7 J215307.34-071948.4 J215631.56+121237.7 J220403.45+122507.3 J220810.05+115913.9 J221816.78+121400.7 J222238.69+005125.0 J222932.81-004822.5 J223008.26+132734.2 J223839.13+122517.9 J224105.19+141810.2 J231956.10-093937.6 J233914.00+134214.3 J234421.80-101142.8 J234853.52+151215.5 J235108.66+002623.0

21:50:49.19 21:50:53.84 21:52:27.25 21:53:07.34 21:56:31.56 22:04:03.45 22:08:10.05 22:18:16.78 22:22:38.69 22:29:32.81 22:30:08.26 22:38:39.13 22:41:05.19 23:19:56.10 23:39:14.00 23:44:21.80 23:48:53.52 23:51:08.66

+01:03:38.4 +13:16:50.6 +11:57:26.7 -07:19:48.4 +12:12:37.7 +12:25:07.3 +11:59:13.9 +12:14:00.7 +00:51:25.0 -00:48:22.5 +13:27:34.2 +12:25:17.9 +14:18:10.2 -09:39:37.6 +13:42:14.3 -10:11:42.8 +15:12:15.5 +00:26:23.0

34.2033 -108.905 5.51462 -13.1178 -66.6073 -153.325 -69.4664 -74.2256 -114.061 -18.7916 -30.7785 -96.1084 -202.31 16.0919 -411.792 -72.6667 -69.1169 -195.75

36.462 33.9992 32.5347 35.7679 23.9981 28.335 29.3261 32.2427 30.8319 34.4566 29.1779 31.3337 30.0845 28.428 25.2673 27.5984 32.1839 28.5503

sdB1V:He6 sdB1VI:He5 sdB3VI:He5 sdB1VI:He7 sdO7VII:He40 sdB7IV:He1 sdB2VI:He7 sdB7IV:He4 sdO9VII:He11 sdO9VII:He8 sdB5IV:He2 sdB0IV:He6 sdB9III:He0 sdO9VI:He16 sdO5VI:He39 sdB7IV:He2 sdB0VI:He5 sdB0VII:He30

34388 30233 34586 32449 45581 11740 26862 25415 33853 34802 14433 30656 13763 35985 45402 11132 30651 38778

84 248 272 166 220 170 735 605 422 243 235 195 258 299 513 98 167 177

5.351 5.423 5.273 5.683 5.891 3.500 5.000 5.000 4.627 5.328 3.762 5.244 3.501 4.845 5.500 3.217 5.556 5.407

0.023 0.047 0.041 0.037 0.035 0.049 0.076 0.066 0.058 0.044 0.036 0.042 0.052 0.049 0.077 0.026 0.035 0.039

-1.279 -2.222 -1.354 -1.996 1.303 -2.050 -2.150 -1.422 -3.000 -1.360 -2.304 -3.000 -2.669 -0.774 0.607 -0.954 -2.315 -0.288

0.033 0.072 0.029 0.086 0.052 0.244 0.061 0.023 0.434 0.030 0.175 0.434 0.203 0.023 0.030 0.054 0.090 0.007

6.39E+00 1.65E+00 3.73E+00 1.58E+00 3.56E+00 1.14E+00 3.14E+00 2.25E+00 1.20E+00 1.69E+00 3.47E+00 5.02E+00 1.51E+01 3.23E+00 4.55E+00 7.84E-01 6.37E-01 4.21E+00

Appendix C

Results for 83 2MASS-Selected Hot Subdwarf Candidates
Parameters and classifications are listed in this table for the 2MASS-selected stars obtained from E.M. Green (Green et al., 2006). The internal errors of SFIT are given along with the value of χ² for the best fit.

Table C.1: Results for 83 2MASS-Selected Hot Subdwarf Candidates

Identifier (2MASX J-)   Classification   Teff (K)   σ_Teff   log g (cgs)   σ_log g   log(nHe/nH)   σ_log(nHe/nH)   χ²

Balloon 090900004 BD+48 2721 J011407.62+160800.6 J020656.17+143858.6 J021555.50+234314.3 J021619.04+275902.0 J021742.16+280329.5 J022512.51+234820.7 J030725.66+175248.0 J041550.17+015421.0 J042034.85+012041.0 J043037.82-010308.3 J074722.07+622545.2

sdO7VII:He11 sdB2VI:He6 sdB8VI:He5 sdB1VII:He7 sdB1VI:He8 sdB1VI:He6 sdO9VII:He8 sdO6VII:He13 sdB2V:He10 sdB0VII:He9 sdO6VII:He38 sdB5VI:He3 sdB3VI:He12

31147 22979 10795 29873 32485 27594 32698 38384 28000 32883 40547 13447 27665

278 240 61 484 57 292 196 119 352 197 120 91 271

4.757 5.267 3.156 5.850 5.623 5.719 5.838 6.000 5.095 5.943 5.117 3.640 5.752

0.054 0.032 0.016 0.046 0.008 0.034 0.033 0.030 0.030 0.035 0.035 0.028 0.031

-1.811 -1.629 -0.368 -1.897 -0.758 -2.100 -1.341 -1.417 -0.701 -1.390 1.301 -0.293 -0.696

0.056 0.018 0.007 0.034 0.010 0.055 0.019 0.023 0.010 0.011 0.017 0.004 0.006

1.77E+00 2.54E+00 1.16E+00 1.70E+00 2.60E+00 1.66E+00 1.34E+00 1.86E+00 2.94E+00 1.38E+00 3.44E+00 8.08E-01 8.00E+00

Table C.1: continued

Identifier (2MASX J-)   Classification   Teff (K)   σ_Teff   log g (cgs)   σ_log g   log(nHe/nH)   σ_log(nHe/nH)   χ²

J075407.66+651540.2 J075815.66+514348.0 J080245.68+474817.7 J082643.33+330859.2 J082822.23+295131.3 J083127.37+422201.7 J083320.34+202424.8 J083535.58+194412.6 J083734.74+672413.6 J083909.92+182416.6 J084447.93+404426.5 J084535.67+194150.3 J084937.68+234847.3 J085148.86+434402.5 J085649.27+170114.7 J090158.77+395931.3 J091206.53+091621.7 J091706.65+541817.3 J091751.45+615630.1 J092116.62+023741.0 J092246.92+001741.0 J093112.84+051040.4 J093150.58+031848.0 J093426.95+821304.3 J093453.32+841851.5 J093832.18+041343.9 J093935.15+104321.9 J094047.71+185332.9 J094105.31-004755.8 J094107.57+375342.6 J094353.47+783140.7 J094509.99+553450.2 J094637.19+351755.8 J095219.06+441941.9 J095708.88+223055.6 J095854.23+360314.3 J095855.78-044413.9

sdB4V:He5 sdB3V:He7 sdB5VII:He2 sdB4V:He7 sdB3V:He7 sdA5VI:He2 sdB4VI:He14 sdB3VI:He6 sdB0V:He17 sdB4V:He5 sdB4V:He6 sdB3VI:He7 sdB3VI:He5 sdB7VI:He4 sdB1VI:He5 sdB6VI:He2 sdB2V:He10 sdB5VI:He2 sdB4V:He5 sdB3VI:He15 sdB4V:He5 sdB5VI:He3 sdB6VI:He1 sdB7IV:He7 sdO7VII:He37 sdB7V:He1 sdB3V:He6 sdB4VI:He5 sdO4VII:He33 sdB3V:He7 sdB2VI:He5 sdB4V:He6 sdB7VI:He3 sdB4V:He7 sdB4V:He5 sdF8VI:He2 sdB7IV:He6

11002 11175 9783 18872 16877 10463 22956 27775 30001 10235 12351 22899 18128 10292 29527 11188 27999 11016 10286 27144 11427 10558 10097 9909 42886 10290 10868 10589 45003 15712 27999 18717 11021 13014 10910 9618 9321

53 56 93 171 137 50 173 429 392 43 59 242 162 49 276 80 442 71 32 227 69 56 62 57 87 71 53 44 121 55 382 165 69 73 45 68 48

2.750 2.706 3.797 4.356 4.122 3.179 5.216 5.784 4.688 2.698 3.321 5.236 4.653 3.139 5.747 3.359 4.689 3.157 2.662 5.295 2.797 3.376 3.303 2.917 5.789 3.306 2.793 2.823 5.312 3.398 5.662 4.251 3.387 3.197 2.714 3.384 2.795

0.020 0.020 0.034 0.034 0.022 0.013 0.026 0.041 0.041 0.017 0.024 0.033 0.029 0.014 0.035 0.026 0.041 0.019 0.014 0.030 0.019 0.018 0.027 0.025 0.030 0.026 0.020 0.014 0.043 0.008 0.036 0.031 0.021 0.025 0.012 0.022 0.021

0.000 0.000 -0.954 -0.827 -0.723 -0.395 -0.407 -1.846 -0.578 -0.079 0.001 -1.309 -1.348 -0.368 -2.111 -0.257 -0.672 -0.122 0.159 -0.300 -0.000 -0.278 -0.645 0.297 1.526 -0.553 0.000 0.308 0.000 -0.597 -2.105 -0.898 -0.140 0.046 0.272 -0.467 -0.368

0.000 0.000 0.093 0.009 0.013 0.015 0.004 0.030 0.009 0.002 0.000 0.009 0.019 0.007 0.056 0.006 0.012 0.002 0.002 0.003 0.000 0.005 0.018 0.012 0.015 0.013 0.000 0.005 0.000 0.008 0.055 0.014 0.005 0.000 0.004 0.019 0.012

1.42E+00 1.67E+00 1.50E+00 2.38E+00 1.60E+00 1.39E+00 6.77E+00 1.64E+00 4.39E+00 1.78E+00 1.29E+00 1.97E+00 2.31E+00 1.55E+00 1.47E+00 1.13E+00 1.86E+00 8.49E-01 2.01E+00 6.12E+00 1.39E+00 1.11E+00 1.34E+00 5.71E+00 1.38E+00 1.64E+00 1.43E+00 1.64E+00 2.53E+00 1.31E+00 1.92E+00 1.65E+00 1.01E+00 1.24E+00 1.45E+00 1.81E+00 3.84E+00

Table C.1: continued

Identifier (2MASX J-)   Classification   Teff (K)   σ_Teff   log g (cgs)   σ_log g   log(nHe/nH)   σ_log(nHe/nH)   χ²

J095859.91+082504.4 J100058.89+024804.4 J100145.47+375733.2 J100509.89+384615.2 J100607.62+005326.2 J100739.11+202546.7 J104130.43+184209.8 J104653.08+515435.9 J104912.91+380014.9 J111631.06+305838.7 J111719.94+241207.1 J111819.13+093144.4 J112129.35+111917.0 J112832.64+603859.3 J113435.70+664252.6 J113633.63+750653.7 J113837.54+250043.4 J114454.50+031550.2 J122617.00+774312.4 J122745.99+113636.1 J122843.58+282036.6 J123014.92+463720.0 J125049.06+743943.5 J131359.98+183131.3 J132546.78+400827.0 J132546.78+400827.0 J132546.78+400827.0 J132546.78+400827.0 J135515.91+533442.5 J135648.63+210510.1 J140123.40+742150.5 J142127.88+712421.4 J143155.38+172404.9 J145239.03+412618.1 J152653.06+794130.7

sdB6VII:He4 sdB4V:He8 sdB3VI:He3 sdB6VII:He2 sdB8V:He3 sdB5VII:He5 sdB0VII:He8 sdO8VII:He9 sdB2V:He9 sdB4V:He5 sdB4V:He5 sdA2V:He5 sdB4V:He6 sdF5V:He3 sdB4V:He6 sdO9VII:He7 sdB5IV:He4 sdA7V:He4 sdB1VI:He6 sdB6IV:He6 sdB4VI:He5 sdB5V:He6 sdB2VI:He5 sdB6VI:He3 sdB3V:He8 sdB3V:He8 sdB4V:He6 sdB4V:He6 sdB3VI:He12 sdB4V:He5 sdB4V:He7 sdB2VI:He5 sdA7V:He3 sdB7V:He10 sdB0VI:He6

9806 17115 9842 10547 11614 10000 32521 30750 20087 9474 11157 12109 12729 10302 12712 35699 10255 9764 28443 10356 10230 12513 27913 10504 11653 16440 11653 16440 25842 10516 16490 25982 10112 9479 32936

74 148 50 70 83 51 166 262 213 52 73 52 67 52 67 62 43 53 239 40 46 44 436 52 68 74 68 74 231 42 73 319 55 31 67

3.560 4.229 3.190 3.793 3.096 3.748 5.635 4.799 4.130 2.754 2.806 3.182 3.204 3.025 3.131 6.000 2.594 3.118 5.889 2.666 2.844 3.084 5.654 3.505 2.761 4.149 2.761 4.149 5.254 2.609 4.140 5.847 2.881 2.852 5.770

0.013 0.026 0.011 0.030 0.017 0.014 0.030 0.053 0.030 0.021 0.023 0.022 0.024 0.017 0.023 0.016 0.016 0.021 0.035 0.017 0.018 0.018 0.038 0.017 0.023 0.014 0.023 0.014 0.030 0.012 0.014 0.037 0.019 0.021 0.015

-0.319 -0.632 -0.336 -0.700 -0.368 0.065 -1.401 -1.978 -0.718 -0.368 -0.292 -0.131 0.056 -0.307 0.057 -1.672 -0.136 -0.368 -1.996 0.030 -0.083 0.154 -1.993 -0.083 0.122 -0.641 0.122 -0.641 -0.673 0.135 -0.689 -2.346 -0.404 0.439 -2.235

0.015 0.009 0.016 0.021 0.009 0.002 0.022 0.083 0.015 0.012 0.006 0.003 0.000 0.005 0.000 0.041 0.003 0.010 0.043 0.000 0.002 0.003 0.043 0.003 0.002 0.023 0.002 0.023 0.008 0.002 0.026 0.096 0.015 0.005 0.075

1.35E+00 2.76E+00 1.21E+00 9.22E-01 1.27E+00 1.27E+00 1.15E+00 9.67E-01 1.70E+00 1.84E+00 1.10E+00 1.04E+00 1.14E+00 1.53E+00 1.07E+00 1.43E+00 2.43E+00 1.37E+00 2.38E+00 3.30E+00 1.87E+00 1.15E+00 2.28E+00 1.15E+00 1.50E+00 5.48E+00 1.50E+00 5.48E+00 2.54E+00 1.45E+00 1.77E+00 2.57E+00 1.19E+00 5.63E+00 1.69E+00


Appendix D

The Armagh Observatory Cluster
Over the course of this project, Armagh Observatory, as part of the CosmoGrid¹ initiative, acquired a dedicated computing cluster which I helped to set up and administer. The software configuration used by the cluster at the time of writing is documented herein.

D.1 Hardware Configuration

The cluster presently consists of sixteen vertically mounted Blade nodes: one master node and fifteen slave nodes. Each slave node contains:
· Two Intel Xeon 3GHz processors, each with 1MB cache
· 2GB RAM
· One 40GB Maxtor SATA UDMA/133 hard drive
· One Broadcom BCM5721 1000Base-T PCI Express NIC

The master node has the same basic hardware configuration as the slaves except for:
· Two 240GB Maxtor SATA UDMA/133 hard drives
· One CDRW/DVDR drive
· One floppy disk drive
· Two 1000Base-T network cards

All of the nodes are interlinked by one 24-port gigabit Ethernet switch, and are connected to one 16-port KVM unit.

¹ http://www.cosmogrid.ie/

D.2 Software Configuration

System Software

The operating system used on all of the nodes is currently Red Hat Enterprise Linux AS release 3 (Taroon Update 3). The following software packages form the core of the cluster setup:

· Condor² version 6.6.10
· Intel Fortran Compiler version 8.1
· MPICH 1.2.4
· Ganglia 3.0

User Account Management

User accounts are managed centrally on the master node by editing /etc/passwd and /etc/shadow using the standard account management tools. Once any changes to the user accounts have been made, /etc/passwd and /etc/shadow must be refreshed on all of the slave nodes by using the brcp and brsh commands.

² http://www.cs.wisc.edu/condor/

Home Directories

The central partition of user home directories is located on the master node, and is shared out to all the slave nodes using NFS. This creates a single storage domain for the cluster, allowing user jobs running on the slave nodes to read/write data from/to the user's home directory, thus avoiding the need for any bothersome manual file transfer operations. Each user has a disk space quota of 10GB.
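As a rough illustration of the NFS arrangement described above, the following sketch prints the kind of /etc/exports entry and client mount command that such a setup might use. The network range, the host name master, and the export options (rw,sync) are assumptions made for illustration; they are not taken from the cluster's actual configuration files.

```shell
#!/bin/bash
# Hypothetical sketch: build the NFS configuration lines for a shared /home.
# The network range, host name and options are illustrative assumptions.

# Print an /etc/exports entry sharing directory $1 read-write with network $2.
exports_entry() {
    local dir="$1" network="$2"
    printf '%s %s(rw,sync)\n' "$dir" "$network"
}

# Print the mount command a slave node could use to attach the share.
mount_command() {
    local server="$1" dir="$2"
    printf 'mount -t nfs %s:%s %s\n' "$server" "$dir" "$dir"
}

exports_entry /home 192.168.0.0/255.255.255.0
mount_command master /home
```

Nothing is mounted or exported by this sketch; it only prints the two lines so that they can be inspected before being placed in /etc/exports on the master or run on a slave.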

Condor

Condor is a specialised batch system for managing compute-intensive jobs. Like most batch systems, Condor provides a queuing mechanism, scheduling policy, priority scheme, and resource classifications. Users submit their compute jobs to Condor; Condor puts the jobs in a queue, runs them, and then informs the user of the result.

A Condor cluster comprises a single machine which serves as the central manager, and an arbitrary number of other machines. Conceptually, the cluster is a collection of resources (machines) and resource requests (jobs). The role of Condor is to match waiting requests with available resources. Every part of Condor sends periodic updates to the central manager, the centralised repository of information about the state of the cluster. Periodically, the central manager assesses the current state of the cluster and tries to match pending requests with appropriate resources.

The basic Condor setup for the Armagh Observatory cluster nominates the master node as the central manager for the cluster, with the slave nodes functioning as dedicated computing resources. No jobs are permitted to run on the master.
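To make the submit, queue, and run cycle concrete, the sketch below writes a minimal Condor submit description file for a batch of independent jobs. The program name fit_spectra, the file names, and the choice of the vanilla universe are placeholders assumed for illustration; they do not describe any job actually run on the cluster.

```shell
#!/bin/bash
# Hypothetical sketch: write a Condor submit description file for a batch
# of independent jobs. Program and file names are placeholders.

write_submit_file() {
    local file="$1" program="$2" njobs="$3"
    # \$(Process) is left for Condor to expand to 0, 1, ..., njobs-1.
    cat > "$file" <<EOF
universe   = vanilla
executable = $program
arguments  = \$(Process)
output     = job.\$(Process).out
error      = job.\$(Process).err
log        = jobs.log
queue $njobs
EOF
}

write_submit_file fit_spectra.sub ./fit_spectra 10
```

The resulting file would be handed to Condor with condor_submit fit_spectra.sub, and the state of the queue could then be inspected with condor_q.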

Directory Layout And NFS Shares

The Condor software is installed solely on the master in the directory /opt/condor-6.6.10. As the name of this directory is dependent on the version of Condor installed, a symbolic link called /opt/condor points to whatever directory contains the latest version. This symbolic link has been added to /etc/exports, and the Condor installation directory is shared out to all the slaves over NFS.

Condor is set up to require that every node has a directory on its local filesystem to which the Condor daemons can write log information and create temporary work folders for user jobs. This directory is typically located at /home/condor; however, the central NFS share of home directories from the master does not allow a unique /home/condor for every node. Instead, each slave node has a disk partition called /condorhome which contains the directory /condorhome/condor/ that can be used by the local Condor daemons. On the master node, /condorhome is a symbolic link pointing to the /home partition wherein a directory called condor exists.

Boot Script

To ensure the Condor daemons are started when a node is first powered on, a boot script named condor is located in /etc/init.d on each node. This boot script is then symlinked into the runlevel 3 startup scripts directory, /etc/rc3.d/, as the entry S98condor. The boot script listing is:

#! /bin/sh
export CONDOR_CONFIG=/opt/condor/etc/condor_config
MASTER=/opt/condor/sbin/condor_master
PS="/bin/ps auwx"

case $1 in
'start')
    if [ -x $MASTER ]; then
        echo "Starting up Condor"
        $MASTER
    else
        echo "$MASTER is not executable.  Skipping Condor startup."
        exit 1
    fi
    ;;

'stop')
    pid=`$PS | grep condor_master | grep -v grep | awk '{print $2}'`
    if [ -n "$pid" ]; then
        # send SIGQUIT to the condor_master, which initiates its fast
        # shutdown method.  The master itself will start sending
        # SIGKILL to all its children if they're not gone in 20
        # seconds.
        echo "Shutting down Condor (fast-shutdown mode)"
        kill -QUIT $pid
    else
        echo "Condor not running"
    fi
    ;;

*)
    echo "Usage: condor {start|stop}"
    ;;
esac
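The symlinking step mentioned above can be sketched as follows. The function name link_boot_script and its root argument are hypothetical; the root argument exists only so that the sketch can be tried on a scratch directory rather than on the real /etc.

```shell
#!/bin/bash
# Hypothetical sketch: link an init script into the runlevel 3 start-up
# directory. The root argument lets the function be exercised on a
# scratch directory instead of the real filesystem.

link_boot_script() {
    local root="$1" name="$2" order="$3"
    mkdir -p "$root/etc/init.d" "$root/etc/rc3.d"
    # A relative target keeps the link valid whatever the root is.
    ln -sf "../init.d/$name" "$root/etc/rc3.d/S${order}${name}"
}

scratch=$(mktemp -d)
link_boot_script "$scratch" condor 98
ls -l "$scratch/etc/rc3.d"
```

The S98 prefix means the script is started late in the runlevel 3 boot sequence, after networking and NFS services with lower sequence numbers.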

User Path Setup

The Condor user commands for submitting a job to the cluster, checking cluster status and job queues, etc., along with their associated manual pages, are located in the /opt/condor subtree. To give users easy access to the commands and man pages, the appropriate shell variables are modified on login by two system-wide shell profile files, condor.sh and condor.csh, located in /etc/profile.d. They also set up the environment for MPICH and Intel's Fortran compiler. For bash users, condor.sh effects this configuration:

export CONDOR_CONFIG=/opt/condor/etc/condor_config
if [ -z "${PATH}" ]
then
    export PATH=/opt/condor/bin:/opt/mpich/bin
else
    export PATH=/opt/condor/bin:/opt/mpich/bin:$PATH
fi
if [ -z "${MANPATH}" ]
then
    export MANPATH=/opt/condor/man:/opt/mpich/man
else
    export MANPATH=/opt/condor/man:/opt/mpich/man:$MANPATH
fi
if [ `id -u` = 0 ]; then
    export PATH=$PATH:/opt/condor/sbin:/opt/mpich/sbin
fi
### Set up ifort and idb
. /opt/intel_fc_80/bin/ifortvars.sh
. /opt/intel_idb_80/bin/idbvars.sh

And condor.csh does the same for tcsh users:

setenv CONDOR_CONFIG /opt/condor/etc/condor_config
if !($?PATH) then
    setenv PATH /opt/condor/bin:/opt/mpich/bin
else
    setenv PATH /opt/condor/bin:/opt/mpich/bin:$PATH
endif
if !($?MANPATH) then
    setenv MANPATH /opt/condor/man:/opt/mpich/man
else
    setenv MANPATH /opt/condor/man:/opt/mpich/man:$MANPATH
endif
### Set up ifort and idb
source /opt/intel_fc_80/bin/ifortvars.csh
source /opt/intel_idb_80/bin/idbvars.csh


Condor Configuration Files

/opt/condor/etc/condor_config is the global Condor configuration file containing settings for everything from basic cluster setup details to network permissions, user policies, flocking, daemon controls, and so on. Most of the settings in this file can be left at their defaults. However, Part One of the file contains settings that must be customised for the particular Condor installation at a site. For the Observatory cluster, the settings for Part One are as follows:

CONDOR_HOST               = master
RELEASE_DIR               = /opt/condor
LOCAL_DIR                 = /condorhome/condor
LOCAL_CONFIG_FILE         = $(RELEASE_DIR)/etc/$(HOSTNAME).local
REQUIRE_LOCAL_CONFIG_FILE = TRUE
CONDOR_ADMIN              = root@master
MAIL                      = /usr/bin/mail
UID_DOMAIN                = arm.ac.uk
FILESYSTEM_DOMAIN         = $(FULL_HOSTNAME)

Other miscellaneous settings that have been changed are:

### Only allow daemon read/write access to the
### slave nodes connected on the LAN.
HOSTALLOW_READ  = 192.168.0.*
HOSTALLOW_WRITE = 192.168.0.*

### Fully qualified names are not used in /etc/hosts
### so Condor likes this set.
DEFAULT_DOMAIN_NAME = arm.ac.uk

Each of the nodes in the cluster has its own Condor configuration file in /opt/condor/etc. The master node and the slave nodes are treated differently, with the master having its own specific settings, and the slaves all having the same settings.

The master's configuration file, m44.local, contains the following:

### The master never runs jobs
START = FALSE

### There are two NICs in the master. This tells
### Condor to use the internal NIC.
NETWORK_INTERFACE = 192.168.0.149

COLLECTOR   = $(SBIN)/condor_collector
NEGOTIATOR  = $(SBIN)/condor_negotiator
DAEMON_LIST = MASTER, COLLECTOR, STARTD, NEGOTIATOR, SCHEDD

JAVA = /usr/bin/java

### Turn off reporting of pool stats to the
### Condor people
CONDOR_DEVELOPERS_COLLECTOR = NONE
CONDOR_DEVELOPERS = NONE

### PRIORITY_HALFLIFE = 1 adjusts a user's Condor
### priority in real-time. Thus, when their job
### releases any resources, the user's priority
### returns to 0.5 very quickly.
PRIORITY_HALFLIFE = 1

### Turn off any job preemption. No jobs will be
### preempted for any reason.
PREEMPTION_REQUIREMENTS = FALSE
PREEMPTION_RANK = FALSE

As each slave node has the same configuration, a time-saving device has been employed wherein any modifications to the slave setup are made in a template file called node.local.template. This file is then copied using a shell script to create all the nodeXX.local files for the slaves. At present, node.local.template contains:

### Dedicated scheduler for running MPI jobs.
DedicatedScheduler = "DedicatedScheduler@master"
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

START        = TRUE
SUSPEND      = FALSE
CONTINUE     = TRUE
PREEMPT      = FALSE
KILL         = FALSE
WANT_SUSPEND = FALSE
WANT_VACATE  = FALSE
RANK         = Scheduler =?= $(DedicatedScheduler)

### Tell the daemons not to pay attention to any
### console activity. Prevents their Condor status
### changing to 'Owner' if someone logs in to a
### node to perform maintenance.
VIRTUAL_MACHINES_CONNECTED_TO_CONSOLE = 0
VIRTUAL_MACHINES_CONNECTED_TO_KEYBOARD = 0

The shell script which performs the copying, refresh.sh, reads the node names from /etc/brshtab and then copies the template file once per node using a for loop:

#!/bin/bash
# Read the list of node names, one per whitespace-separated entry.
NODES="`cat /etc/brshtab`"
for I in $NODES
do
    # Create <node>.local from the shared template.
    cp node.local.template $I.local
done

Condor User Policies

Most users of the cluster tend to run large batches of relatively short jobs (less than about one hour per job), but some have submitted a small number of long-running jobs (from several hours to several days). In general, each job submitted must be allowed to run to completion without being preempted; otherwise the job must start again from the beginning when it is reallocated to a slave node. For users who submit large batches of short jobs, such preemption is merely a nuisance. However, for users with long-running jobs, any interruption could mean the loss of several days of computation.

To ensure fair use of cluster resources without job preemption, the PRIORITY_HALFLIFE Condor variable has been set to 1 in the local configuration file for the master node. This allows Condor to adjust a user's priority level almost as soon as their jobs start running. As their jobs begin to use cluster resources, Condor lowers the user's priority. If someone else then submits a batch of jobs to the queue, the newcomer's priority will be higher than that of the user whose jobs are already running. So, as each of the first user's jobs finishes on a node, Condor allocates that node to a job belonging to the user with the highest priority. This gradually allows Condor to balance out the allocation of resources so that no one user can use all of the resources all of the time.

To prevent Condor from preempting currently running jobs if someone with a higher user priority submits jobs to the queue, the configuration variables PREEMPTION_REQUIREMENTS and PREEMPTION_RANK have both been set to FALSE in the master's local configuration file.

Over time, the user policy for the cluster will undoubtedly change. Refer to Section 3 of the Condor manual.
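The effect of a very short PRIORITY_HALFLIFE can be illustrated numerically. The sketch below assumes the exponential-smoothing rule given in the Condor manual's description of user priorities, in which the new priority value is beta times the old value plus (1 - beta) times the number of nodes in use, with beta = 0.5^(dt/halflife) and a floor of 0.5. In Condor a numerically larger value means a worse priority. The function name priority_step and the 300-second update interval are assumptions made for illustration only.

```shell
#!/bin/bash
# Sketch of one priority update, assuming the exponential-smoothing rule
#   new = beta * old + (1 - beta) * in_use,   beta = 0.5 ^ (dt / halflife)
# with a floor of 0.5. A larger value means a worse priority.
# The update interval dt is an illustrative assumption.

priority_step() {
    local old="$1" in_use="$2" dt="$3" halflife="$4"
    awk -v p="$old" -v r="$in_use" -v dt="$dt" -v h="$halflife" 'BEGIN {
        beta = exp(log(0.5) * dt / h)
        new = beta * p + (1 - beta) * r
        if (new < 0.5) new = 0.5
        printf "%.3f\n", new
    }'
}

# A user who starts at the best priority (0.5) and holds 10 nodes,
# evaluated after 300 s with two different half-lives (in seconds):
priority_step 0.5 10 300 1
priority_step 0.5 10 300 86400
# The same user once all nodes have been released (half-life of 1 s):
priority_step 10 0 300 1
```

With a half-life of one second the value follows the number of nodes in use almost immediately and falls back to the floor as soon as they are released, which is the behaviour the configuration comment describes; with a half-life of one day the value would move only slightly over the same interval.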

D.3 MPICH 1.2.4 RPM Spec File

This spec file can be used to build RPM packages from a standard MPICH v1.2.4 tarball. The spec file ensures that the current installation of Intel's Fortran compiler is used to build the F77 and F90 bindings, and it produces two RPMs: one standard RPM which contains the MPICH runtime libraries that should be installed on all the nodes, and a development RPM containing all the MPI compiler wrappers which should only be installed on the master node.

Name:         mpich
License:      Other License(s), see package
Group:        Development/Libraries/Parallel
URL:          ftp://ftp.mcs.anl.gov/pub/mpi/old/
Version:      1.2.4
Release:      3
Summary:      A Portable Implementation of MPI
Source:       mpich-%{version}.tar.gz
BuildRoot:    %{_tmppath}/%{name}-%{version}-build
Autoreqprov:  on
%define       _mpich_root /opt/mpich

%description
MPICH is a freely available, portable implementation of MPI, the
Standard for message-passing libraries.

%package devel
Summary:      A Portable Implementation of MPI
Group:        Development/Libraries/Parallel
Autoreqprov:  on
Requires:     mpich
Provides:     mpich-doc
Obsoletes:    mpich-doc

%description devel
MPICH is a freely available, portable implementation of MPI, the
Standard for message-passing libraries.

%prep
%setup -q
DIRS=$(find -type d)

%build
CFLAGS=$RPM_OPT_FLAGS; export CFLAGS;
export F90="ifort" ;
export FC="ifort" ;
export CCFLAGS="-O2";
export FFLAGS="-O2";
export RSHCOMMAND="/opt/condor/sbin/rsh";
sh configure --with-arch=LINUX \
    --with-device=ch_p4 \
    --with-comm=ch_p4 \
    --with-romio \
    --with-mpe \
    --libdir=$RPM_BUILD_ROOT%{_mpich_root}/%_lib \
    --enable-sharedlib \
    --enable-c++ \
    --enable-f77 \
    --enable-f90modules \
    --disable-mpedbg \
    --disable-devdebug \
    --disable-debug \
    -prefix=$RPM_BUILD_ROOT%{_mpich_root} \
    -c++=/usr/bin/g++ \
    -opt=-O2 \
    -cc=/usr/bin/gcc \
    -fc=/opt/intel_fc_80/bin/ifort \
    -f90=/opt/intel_fc_80/bin/ifort \
    -f90flags=-O2 \
    -optcc=-O2 \
    -mpe_opts=-O2
make

%install
rm -rf $RPM_BUILD_ROOT
make install PREFIX=$RPM_BUILD_ROOT%{_mpich_root} \
    MPIINSTALL_OPTS="-manpath=$RPM_BUILD_ROOT/%{_mpich_root}/man" \
    -libdir=$RPM_BUILD_ROOT/%{_mpich_root}/%_lib
find $RPM_BUILD_ROOT%{_mpich_root} -type l -name "mpirun" | \
    xargs rm -f
grep -lr "$RPM_BUILD_ROOT" $RPM_BUILD_ROOT/%{_mpich_root}/ | \
    xargs perl -pi -e "s@$RPM_BUILD_ROOT@@g"
rm -f examples/perftest/config.cache \
    examples/perftest/config.log \
    examples/perftest/config.status \
    examples/test/config.log \
    examples/test/config.status
# libs
rm $RPM_BUILD_ROOT%{_mpich_root}/%_lib/lib*
rm $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared/lib*
[ -e lib/libmpich.a ] && cp -f lib/*.a $RPM_BUILD_ROOT%{_mpich_root}/%_lib
[ -e lib/*.o ] && cp -f lib/*.o $RPM_BUILD_ROOT%{_mpich_root}/%_lib
[ -e lib/*.s* ] && cp -f lib/*.s* $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared
for i in libfmpich libmpich libpmpich; do
    echo Working on $i;
    cp -f lib/shared/$i.so.1.0 $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared
    ( cd $RPM_BUILD_ROOT%{_mpich_root}/%_lib/shared; ln -sf $i.so.1.0 $i.so )
done
# docs
rm -fr $RPM_BUILD_ROOT%{_mpich_root}/www
export manpath="$manpath /opt/mpich/man"

%clean
#rm -rf $RPM_BUILD_ROOT

%files
%defattr(-,root,root,755)
%doc COPYRIGHT
%{_mpich_root}/sbin/*
%{_mpich_root}/bin/mpirun*
%{_mpich_root}/bin/mpiman
%{_mpich_root}/bin/mpireconfig
%{_mpich_root}/bin/mpireconfig.dat
%{_mpich_root}/bin/tarch
%{_mpich_root}/bin/tdevice
%{_mpich_root}/bin/serv_p4
%{_mpich_root}/%_lib/shared/*.so.*
%{_mpich_root}/share/*
%{_mpich_root}/man/mandesc
%{_mpich_root}/man/man1/*.1*

%files devel
%defattr(-,root,root,755)
%{_mpich_root}/doc/*
%{_mpich_root}/examples/*
%{_mpich_root}/man/man3/*.3*
%{_mpich_root}/man/man4/*.4*
%doc COPYRIGHT
%{_mpich_root}/include/mpi2c++/*.h
%{_mpich_root}/include/f90base/*.mod
%{_mpich_root}/include/f90choice/*.mod
%{_mpich_root}/include/*.h
%{_mpich_root}/%_lib/*.a
%{_mpich_root}/%_lib/shared/*.so
%{_mpich_root}/etc/*
%{_mpich_root}/bin/mpicc
%{_mpich_root}/bin/mpiCC
%{_mpich_root}/bin/mpif77
%{_mpich_root}/bin/mpif90
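The spec file is driven by rpmbuild. The following is a minimal sketch of how the two packages might be built, assuming the listing is saved as mpich.spec and the tarball is named mpich-1.2.4.tar.gz. Both file names and the function name are illustrative and do not come from the text.

```shell
#!/bin/sh
# Build the runtime and devel RPMs from a spec file and its source tarball.
# Hypothetical helper: the function and file names are assumptions.
build_mpich_rpms() {
    spec=$1
    tarball=$2
    # Refuse to continue unless both inputs exist.
    if [ ! -f "$spec" ] || [ ! -f "$tarball" ]; then
        echo "usage: build_mpich_rpms SPEC TARBALL (both must exist)" >&2
        return 2
    fi
    if ! command -v rpmbuild >/dev/null 2>&1; then
        echo "rpmbuild not found on PATH" >&2
        return 127
    fi
    # Ask rpm where it expects source archives, then copy the tarball there.
    sourcedir=$(rpm --eval '%{_sourcedir}') || return 1
    mkdir -p "$sourcedir" || return 1
    cp "$tarball" "$sourcedir"/ || return 1
    # -bb builds the binary packages only: the main package and its subpackages.
    rpmbuild -bb "$spec"
}
```

A call such as `build_mpich_rpms mpich.spec mpich-1.2.4.tar.gz` would then run the build. Note that the spec file hard-codes /opt/intel_fc_80/bin/ifort and /opt/condor/sbin/rsh, so both paths have to exist on the build host.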


Appendix E

LTE-CODES
LTE-CODES is a package of Fortran programs and supporting libraries for analysing the spectra of hot stars. The main components of the package are:

STERNE computes plane-parallel, line-blanketed model atmospheres for hot stars, Teff > 8000 K, in local thermal, radiative, and hydrostatic equilibrium. The code handles extremely H-deficient mixtures and composition stratification.

SPECTRUM computes synthetic spectra, line profiles, equivalent widths, and specific intensities, assuming LTE, from model atmospheres of hot stars, Teff > 8000 K. It can handle atmospheres of arbitrary chemical composition.

SFIT is a general-purpose code designed to optimise the fit of theoretical stellar spectra to an observed spectrum. The code offers several different parameter optimisation methods, including Levenberg-Marquardt, Amoeba, and Genetic Algorithms. It is designed for both single and composite (binary) stellar spectra.

As part of this thesis, the old build system for these codes (which was based on a series of hand-coded Makefiles) was overhauled and ported to the GNU Autotools system (http://www.gnu.org/). GNU Autotools is a suite of tools that assists in making software projects easy to build across many platforms. It offers a flexible environment for automatically configuring and generating Makefiles according to the needs of the software project, and adapting them to suit the specifics of whatever operating system, compilers, and other system tools are at hand.

E.1 Directory Layout

The hierarchical layout of the LTE-CODES package is straightforward. The top-level directory branches into several subdirectories, the most important of which is src. Within src are two subdirectories, libraries and apps, which hold the source code for the supporting libraries and for the applications (STERNE, SPECTRUM, and SFIT) respectively. In summary:

lte-codes-x.x
|
|-- config
|-- include
\-- src
    |
    |-- libraries
    |   |
    |   |--------------\
    |   |              |
    |   |-- at         |-- prof
    |   |-- bb         |-- qub
    |   |-- chr        |-- rot
    |   |-- dp         |-- rtf
    |   |-- mth        |-- sdb
    |   |-- mx         |-- stn2
    |   |-- nr         |-- str
    |   |-- nr_d       |-- tap
    |   |-- op         |-- tap95
    |   |-- opk2       |-- util
    |   |-- phot       \-- xfit
    |   \-- phys
    |
    \-- apps
        |
        |-- sfit2
        |-- spectrum
        \-- sterne

E.2 Build System Organisation

The central components of the autotools-based build system are the configure.in file, which resides in the top-level directory, and the Makefile.am files, one of which is found in every directory of the build tree.

configure.in

configure.in is actually a Bourne shell script which contains a number of calls to autoconf and automake macros that set up the build environment. The language used for the project can be selected, and specific details such as compiler commands and flags can be defined. The autoconf macros also allow the programmer to tell the build system to test the underlying operating system for the existence of particular tools, libraries, and files, and to modify the source files of the project as appropriate. configure.in is processed by autoconf to generate a configure script. When this script is executed, it traverses the build tree and generates all the necessary Makefiles. The contents of configure.in for LTE-CODES 1.4 are as follows:

AC_INIT
AC_CONFIG_AUX_DIR(config)
AM_INIT_AUTOMAKE(lte-codes, 1.4, "http://www.arm.ac.uk/~csj")
AC_SUBST(ac_aux_dir)

# Checks for programs.
AC_PROG_F77(ifort ifc)
AC_PROG_LIBTOOL
AC_PROG_MAKE_SET

FFLAGS='-I$(top_srcdir)/include -I$(top_srcdir)/include/mod -cm -w -w90 -w95'

AC_OUTPUT(Makefile \
          src/Makefile \
          src/libraries/Makefile \
          src/libraries/at/Makefile \
          src/libraries/bb/Makefile \
          src/libraries/chr/Makefile \
          src/libraries/dp/Makefile \
          src/libraries/mth/Makefile \
          src/libraries/mx/Makefile \
          src/libraries/nr/Makefile \
          src/libraries/nr_d/Makefile \
          src/libraries/op/Makefile \
          src/libraries/opk2/Makefile \
          src/libraries/phot/Makefile \
          src/libraries/phys/Makefile \
          src/libraries/prof/Makefile \
          src/libraries/qub/Makefile \
          src/libraries/rot/Makefile \
          src/libraries/rtf/Makefile \
          src/libraries/sdb/Makefile \
          src/libraries/stn2/Makefile \
          src/libraries/str/Makefile \
          src/libraries/tap/Makefile \
          src/libraries/tap95/Makefile \
          src/libraries/util/Makefile \
          src/libraries/xfit/Makefile \
          src/apps/Makefile \
          src/apps/sfit2/Makefile \
          src/apps/spectrum/Makefile \
          src/apps/spectrum/data/Makefile \
          src/apps/spectrum/models/Makefile \
          src/apps/spectrum/scripts/Makefile \
          src/apps/sterne/Makefile \
          src/apps/sterne/scripts/Makefile \
          src/apps/sterne/utils/Makefile)
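The call AC_PROG_F77(ifort ifc) asks the generated configure script to try the listed Fortran compilers in order and to use the first one it finds. The idea can be pictured with a few lines of shell. This is only a sketch of the search order: the function name first_available is hypothetical, and the real check generated by autoconf does more, such as verifying that the chosen compiler actually works.

```shell
#!/bin/sh
# Print the first command from the argument list that exists on PATH,
# mirroring the order-of-preference search requested by "ifort ifc".
# Returns 1 if none of the candidates is available.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: prefer ifort, fall back to ifc, and report when neither exists.
F77=$(first_available ifort ifc) || echo "no Fortran compiler from the list found" >&2
```

Listing only the Intel compilers means that configure will not silently fall back to a different Fortran compiler, which matches the Intel-specific options placed in FFLAGS.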

If any modifications are made to configure.in, autoconf must be run again for the changes to take effect. A small shell script called bootstrap has been defined for this purpose; it calls autoconf and the other autotools utilities to ensure that the entire build system is updated correctly. bootstrap is defined as:

#!/bin/sh
libtoolize --force --copy
aclocal -I config
automake --add-missing --force-missing --gnu --copy
autoconf

In-depth documentation on autoconf can be found in the manual located at: http://www.gnu.org/software/autoconf/manual/index.html

Makefile.am

Every Makefile.am is processed by automake to produce a Makefile.in file. This is subsequently used by the configure script to create a Makefile at every point in the build tree. Typically, each Makefile.am contains a number of variable assignments that describe which source files are to be compiled, whether the sources form a library or a binary, which subdirectories lie beneath the current directory, and so on. In the top-level directory, Makefile.am contains the following:

include $(top_srcdir)/config/am_global_include.mk

## Process this file with automake to produce Makefile.in

SUBDIRS = src

# Include bootstrap script and other folders in distribution
EXTRA_DIST = bootstrap include test

# Include files in config directory in distribution
AUX_DIST = $(ac_aux_dir)/config.guess \
           $(ac_aux_dir)/config.sub \
           $(ac_aux_dir)/install-sh \
           $(ac_aux_dir)/ltmain.sh \
           $(ac_aux_dir)/missing \
           $(ac_aux_dir)/mkinstalldirs \
           $(ac_aux_dir)/am_global_include.mk

MAINTAINERCLEANFILES = Makefile.in aclocal.m4 configure config-h.in $(AUX_DIST)

## Make sure config directory and files it contains are correctly
## added to distribution by 'make dist'
dist-hook:
	for file in $(AUX_DIST); do \
	    cp $$file $(distdir)/$$file; \
	done

This file is fairly basic, the most significant entry being the SUBDIRS variable, which specifies the subdirectories that must be traversed from here during the build. The rest of the assignments mostly tell the build system about other files which are part of the project but do not need to be compiled. The Makefile.am for a program or a library looks like this (the example shown is that of SPECTRUM):


include $(top_srcdir)/config/am_global_include.mk

SUBDIRS = scripts data models

bin_PROGRAMS = spectrum
spectrum_SOURCES = Spectrum.f
spectrum_LDADD = \
    ../../libraries/dp/libdp.a \
    ../../libraries/qub/libqub.a \
    ../../libraries/opk2/libopk2.a \
    ../../libraries/op/libop.a \
    ../../libraries/tap95/libtap95.a \
    ../../libraries/str/libstr.a \
    ../../libraries/chr/libchr.a \
    ../../libraries/rtf/librtf.a \
    ../../libraries/nr/libnr.a \
    ../../libraries/nr_d/libnr_d.a \
    ../../libraries/mth/libmth.a

Here, the name of the final program is specified along with its source files and the libraries upon which it depends. The build system ensures that any such dependencies are compiled before any attempt is made to compile the current program or library. Further documentation on automake can be found at http://www.gnu.org/software/automake/manual/automake.html

E.3 Installation Instructions

To install LTE-CODES from the source tarball as a non-root user to an arbitrary directory:

1. Unpack the archive: tar -xvzf lte-codes-x.x.tar.gz
2. cd lte-codes-x.x
3. ./configure --prefix=/path/to/install
4. make
5. make install
6. Set the shell environment variable LTECODES to point to the install location
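The six steps above can be collected into a single shell function. This is a minimal sketch rather than a script shipped with the package: the function name install_lte_codes is hypothetical, the tarball is assumed to unpack into a directory named after the archive, and the install prefix must be writable by the current user.

```shell
#!/bin/sh
# Unpack, configure, build and install LTE-CODES, then set LTECODES.
# Hypothetical helper; arguments are the tarball and the install prefix.
install_lte_codes() {
    tarball=$1      # e.g. lte-codes-x.x.tar.gz
    prefix=$2       # e.g. /path/to/install
    if [ ! -f "$tarball" ]; then
        echo "tarball not found: $tarball" >&2
        return 1
    fi
    # Assumption: the archive unpacks into a directory named after itself.
    srcdir=$(basename "$tarball" .tar.gz)

    tar -xvzf "$tarball" || return 1                                # step 1
    (
        cd "$srcdir" || exit 1                                      # step 2
        ./configure --prefix="$prefix" && make && make install      # steps 3 to 5
    ) || return 1

    # Step 6: only reached when the build and install succeeded.
    LTECODES=$prefix
    export LTECODES
}
```

The export affects only the current shell, so the same assignment would normally be added to a shell startup file to make it permanent.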
