Automated Spectral Analysis

by Christopher Winter

Armagh Observatory and Queen's University of Belfast

April, 2003


Contents

1 Introduction
  1.1 The Astronomical Data Explosion
  1.2 The Data Generators
  1.3 The Need For Automation

2 Automated Techniques
  2.1 Classification and Parameterization
  2.2 Minimum Distance Methods
  2.3 Artificial Neural Networks
    2.3.1 The Perceptron
    2.3.2 The Multi-Layer Perceptron
    2.3.3 Advantages and Disadvantages of ANNs
    2.3.4 Use of ANNs in Astronomy
  2.4 Principal Component Analysis

3 Application

A Grid Clustering Systems
  A.1 Preface
  A.2 Introduction
    A.2.1 Clusters
    A.2.2 Requirements
    A.2.3 Prospective Systems
  A.3 Sun Grid Engine
    A.3.1 Usability
    A.3.2 Flexibility
    A.3.3 Expandability
    A.3.4 Grid-Awareness
    A.3.5 Programming Models
    A.3.6 Security
    A.3.7 Documentation
    A.3.8 Hardware Requirements
    A.3.9 More Information
  A.4 Condor
    A.4.1 Usability
    A.4.2 Flexibility
    A.4.3 Expandability
    A.4.4 Grid-Awareness
    A.4.5 Programming Models
    A.4.6 Security
    A.4.7 Documentation
    A.4.8 Hardware Requirements
    A.4.9 More Information
  A.5 Conclusion
    A.5.1 Future Directions


Chapter 1 Introduction
Spectroscopy is one of the key elements in astronomy that helps us learn more about the universe. By examining the light from distant astronomical objects in this manner, parts of the cosmic picture begin to unfold as we learn more about these bodies, their chemical compositions, temperatures, and behaviour. But what used to be a rather painstaking task is beginning to witness a veritable revolution in observational methods, with new technologies able to gather enormous quantities of spectral information in relatively short amounts of time.

With this accelerated data acquisition comes a demand for new analytical processes and techniques to quickly distill new discoveries and science from a sea of information. Concepts need to be borrowed from other scientific disciplines, such as statistics and computer science, and applied to the realm of astrophysical problems in this new era of information overload.

As a member of the hot star group at Armagh Observatory, the objective of my research is to examine this issue in relation to the spectral analysis of hot stars, and to build on the work tentatively begun by others in the field, helping stellar astrophysicists make the best use of all the data being collected now and in the future.

1.1 The Astronomical Data Explosion

Astronomy is undergoing a metamorphosis, concentrated mainly in the domain of observational methods and data analysis. The days of small teams of astronomers using rare and expensive telescope time to gather small amounts of data are being eclipsed by automated, ground-based digital sky surveys and advanced space-based observatories capable of observing several hundred objects at once, over a range of wavelengths, amassing large quantities of multi-parameter data for many millions, or even billions, of objects in total. This data explosion is being fuelled by great technological advances in observational equipment, with large CCD arrays of high quantum efficiency able to produce good data from even modest-sized telescopes, and by the exponential increase in computer processing power and storage capability, which facilitates the acquisition and analysis of data quantities in the Terabyte and Petabyte ranges. Metacomputing initiatives such as the Grid plan to bring such computing power to the desktops of scientists in an on-demand, commodity-like manner, thus making the analysis and visualization of huge databases possible for research institutes far from the cutting edge in terms of funding and access to high-performance computing equipment.

Still, there is a problem: what are astronomers supposed to do with all this data? It is one thing to have the computing power to store and process it, but entirely another to have analytical techniques that will help astronomers to extract and analyse the most interesting and important facts from an overwhelming mass of unorganized information.

1.2 The Data Generators

Without doubt, ground-based digital sky surveys are the greatest implementors of these advances in technology and are, consequently, the main driving forces in this new area of astronomy, presently capable of generating tens to hundreds of Terabytes of raw data over the many-year lifespan of a typical survey. However, space-based surveys are also capable of gathering large quantities of very high quality data over diverse regions of the EM spectrum, given their position above the Earth's atmosphere. Presently, there exist several data sets produced by such surveys that are of interest to stellar astrophysicists, particularly those involved with spectroscopic studies.

The Sloan Digital Sky Survey (SDSS): SDSS (see, for example, Gunn & Knapp [6]) aims to produce the first CCD photometric survey of the North Galactic hemisphere, covering about one-fourth of the entire sky. The approximately 100 million catalogued sources will then be used to carry out the largest ever spectroscopic survey of galaxies, quasars, and stars. The total raw data produced by the SDSS is expected to exceed 40 Terabytes, with a processed subset of around 1 Terabyte in size consisting of 1 million spectra, positions, and image parameters for over 100 million objects, including a mini-image centred on each object in every colour.

The Two-degree Field (2dF): Commencing operations in 1997, the Anglo-Australian Telescope's 2dF facility (see, for example, Lewis et al. [8]) provides multiple-object spectroscopy over a two-degree field of view. Primarily designed for galaxy and quasar redshift surveys, 2dF allows these surveys to sample up to 400 objects simultaneously and to obtain spectra at the rate of 2500 objects per night. The spectroscopic data collected by the 2dF Galaxy Redshift Survey (2dFGRS, http://www.mso.anu.edu.au/2dFGRS/) and the interlinked 2dF QSO Redshift Survey (2dFQZ, http://www.2dfquasar.org/) present interesting opportunities, as many stellar objects will be observed indirectly by these projects.

The International Ultraviolet Explorer (IUE): The IUE satellite, a joint project between NASA, ESA and PPARC, was the longest-lived and most productive astronomical space observatory, operating for almost 19 years and producing a large collection of spectroscopic data containing around 104,000 spectra of approximately 9,600 astronomical sources from all classes of celestial objects in the 1150-3350 Å UV band. These spectra have been reprocessed as part of the IUE Final Archive (IUEFA) project and are now freely accessible to the astronomical community. Many new discoveries and investigations are still possible with these data.

Forthcoming ground- and space-based projects, such as GAIA, DIVA, and FAME, will produce even more data than the above surveys, certainly into the hundreds of Terabytes range, perhaps eventually reaching the Petabyte barrier as the technology improves to allow increased data quality, scope, and exploration of the time domain.

1.3 The Need For Automation

It is essential that astronomers adopt new research methods if they hope to extract useful information from this sea of data, lest the rising tide sweep above their heads. Traditional techniques for astronomical data analysis are unable to cope with the sudden increase in data volume: they are mainly tailored for small data sets of a few tens or hundreds of items, and make use of limited analytical techniques, such that scaling current software up to work on millions of items would result in an inefficient and inflexible scheme. What is required is an array of fast, flexible, and extensible tools providing a data-analysis kit able to generalise across any kind of data set and enabling astronomers to search for particular information in a highly automated and robust fashion.



Chapter 2 Automated Techniques
2.1 Classification and Parameterization

Generally speaking, classification is concerned with organising and grouping together objects with similar characteristics into discrete classes, and assigning to each class a particular name or designation. It can be considered a mapping from the continuous domain to the discrete domain. Ideally, the discrete classes produced by the mapping reflect some underlying physical relationship between the differing objects, but the classification procedure itself often depends on many things, such as which features of each object we measure, how many classes we wish to produce, and what scheme we adopt to perform the actual discrimination between each object and the corresponding class to which it belongs.

In terms of stellar astrophysics, a commonly used classification system for stars is the MK system (Morgan, Keenan & Kellman [9]), which aims to classify bright main-sequence and giant-branch stars in terms of a spectral type and luminosity class. The resulting classifications have a meaningful relationship with the underlying physical principles, as it can be shown that spectral type and luminosity class are closely related to effective temperature (T_eff) and the logarithm of the surface gravity (log g) respectively. However, as stellar objects possess many more measurable physical attributes, this two-parameter system is rather coarse, and cannot account well for objects not usefully describable by these two parameters; but such problems are inherent in any classification scheme, as one must draw the line somewhere in order to define the mapping into the discrete domain.

Whereas classification attempts to produce new, extrinsic information regarding a set of objects, parameterization combines measurements and a priori knowledge of how an object behaves in order to make more detailed deductions about its unknown intrinsic parameters. In this way, parameterization depends greatly on the accuracy of one's knowledge regarding the object's behaviour, and on how well one is able to measure the visible attributes from which the new information is to be derived.

To illustrate, a simple example is finding the volume of a sphere. Here, the volume is the new intrinsic parameter we wish to derive from some measurement of the sphere's diameter or circumference. The a priori knowledge we have of the sphere is the formula to calculate volume from radius, but the whole calculation depends on how accurately we are able to measure the diameter or circumference from which the radius can be derived.
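As a trivial illustration of this dependence on measurement accuracy, the sketch below (not from any code referred to in this report; the function name and the numbers are arbitrary) derives a sphere's volume from a measured circumference and propagates the measurement uncertainty through the calculation.

import math

def volume_from_circumference(c, dc):
    """Derive a sphere's volume from its measured circumference c,
    propagating the measurement uncertainty dc."""
    r = c / (2.0 * math.pi)                 # a priori knowledge: r = C / (2 pi)
    v = (4.0 / 3.0) * math.pi * r ** 3      # a priori knowledge: V = (4/3) pi r^3
    dv = v * 3.0 * (dc / c)                 # since V is proportional to C^3, dV/V = 3 dC/C
    return v, dv

# Example: a circumference measured as 31.4 +/- 0.3 (arbitrary units)
v, dv = volume_from_circumference(31.4, 0.3)
print(f"V = {v:.1f} +/- {dv:.1f}")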


Astrophysically speaking, our objective is to learn more about stellar objects from the measurement of their spectra, which are a unique manifestation of unobservable intrinsic properties. It is possible to deduce information regarding atmospheric composition, temperature, gravity, mass, microturbulent and rotational velocities, whether the star is pulsating, and so on, but these deductions are wholly dependent on our knowledge of the underlying physical processes at work in stars, and on our ability to obtain high-resolution spectra with large signal-to-noise ratios.

The process of extracting this latent information from spectra necessitates the computation of atmospheric models for a range of physical parameters, and approximations of the spectra they would produce. Observed spectra are then compared with these model spectra to find which permutation of initial physical parameters yields the closest match; this is essentially a multi-parameter optimization problem. However, the complexity of the physics involved is a major limiting factor in the final results, as many simplifications are necessary in order to perform the model calculations in a reasonable amount of time. Despite advances in computational hardware, some of the more advanced non-LTE simulation codes still require days of processing time to produce a model of a single atmosphere.

In the realm of digital sky surveys, the process of fitting observed spectra onto a set of models has itself become a computational monster: a relatively small optimization problem that must now be solved millions of times over. Thus, the technique used to find the fitting solution is crucial, and must work quickly and robustly in a highly automated fashion.

2.2 Minimum Distance Methods

The fitting schemes mainly used by stellar astrophysicists to date are minimum-distance methods (MDM, also known as metric distance minimization), where an unknown spectrum is compared with each member of a set of model spectra, whose parameters are already known, in an attempt to find the closest match by minimising some statistical distance metric. Spectral fitting codes, such as SFIT2 (Jeffery et al. [7]), have tended to make use of the χ²-minimization variant of MDM, where one calculates, for all model spectra,

    χ² = Σ (T − S)² / σ² ,

looking for the minimum result, where S is the spectrum we wish to fit, T is a spectrum from the set of models, σ is the error in S (photon noise, calibration errors, etc.), and the sum runs over all pixels. For multi-parameter problems, the number of model spectra required to yield a good fit increases exponentially with the number of parameters, so some interpolation mechanism is usually incorporated into the minimization scheme so that the set of model spectra does not need to cover so many variations of each parameter. Interpolation in the model space, however, requires assumptions about the continuity and smoothness between neighbouring models which may not be valid.

MDM schemes, such as χ²-minimization, are popular because they are reasonably simple and robust, give accurate results, are flexible and able to work across different types of models in virtually any range of parameters, and can easily be kept up to date with advances in model physics and simulation. Perhaps one of the downsides to their operation is the necessity of revisiting every model spectrum each time a new real spectrum is to be parameterised. This makes MDM schemes relatively slow to apply in comparison to other techniques, such as artificial neural networks (discussed in the next section), which undergo a training process on the model spectra, storing the information learned in a set of weights, thereby obviating any further need for the models. Thus, the use of MDM schemes on large databases of spectra may not be the best choice if speed is an initial priority over accuracy. It may be wiser to make a short-term speed/accuracy trade-off during the initial processing of large quantities of unknown spectra, leaving MDM schemes for later, more accurate analysis. Such a decision cannot be made until it is known how well MDM schemes compare in both accuracy and speed to other techniques.

A comparison made by Gulati et al. [5] of χ²-minimization and artificial neural networks, applied to the problem of classifying 158 test spectra into 55 spectral types, showed that the two techniques had very high agreement with catalogue classifications, with a correlation coefficient of around 0.993. In terms of performance, they found that χ²-minimization required around 90 minutes of processing time on a Sun SPARC-10 computer to classify the 158 spectra, whereas the artificial neural network took about 12 hours to converge on the same computer during training; once training was complete, however, the classification of the test spectra was typically performed in less than 1 minute. In terms of accuracy, the results are very compelling, but the speed discrepancy between these two techniques certainly warrants further investigation, on more recent hardware, to determine how an increase by at least a factor of 3 in the size of the set of test spectra affects this issue.
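To make the procedure concrete, the following sketch performs a brute-force χ²-minimisation over a grid of pre-computed model spectra. It is not taken from SFIT2; the function names, the toy model grid, and the parameter values are purely illustrative, and it assumes the observed and model spectra are already sampled on the same wavelength grid and consistently normalised.

import numpy as np

def chi_squared(observed, model, sigma):
    """chi^2 = sum over pixels of (T - S)^2 / sigma^2 for one model spectrum T."""
    return np.sum((model - observed) ** 2 / sigma ** 2)

def best_fit(observed, sigma, models, params):
    """Return the parameters of the model spectrum with the minimum chi^2.

    models : array of shape (n_models, n_pixels), pre-computed model spectra
    params : list of parameter tuples, e.g. (T_eff, log_g), one per model
    """
    chi2 = np.array([chi_squared(observed, m, sigma) for m in models])
    i = np.argmin(chi2)
    return params[i], chi2[i]

# Illustrative use with a toy model grid and synthetic noisy data:
rng = np.random.default_rng(0)
wave = np.linspace(4000.0, 5000.0, 200)                  # wavelength grid (Angstrom)
params = [(teff, logg) for teff in (10000, 20000, 30000) for logg in (3.0, 4.0, 5.0)]
models = np.array([1.0 - 0.1 * logg * np.exp(-((wave - teff / 5.0) / 50.0) ** 2)
                   for teff, logg in params])            # toy "model spectra"
sigma = np.full_like(wave, 0.02)                         # assumed per-pixel error
observed = models[4] + rng.normal(0.0, 0.02, wave.size)  # noisy copy of one model
print(best_fit(observed, sigma, models, params))         # recovers params[4]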

2.3 Artificial Neural Networks

In the last section, we briefly mentioned artificial neural networks (ANNs). But what exactly are they, and what can they do? It is easiest to describe an ANN as a statistical pattern-recognition algorithm which performs a non-linear, parameterised mapping between two domains. The algorithm is presented with an input vector, x, and subsequently yields an output vector, y.

The idea for this technique was originally inspired by the study of the neuronal cells and structure found in the brain. It was found that each cell, or neuron, has a series of input units, known as dendrites, which allow the neuron to receive signals from other neurons. These inputs are combined in some manner, and a non-linear operation is performed on the result. This result is then output from the neuron along a structure called an axon, and transmitted to the dendrites of other neurons via synapses, which are a connecting mechanism between the axon and the dendrite, and control the strength of the signal on the dendrite.



Figure 2.1: Human neuron structure. (Taken from Bailer-Jones [1].)

2.3.1 The Perceptron

A mathematical analogy to the neuron was devised, called the perceptron (see Rosenblatt [11]). Here, a series of input values are multiplied by a set of weights, which determine the importance of each input, and the resulting values are combined, usually by addition. The perceptron's output is determined by applying some activation function to the result of this addition. At its most basic, this activation function is the Heaviside step function, which defines a threshold for the perceptron's output of either 1 or 0. A continuous output between 0 and 1 can instead be provided by using the logistic sigmoid function.

Figure 2.2: The perceptron mathematical device.



We can denote this mathematically as

    u = x_1 × w_1 + x_2 × w_2 + ··· + x_n × w_n ,

or, using summation notation,

    u = Σ_{i=0}^{n} x_i w_i ,

and then the Heaviside step activation function,

    y = 1 if u > 0, otherwise y = 0 ,

or the logistic sigmoid activation function,

    y = 1 / (1 + e^(-u)) .

This rather simple mathematical device can be taught to solve basic logical operations like AND and OR; it can also perform basic pattern classification. As the operation of the perceptron is fully determined by the weights, a training procedure must be applied so that the weights take on the values necessary to solve the problem at hand. This training procedure normally makes use of a set of training data for the particular problem, i.e. a collection of possible input vectors to the perceptron and the desired output that should be produced in each case. These are applied to the perceptron in turn, and the weights are adjusted according to some learning algorithm such that the perceptron's output matches the desired output for each input vector. The most common learning algorithm is to adjust the weights according to the difference between the desired output and the actual output, which can be written formally as

    w_i^(new) = w_i^(old) + η × x_i (D − y) ,

where w_i^(new) is the adjusted value for weight i, w_i^(old) is the current value for weight i, η is the learning rate, D is the desired output for the current input, and y is the perceptron's actual output for the current input. Because the training procedure attempts to "teach" the perceptron to produce a desired output through a learning process guided by training data, we say that the training procedure is supervised. Once training is complete, the weights of the perceptron are fixed, and it can then be applied to unknown inputs.

One major limitation of the perceptron is that the training procedure will not be able to converge on a solution for the set of weights if the input data are not linearly separable; in other words, the perceptron cannot express non-linear relationships between the input data. A simple example of this problem comes in the form of the exclusive-OR (XOR) function, where there exists no linear boundary able to separate the input values into two classes (see Figure 2.3).
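The sketch below implements the perceptron and the weight-update rule described above, teaching it the (linearly separable) logical AND function; the learning rate, epoch count, and zero initialisation are arbitrary illustrative choices.

import numpy as np

def train_perceptron(inputs, targets, eta=0.1, epochs=20):
    """Train a single perceptron with the Heaviside step activation.
    inputs  : array of shape (n_samples, n_inputs)
    targets : desired outputs D, one per sample (0 or 1)
    eta     : learning rate
    """
    # prepend a constant bias input x_0 = 1, so w_0 acts as the threshold
    x = np.hstack([np.ones((inputs.shape[0], 1)), inputs])
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        for xi, d in zip(x, targets):
            y = 1 if np.dot(w, xi) > 0 else 0      # Heaviside step activation
            w += eta * xi * (d - y)                # w_new = w_old + eta * x * (D - y)
    return w

# Logical AND is linearly separable, so training converges;
# the same procedure would never converge for XOR (Figure 2.3).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
D = np.array([0, 0, 0, 1])
w = train_perceptron(X, D)
outputs = [1 if np.dot(w, np.r_[1, xi]) > 0 else 0 for xi in X]
print(w, outputs)   # outputs should match D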

2.3.2 The Multi-Layer Perceptron

This limitation of the perceptron severely restricts its applicability, and the range of functions which it can represent. A solution, which allows for much more general, non-linear mappings, is to create an interlinked, layered, hierarchical structure of many perceptrons. This structure, known as a multi-layer perceptron, consists of an input layer of perceptrons, one or more hidden layers, and, finally, an output layer. The input vector is applied to the input layer, and the output of each perceptron in that layer is fed into one or more perceptrons in the subsequent layer. This process is repeated from layer to layer until the layer of output perceptrons is reached. The propagation of perceptron output normally occurs in a one-way fashion, from the input layer, through the hidden layers, to the output layer, with no feedback loops anywhere in the network. Such networks are called feed-forward, and have the property that the outputs can be expressed as deterministic functions of the inputs, so the whole network represents a multivariate non-linear functional mapping.

Figure 2.3: The perceptron XOR problem. The inputs, forming two different classes, cannot be separated by the linear boundary of a single perceptron.

Figure 2.4: The multi-layer perceptron. Each layer is comprised of a series of perceptrons, with inter-linking weighted connections that parameterize a mapping between the input vector and the output vector. (Adapted from Bailer-Jones [1].)

It has been shown that a multi-layer perceptron consisting of two processing layers, i.e. an input layer, one hidden layer, and an output layer in which no processing is performed, using sigmoidal activation functions, can approximate any continuous function to arbitrary accuracy (see Bishop [3], and references therein). However, because a very large number of perceptrons in the hidden layer is usually required to achieve this, it is often more practical to use a three-layer network, which can achieve a similar level of function approximation with far fewer perceptrons.

Training of multi-layer perceptrons is somewhat more complex than that of a single perceptron, given that there are now many layers of weights that need adjusting. The most commonly used scheme is the back-propagation algorithm, in which an error function of the network weights is chosen that allows the error values from the output perceptrons to be propagated back through the network, and the weights at each layer to be adjusted, by way of the chain rule of differential calculus, such that the value of the error function is minimised. Thus, it is necessary to use the sigmoidal activation function, or any other function, such as tanh, that is continuous and differentiable.
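As a compact illustration of these ideas, the sketch below builds a feed-forward network with a single hidden layer of sigmoid perceptrons and trains it by back-propagation (gradient descent on a sum-of-squares error) on the XOR problem of Figure 2.3, which a single perceptron cannot solve. The architecture, learning rate, and iteration count are arbitrary choices, not a prescription.

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0.0], [1.0], [1.0], [0.0]])      # XOR targets

# one hidden layer of 4 perceptrons; W1, W2 include bias rows
W1 = rng.normal(0.0, 1.0, (3, 4))
W2 = rng.normal(0.0, 1.0, (5, 1))
eta = 0.5

for _ in range(10000):
    # forward pass
    a0 = np.hstack([np.ones((4, 1)), X])          # add bias input
    h = sigmoid(a0 @ W1)                          # hidden-layer outputs
    a1 = np.hstack([np.ones((4, 1)), h])
    y = sigmoid(a1 @ W2)                          # network outputs

    # back-propagation of the output error via the chain rule
    delta2 = (y - D) * y * (1.0 - y)              # output-layer error term
    delta1 = (delta2 @ W2[1:].T) * h * (1.0 - h)  # hidden-layer error term
    W2 -= eta * a1.T @ delta2
    W1 -= eta * a0.T @ delta1

print(np.round(y, 2))   # typically approaches [[0], [1], [1], [0]] after training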

2.3.3 Advantages and Disadvantages of ANNs

The ability of neural networks to generalise and map an arbitrary non-linear function, possibly involving many parameters, makes them a very powerful technique for pattern-matching problems, such as the parameterization of stellar spectra. As such, they have found application in very disparate areas, from speech and image recognition to adaptive optics systems in telescopes.

As mentioned in Section 2.2, once the training procedure for the neural network is complete, the 'knowledge' found by the network in the training data is stored in the network's set of weights. This means that the training data are no longer needed when the network is applied to unseen data, and also that the application runs are very short, as each training datum does not need to be re-examined each time, as is the case with MDM methods.

One of the main difficulties with neural networks lies in the fact that the input-output mapping performed by the network depends upon the number of weights. If there are too few weights, then the mapping will be too simple and under-fit the relationship we are trying to match. If there are too many weights, the mapping produces a poor generalization of the relationship, over-fitting it. As with any regression problem, a trade-off exists between having sufficient weights to address the complexity of the problem, and having a sufficiently small number of weights such that the mapping does not over-fit the relationship. As it is not generally possible to know beforehand the degree of complexity required to replicate the mapping that 'truthfully' matches the real relationship, the network structure can often only be found by trial and error.

Problems involving complex data sets with high dimensionality consequently require very complex neural networks with a large number of weights. Such networks take a long time to train, and are at risk of converging on a local minimum in the error space defined by the training procedure's error function. However, as neural networks are inherently parallel structures, they can benefit greatly from parallel distributed processing. With the advent of Grid computing, it will be possible to build and operate much larger networks than are currently possible on desktop workstations.

2.3.4 Use of ANNs in Astronomy

Neural networks have found much application in astronomy, such as star-galaxy classification [10], determining the fraction of binaries in star clusters [12], and the classification of galaxy spectra [4]. With particular respect to stellar astrophysics, several attempts have been made to apply ANNs to both the classification and parameterization of stellar spectra, nearly all of which record high accuracies and good performance on the part of the ANNs.

Bailer-Jones [1] used a combination of neural networks and principal component analysis to automate MK classification of over 5,000 spectra obtained from the Michigan Sky Survey, showing that the networks yielded correct luminosity classes for over 95% of both dwarfs and giants with a high degree of confidence.

von Hippel et al. [14] created an artificial expert system for MK stellar temperature classification using a simple neural network and a rather limited set of 575 human-classified spectra. They found that the network was able to provide high-quality temperature classification for spectral subtypes from B3 to M4, to within 1.7 spectral subtypes.

Singh et al. [13] used principal component analysis along with a number of neural networks to classify a set of 158 spectra of O to M stars having near-solar composition. They found that neural networks of various configurations provided varying degrees of accuracy, showing that the network architecture can determine the network's performance. The best network was able to provide classifications with an accuracy of two spectral subclasses.

Bailer-Jones [2] used a committee of neural networks trained on synthetic spectra to obtain T_eff, log g, and [M/H] parameters from very low resolution spectra (50-100 Å FWHM), to an accuracy of 1% for T_eff, 0.2 dex for [M/H], and ±0.2 dex for log g for stars earlier than solar.

2.4 Principal Component Analysis

Principal component analysis (PCA) provides a means to calculate a new set of vectors, or principal components, that best represent the variance contained in an initial data set. For an n × m data array, the set of vectors produced constitutes a new n × p array, where p < m, thus providing a reduction in the dimensionality of the original data set. The principal components form what is essentially a set of optimally aligned axes for the data, each representing a correlation between the variables, in decreasing order of importance. A successful derivation of the principal components means that the few p components account for most of the variation in the original data.

In terms of stellar spectra, the most significant principal components contain those features which are most strongly correlated across many of the spectra. It therefore follows that noise, which by definition is uncorrelated with any other features, will be represented in the less significant components. Thus, by retaining only the more significant components to repre

Figure 2.5: Principal component analysis. u1 is the first principal component, and the axis onto which the projected positions of the data have their maximum sum. u2 is the second principal component, and u1 · u2 = 0.
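As a sketch of the reduction described in this section (the function names and the synthetic test data are illustrative only), the principal components can be obtained from the eigenvectors of the covariance matrix of the mean-subtracted data array; retaining only the leading p components both compresses the spectra and suppresses the uncorrelated noise.

import numpy as np

def pca_reduce(spectra, p):
    """Project an (n_spectra x m_pixels) array onto its first p principal components.

    Returns the p principal components (eigenvectors of the covariance matrix,
    ordered by decreasing variance), the reduced (n_spectra x p) representation,
    and the mean spectrum.
    """
    mean = spectra.mean(axis=0)
    centred = spectra - mean                      # remove the mean spectrum
    cov = np.cov(centred, rowvar=False)           # m x m covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)          # eigh: covariance matrix is symmetric
    order = np.argsort(eigval)[::-1]              # sort by decreasing variance
    components = eigvec[:, order[:p]]             # first p principal components
    reduced = centred @ components                # n x p projected data
    return components, reduced, mean

# Reconstruction from the leading components discards the weakly correlated
# (noise-dominated) directions:
rng = np.random.default_rng(2)
spectra = rng.normal(1.0, 0.02, (100, 50)) + np.outer(rng.normal(0, 1, 100),
                                                      np.sin(np.linspace(0, 3, 50)))
components, reduced, mean = pca_reduce(spectra, p=5)
approx = mean + reduced @ components.T
print(np.abs(spectra - approx).mean())            # small mean reconstruction error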