Document retrieved from a search engine cache. Original document address: http://www.adass.org/adass/proceedings/adass03/reprints/P3-6.pdf
Astronomical Data Analysis Software and Systems XIII
ASP Conference Series, Vol. 314, 2004
F. Ochsenbein, M. Allen, and D. Egret, eds.

Designing a Data Model for the Virtual Observatory

Mark Cresitello-Dittmar, Janet DePonte Evans, Ian Evans, Michael Harris, Stephen Lowe, Jonathan McDowell, Arnold Rots
Harvard-Smithsonian Center for Astrophysics

Abstract. The goal of the Virtual Observatory is to make astronomical data more accessible and to provide the means to more easily analyze that data. Currently, archives hold analogous data in a variety of different representations, which impedes interoperability at the data analysis stage. An important element of the VO is a data model that can unambiguously represent the relationships between data values and physical properties. At the CfA we are developing a data model design that can support the representation, analysis and display of data collected on different types of instruments. This model is a common, high-level framework of general-purpose components for fusion of heterogeneous data sources. From this framework, we have focused on a subset of components required to meet selected science objectives on spectral and image data.

1. Dataset

Here we present elements of an observation data model for the VO. Figure 1 shows the Dataset, the major object for managing data either from empirical observations or from simulations. The shaded boxes indicate the focus of our current modeling efforts at CfA. Starting with Section 2, this paper concentrates on the Data Container object, which provides access to the data values. The remaining components are described briefly here.
[Figure 1. Dataset Model. Components shown in the diagram: Simulation, Observation, Dataset, Generic Mapping, Observation Subset, Provenance, Observatory Location, Rel. Obs. Phase Space Volume & Observable, Data Container ("Image"), Mapping, Feature, Line, Artifact, Abs. Obs. Phase Space Volume & Observable x Spatial Region, Geometry, Sensitivity, Resolution, Bandpass (Interval).]

© Copyright 2004 Astronomical Society of the Pacific. All rights reserved.



The Relative Observational Phase Space Volume & Observable component specifies the region of physical space being observed (Phase Space, which may have dimensions of space, time, wavelength, etc.) and the quantity being measured (Observable) relative to the observatory location. These values can be translated to an Absolute reference by using data in the Observatory Location.

The Mapping component provides the translation from pixel elements to volumes in the phase space. It also specifies the relationship between the pixel values and physical values.

The Generic Mapping component provides a framework for organizing standard data transformations. It can be thought of as a library of transformations that may be used to define the specific mappings needed in a dataset. This library includes the usual astronomical spherical projections as well as mappings between units, between coordinate systems, and between data values that are denoted using interchangeable properties such as frequency and wavelength.

2. Data Container

The Data Container (Fig. 2) addresses two conflicting requirements: permitting arbitrarily irregular instrument structures to be represented, while maintaining efficiency for the many common datasets that are highly regular. It provides access to the measurement data and a logical view of its organization. (This may differ from the in-memory layout.) This logical organization is framed by the Index Set, which specifies the indexes or labels that identify the individual data cells. For the many data sets that are naturally laid out as a simple data (hyper)cube, the Index Set would be the usual n-tuples, e.g., (1, 1), ..., (m, n). The key to handling the conflicting needs is to provide multiple views of the data, that is, at least two access patterns.

[Figure 2. Data Container Model]
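As a rough illustration of an Index Set over a simple data cube, the sketch below keeps the values in a flat list while exposing the logical n-tuples (1, 1), ..., (m, n). The class name `DataContainer` and the methods `index_set` and `value` are hypothetical names chosen for this sketch; the paper does not define a programming interface, and the row-major storage order is likewise an assumption.

```python
from itertools import product


class DataContainer:
    """Hypothetical container: a logical n-D view over a flat list of values."""

    def __init__(self, shape, values):
        self.shape = tuple(shape)
        size = 1
        for n in self.shape:
            size *= n
        if len(values) != size:
            raise ValueError("values do not fill the declared shape")
        # Flat, row-major storage (an assumption); the logical view need not match it.
        self._values = list(values)

    def index_set(self):
        """Yield the 1-based n-tuples (1, ..., 1) through self.shape."""
        return product(*(range(1, n + 1) for n in self.shape))

    def value(self, index):
        """Return the value of the cell identified by a logical index."""
        offset = 0
        for i, n in zip(index, self.shape):
            if not 1 <= i <= n:
                raise IndexError(index)
            offset = offset * n + (i - 1)
        return self._values[offset]


container = DataContainer((2, 3), [10, 11, 12, 20, 21, 22])
cells = [(index, container.value(index)) for index in container.index_set()]
```

A consumer that knows nothing about the cube structure can still walk `index_set()` and ask for each value in turn; a consumer that recognizes the structure could use the shape directly.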

To support generality, the Data Container methods always allow the list of indexes to be obtained and used to iterate through the data cells. The data value and/or metadata can be obtained for each cell, in essence using heavyweight objects for each data item. A data consumer (i.e., application software) can fall back on this form to process the data if it does not recognize the Index Set's structure.

To support efficiency, the Index Set conforms to one of a small set of archetypal structures such as array, array with bad cell mask, sparse array, or event list. Application software can then be designed to take advantage of the structure to organize processing.

Metadata describing the correspondence between the data cells and locations in detector or observational space is represented as a collection of pixel mappings PM1, PM2, ... into coordinate spaces DCS1, DCS2, ... Similarly, interpretation of the data cell values is handled by value mappings VM1, VM2, ... into coordinate spaces RCS1, RCS2, ... Depending on the need, the VMs may depend on the cell location as specified by its index. These mappings are not simply computable functions; they also encode the type and parameters of the transformation, such as constant, linear, piecewise linear, or tangent projection. Thus, the application program can inspect this information to best organize its processing.

3. Three Ways to Data

In our model, there are three ways of accessing the data:
· As a list of pixels with no assumptions about contiguity of pixels in physical space or in memory.
· By the logical structure which is defined by the Index Set, such as an n-dimensional array, which might not be fully rectangular due to missing or invalid cells. The data provider determines this structure.
· As chunks of pixels which are rectangular, regular, filled arrays addressable by pixel offsets into contiguous memory. This supports highly efficient access. A simple FITS image would require only a single chunk, mosaics a few chunks, and sparse arrays many chunks.

4. Example: Hubble WFPC2

In the diagram below we show how these elements might be used to represent the data from the Hubble Wide Field and Planetary Camera 2 (WFPC2). The data from the four CCDs can be organized as an 800x800x4 block. Mappings PM1, PM2, and PM3 describe the layout of the detector panels and the sky layout in two coordinate systems.
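The three access patterns of Section 3 can be sketched against such a block. This is a minimal illustration under stated assumptions: `BlockContainer`, its method names, the chip-slowest buffer order, and the dictionary returned by `structure` are all hypothetical, and a small 8x8x4 stand-in replaces the 800x800x4 block so the example runs quickly. No real detector geometry or pixel mapping is modeled here.

```python
from itertools import product


class BlockContainer:
    """Hypothetical container for an nx x ny x nchip block of pixel values."""

    def __init__(self, nx, ny, nchip):
        self.shape = (nx, ny, nchip)
        # One contiguous buffer; chip varies slowest, column fastest (assumed order).
        self.buffer = [0] * (nx * ny * nchip)

    def offset(self, i, j, k):
        """Buffer offset of pixel (i, j) on chip k."""
        nx, ny, _ = self.shape
        return (k * ny + j) * nx + i

    def cells(self):
        """Way 1: a plain list of (index, value) pairs, no layout assumed."""
        nx, ny, nchip = self.shape
        for k, j, i in product(range(nchip), range(ny), range(nx)):
            yield (i, j, k), self.buffer[self.offset(i, j, k)]

    def structure(self):
        """Way 2: the logical structure declared by the Index Set."""
        return {"archetype": "array", "shape": self.shape}

    def chunks(self):
        """Way 3: rectangular, filled chunks as (start offset, shape) pairs."""
        return [(0, self.shape)]  # a fully regular block needs a single chunk


def total_by_cells(container):
    """Generic consumer: works for any Index Set, one cell at a time."""
    return sum(value for _, value in container.cells())


def total_by_chunks(container):
    """Fast consumer: slices contiguous memory chunk by chunk."""
    total = 0
    for start, (nx, ny, nchip) in container.chunks():
        total += sum(container.buffer[start:start + nx * ny * nchip])
    return total


wfpc2 = BlockContainer(8, 8, 4)  # stand-in size, not the real 800x800x4
wfpc2.buffer[wfpc2.offset(2, 3, 1)] = 5
wfpc2.buffer[wfpc2.offset(7, 7, 3)] = 7
```

Both consumers compute the same total; the second simply exploits the contiguity that the chunk description promises.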
[Figure: WFPC2 Data Container. Index Organization (i, j, k); PM1: CCD Coordinates (Oriented to readout); PM2: Instrument x-y; PM3: RA-Dec, Projected Sky (RA---TAN, DEC--TAN).]



The Index Set is not constrained to be rectangular. Using this feature, another Data Container can be defined describing just the Wide Field Camera, as shown in the accompanying figure below. This object uses a different Index Set and correspondingly different mappings to access the same data. The data provider (i.e., archive) defines the Data Container(s) and Index Set(s). This gives the provider the flexibility to create an organization natural for its data, while also defining alternate views for different audiences or purposes.
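One way to picture two Data Containers sharing the same data is sketched below. It rests on several assumptions that are not in the paper: chip 0 is taken to be the Planetary Camera and chips 1 to 3 the Wide Field chips, the L-shaped placement in `QUADRANT_TO_CHIP` is arbitrary, chip rotations are ignored, and a 4x4 stand-in chip size replaces 800x800. The class and function names are hypothetical.

```python
from itertools import product

N = 4  # stand-in chip size; the paper's block is 800x800x4
storage = list(range(N * N * 4))  # shared values: chip slowest, column fastest


def block_offset(i, j, k):
    """Offset of pixel (i, j) on chip k in the shared storage."""
    return (k * N + j) * N + i


# Arbitrary, illustrative placement of chips 1..3 in a 2N x 2N layout.
# The (1, 1) quadrant is deliberately absent, so the Index Set is L-shaped.
QUADRANT_TO_CHIP = {(0, 0): 1, (1, 0): 2, (0, 1): 3}


def wide_field_offset(i, j):
    """Offset of a 2-D Wide Field index (i, j) in the shared storage."""
    chip = QUADRANT_TO_CHIP[(i // N, j // N)]
    return block_offset(i % N, j % N, chip)


class DataContainer:
    """Hypothetical view: an Index Set plus a rule locating each cell."""

    def __init__(self, storage, index_set, locate):
        self.storage = storage
        self.index_set = list(index_set)
        self._valid = set(self.index_set)
        self._locate = locate

    def value(self, index):
        if index not in self._valid:
            raise KeyError(index)
        return self.storage[self._locate(index)]


# Full camera: rectangular 3-D Index Set (i, j, k).
full = DataContainer(
    storage,
    product(range(N), range(N), range(4)),
    lambda index: block_offset(*index),
)

# Wide Field only: non-rectangular 2-D Index Set (i, j) over the same storage.
wide_field = DataContainer(
    storage,
    ((i, j) for i, j in product(range(2 * N), range(2 * N))
     if (i // N, j // N) in QUADRANT_TO_CHIP),
    lambda index: wide_field_offset(*index),
)
```

Because both containers reference the same storage, a change made to a pixel is visible through either view; only the indexing differs.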
[Figure: Wide Field Camera Data Container. Index Organization (i, j); PM1: CCD Coordinates (Oriented to readout); PM2: Instrument x-y; PM3: RA-Dec, Projected Sky (RA---TAN, DEC--TAN).]

5. Example: Fiberoptic Spectrometer

In a fiberoptic spectrometer, 1-D spectra are measured at a number of irregularly arranged sky positions. As seen in the next figure, the data may be stored as a 2-D array, each row holding the spectrum for a single position. Consequently, each array element maps to a location in the 3-D domain sky × wavelength.
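The mapping from a 2-D array element to a point in sky × wavelength might be sketched as follows. In the spirit of Section 2, each mapping object carries its type and parameters so that software can inspect them. The class names, the `kind` labels (in particular "lookup", which is not one of the types the paper lists), the fiber positions, and the wavelength scale are all made up for illustration.

```python
from dataclasses import dataclass
from typing import ClassVar, Tuple


@dataclass(frozen=True)
class LinearMapping:
    """value = start + step * index, with inspectable parameters."""
    start: float
    step: float
    kind: ClassVar[str] = "linear"

    def __call__(self, index):
        return self.start + self.step * index


@dataclass(frozen=True)
class LookupMapping:
    """Irregular positions: one table entry per index (hypothetical kind)."""
    table: Tuple[Tuple[float, float], ...]
    kind: ClassVar[str] = "lookup"

    def __call__(self, index):
        return self.table[index]


# Made-up fiber positions (RA, Dec in degrees) and wavelength scale (angstroms).
fiber_to_sky = LookupMapping(((10.0, -3.0), (10.5, -3.25), (9.75, -2.5)))
bin_to_wavelength = LinearMapping(start=4000.0, step=2.0)


def pixel_to_sky_wavelength(index):
    """Map an array index (fiber row, wavelength bin) to (RA, Dec, wavelength)."""
    fiber, wavelength_bin = index
    ra, dec = fiber_to_sky(fiber)
    return ra, dec, bin_to_wavelength(wavelength_bin)


point = pixel_to_sky_wavelength((1, 10))
```

An application that sees `kind == "linear"` on the wavelength axis could, for instance, compute a whole row of wavelengths from `start` and `step` instead of calling the mapping once per cell.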

[Figure: Fiberoptic spectrometer Data Container. Index Organization (i, j); PM1: Fiber and wavelength; PM2: Projected Sky/Wavelength (RA---TAN, DEC--TAN).]

6. Continuing Effort

Our next steps in moving the data model development forward are:
· Complete definition of components for 1-D spectra and for images.
· Define XML format for data model components.
· Develop software to render data from several archives into XML.

We are in the process of developing a prototype system. In addition to data model components, the system includes a network interface module that manages the communication details between clients and SIAP services.

Acknowledgments. This material is based on work supported by the National Science Foundation under Grant No. AST-0121296 and under Cooperative Agreement No. AST-0122449. This effort is also supported by the Chandra X-ray Center under NASA contract NAS8-39073.