Astronomical Data Analysis Software and Systems XII ASP Conference Series, Vol. 295, 2003 H. E. Payne, R. I. Jedrzejewski, and R. N. Hook, eds.

Metadata for the VO: The Case of UCDs
Sébastien Derriere, François Ochsenbein, Thomas Boch
CDS, Observatoire Astronomique de Strasbourg, 11 rue de l'Université, F-67000 Strasbourg, France

Guy T. Rixon
Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge CB3 0HA, U.K.

Abstract. The UCDs (Unified Content Descriptors) were first developed in the ESO/CDS data mining project, to describe precisely the contents of the individual fields (columns) of tables available from a data center. They have been used to describe the content of the 10^5 columns available in the different VizieR tables. Owing to the wide diversity and high heterogeneity of table contents, UCDs constitute an excellent starting point for a hierarchical description of astronomy, for general data mining purposes. We present different applications of UCDs: selection of catalogues based on their content; identification of catalogues having similar fields; and automated data conversion allowing direct comparison of data in cross-identifications. The compatibility of UCDs with semantic descriptions developed in other contexts (data models for space-time coordinates or image datasets) will also be addressed.

1. Introduction

Astronomical tables come from many different sources, and their original descriptions are therefore very heterogeneous. Automated processing of the contents of these datasets, which is one of the Virtual Observatory (VO) applications, requires a uniform description of the catalogues (with standardized metadata). The UCDs (Unified Content Descriptors), first developed in the ESO/CDS data mining project (Ortiz et al. 1999), are metadata describing precisely the contents of the individual fields (columns) of tables available from a data center. They have been applied to describe the content of the 10^5 columns available in the different VizieR tables (Ochsenbein, Bauer & Marcout 2000). Some tools using UCDs have been developed and are available online: http://vizier.u-strasbg.fr/UCD/.

© Copyright 2003 Astronomical Society of the Pacific. All rights reserved.



Figure 1. The UCD browser, on the left, is used to locate relevant UCDs in the hierarchical structure. For each UCD, the list of VizieR catalogues containing this UCD in at least one field can be displayed and queried.

2. Usage of UCDs

2.1. Browsing the UCD Tree

The UCDs form a 4-level hierarchical structure with approximately 1500 elements. Different branches of the tree correspond to different domains of the semantic classification (e.g., time, position, instrument). A tool has been developed to visualize and explore the tree (Figure 1). A JavaScript version and an applet version of the browser are available. The presentation of the tree is similar to that of a file system browser: folders are the nodes of the UCD tree, and documents are the UCD leaves, which actually describe the catalogue columns. Clicking on a leaf gives access to:

· a definition of the corresponding UCD;
· statistics on column labels and units associated with this UCD (Figure 2);
· usage statistics for this UCD in VizieR (catalogues and tables where it occurs).

2.2. Data Validation

The wide heterogeneity of the original descriptions of astronomical data is clearly visible when compiling statistics on the column names and units used to represent a single physical quantity (Figure 2). These statistics help to point out possible errors in the catalogue description, or in the UCD assignment, and are thus useful for data validation.
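As a minimal sketch of how such statistics could be gathered (this is not the VizieR implementation; the column records, the UCD name, and the `max_share` threshold are illustrative assumptions), the labels and units can be tallied per UCD and rarely used units flagged for review:

```python
from collections import Counter, defaultdict

# Hypothetical column descriptions: (catalogue, column label, unit, UCD).
# All names and values here are made up for illustration only.
COLUMNS = [
    ("cat_A", "Kmag", "mag", "PHOT_JHN_K"),
    ("cat_B", "K", "mag", "PHOT_JHN_K"),
    ("cat_C", "Kmag", "mag", "PHOT_JHN_K"),
    ("cat_D", "Kmag", "mJy", "PHOT_JHN_K"),  # unusual unit for a magnitude
]


def ucd_statistics(columns):
    """Count how often each label and each unit occurs for every UCD."""
    labels = defaultdict(Counter)
    units = defaultdict(Counter)
    for _catalogue, label, unit, ucd in columns:
        labels[ucd][label] += 1
        units[ucd][unit] += 1
    return labels, units


def rare_units(units, ucd, max_share=0.25):
    """Return units whose share of the columns for this UCD is at most max_share."""
    counts = units[ucd]
    total = sum(counts.values())
    return sorted(u for u, n in counts.items() if n / total <= max_share)


labels, units = ucd_statistics(COLUMNS)
suspects = rare_units(units, "PHOT_JHN_K")
```

A unit flagged this way is only a candidate for inspection: it may reveal an error in the catalogue description, a wrong UCD assignment, or simply a legitimate but uncommon convention.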



Figure 2. Example of statistics on the different column names and units used in all VizieR tables for one UCD.

2.3. Selection of Catalogues

One of the most important uses of UCDs is the selection of catalogues that contain exactly a given measurement. Instead of searching all the "infrared" catalogues for a K-band magnitude, all catalogues with a Johnson K magnitude can be retrieved instantly. This selection can be done with the browser (see Figure 1). It is also possible to translate plain text into relevant UCDs: the user provides one or several natural-language terms describing the desired quantity (e.g., `proper motion'), and the answer is a list of corresponding UCDs, tentatively ordered by relevance. These can be used to select the relevant catalogues.

2.4. Automated Data Conversion

If two fields in two tables are described by the same UCD, these fields can be compared because they contain the same quantity. Automated data conversion can then be applied if these fields are expressed in different units (Figure 3).

2.5. Finding Similar Catalogues

Because UCDs precisely describe the contents of catalogues, they can be used to find similar catalogues. Given a reference catalogue, the list of UCDs present in this catalogue serves as the criterion for a search among all other catalogues: similar catalogues are those that have many UCDs in common with the reference one.
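The search just described can be sketched as a set intersection. The example below is an illustration under stated assumptions, not the actual VizieR service: the catalogue names and UCD strings are hypothetical, and the score is simply the number of shared UCDs.

```python
# Hypothetical mapping from catalogue name to the set of UCDs it contains.
CATALOGUES = {
    "ref": {"POS_RA", "POS_DEC", "MAG_K", "PM_RA"},
    "cat_A": {"POS_RA", "POS_DEC", "MAG_K"},
    "cat_B": {"POS_RA", "POS_DEC"},
    "cat_C": {"REDSHIFT"},
}


def rank_similar(reference, catalogues):
    """Rank all other catalogues by the number of UCDs shared with the reference."""
    ref_ucds = catalogues[reference]
    scores = [
        (name, len(ref_ucds & ucds))
        for name, ucds in catalogues.items()
        if name != reference
    ]
    # Most shared UCDs first; ties are broken alphabetically by name.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores


ranking = rank_similar("ref", CATALOGUES)
```

A raw count favours catalogues with many columns; a normalized measure (for example, dividing by the size of the union of the two UCD sets) is one possible alternative, though the text does not specify which measure is used.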



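The automated conversion of Section 2.4 can be illustrated with a small sketch. It assumes a hand-written table of angular unit factors and a hypothetical column record with `ucd`, `unit`, and `values` keys; none of these names come from the UCD tools themselves.

```python
# Conversion factors from each unit to arcseconds (illustrative subset).
TO_ARCSEC = {"arcsec": 1.0, "arcmin": 60.0, "deg": 3600.0}


def convert(value, from_unit, to_unit, factors=TO_ARCSEC):
    """Convert a value between two units listed in the factor table."""
    return value * factors[from_unit] / factors[to_unit]


def in_units_of(col_a, col_b):
    """Return col_b's values expressed in col_a's unit.

    The conversion is attempted only when both columns carry the same UCD,
    i.e. when they describe the same quantity.
    """
    if col_a["ucd"] != col_b["ucd"]:
        raise ValueError("columns describe different quantities")
    return [convert(v, col_b["unit"], col_a["unit"]) for v in col_b["values"]]


# Two hypothetical columns sharing a made-up UCD but using different units.
col_a = {"ucd": "EXTENSION_DIAM", "unit": "arcsec", "values": [30.0, 90.0]}
col_b = {"ucd": "EXTENSION_DIAM", "unit": "arcmin", "values": [0.5, 1.5]}
converted = in_units_of(col_a, col_b)
```

After conversion the two columns can be compared value by value; columns with different UCDs are rejected rather than silently converted.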
Figure 3. Example of automated conversion for columns with the same UCD.

3. Possible Evolution of UCDs

Suggestions have been made to improve the current structure of UCDs. An evolution towards an "atomic" rather than hierarchical structure is being studied. UCDs could be built by assembling atomic elements (principal nouns, adjectives, complementary nouns) selected from a predefined set of standard atoms. This scheme allows more flexibility in defining new UCDs, avoids the dispersion of related quantities across different branches of the tree, and describes the data more completely. Examples of combinations of atoms (compared to current UCDs):

· angle/declination (current UCD is POS_EQ_DEC);
· length/wavelength/johnson-V (central wavelength of the band, no UCD);
· length/wavelength/extent/johnson-V (bandwidth of the band);
· energy-flux-density/uncertainty/johnson-V (current UCD is ERROR).

4. Conclusions

UCDs are currently used in VizieR to describe the semantics of astronomical content. They offer new ways of selecting relevant datasets, and enable interoperability across catalogues and archives. Owing to the wide diversity of table contents, UCDs constitute an excellent starting point for a hierarchical description of astronomy, for general data mining purposes. An improved structure relying, for example, on atomic keywords could provide building blocks for the development of astronomical ontologies.

References

Ortiz, P., et al. 1999, in ASP Conf. Ser., Vol. 172, Astronomical Data Analysis Software and Systems VIII, ed. D. M. Mehringer, R. L. Plante, & D. A. Roberts (San Francisco: ASP), 379

Ochsenbein, F., Bauer, P., & Marcout, J. 2000, A&AS, 143, 23