Original document: http://www.adass.org/adass/proceedings/adass03/reprints/P2-1.pdf
Astronomical Data Analysis Software and Systems XIII
ASP Conference Series, Vol. 314, 2004
F. Ochsenbein, M. Allen, and D. Egret, eds.

SAADA: An Automatic Archival System for Astronomical Data
Nguyen N.H., Michel L., Motch C.
Observatoire Astronomique de Strasbourg

Abstract. This paper presents an overview of SAADA, a tool designed to allow astronomers to easily create their own databases from archival files (images, spectra, tables, ...) or from imported data. It aims to make the process of database creation as automatic as possible. Its functionality will include Java code generation, data loading, automatic web interfacing, and some interoperability features. Astronomers can easily set up correlation links between records in order to add scientific content to the database. Data can be accessed either through the automatic Web interface or by handling persistent objects. Through an API, SAADA will be able to interoperate with external databases (using VO standards). It will also be able to process queries that include constraints on correlation patterns.

1. Introduction

The increasing capabilities of both hardware and networking make it possible for astronomers to easily organize their own data in local databases. Nevertheless, setting up such databases remains difficult, especially for the complex and heterogeneous data used in astronomy. The development presented here, SAADA (Système Automatique d'Archivage de Données Astronomiques in French, Happiness in Arabic), aims at making the deployment of local databases easier. SAADA is not a database system but a database generator. Databases created by SAADA are hereafter referred to as SAADA-DBs. All SAADA-DBs rest on the same branches of a common data model, but have their own object layers (API and Web interface) and their own relational bases.

The architecture of the SAADA-DBs tries to take advantage of both the relational and object worlds. The object model is convenient for dealing with heterogeneous data and for providing a simple API, whereas the relational database model is a mature solution for sharing large sets of information and for managing concurrency, transactions and roll-backs. A SAADA-DB is an object layer using an SQL RDBMS as repository.

The goal of SAADA is to create a ready-to-use database system (a SAADA-DB) just by analysing input data and by applying some rules given by the data owner. SAADA relies on freeware and on standards-compliant, multiplatform, object-oriented programming. SAADA is written entirely in Java using a number of public APIs (J2EE, Flanagan 1999). The purpose of this paper is more to explain the architecture of the SAADA-DBs than to describe the structure of SAADA itself (Figure 1).

© Copyright 2004 Astronomical Society of the Pacific. All rights reserved.



Figure 1. SAADA

2. Features of a SAADA-DB

A SAADA-DB can host tables, images, spectra and time series. Data are grouped in named collections (e.g. STARS, GALAXIES) set up by the owner. Once a new SAADA-DB is created and its data set loaded, it can be accessed through a web interface using servlets. The web interface provides browsing facilities and an editor for complex queries. It has functionalities similar to those of the XCAT-DB interface (Michel 2004). A Java API allows database users to handle records as Java instances. The API is read-only by default, but a specific mode can be used by the SAADA-DB's owner to set up persistent links between data (e.g. cross-match). The API has not been designed to load data. This task is dedicated to the data loader module, which reads local products or queries external databases.

3. Architecture of a SAADA-DB

Below is a short description of the SAADA-DB modules. The organization of all SAADA-DB components is shown in Fig. 2.

- API specific: Generated API including classes modelling the persistent data.
- Data loader: Module used to load data from the input files/streams.
- Module servlet: Automatically generated web interface.
- Module web service: This module handles database accesses by web services.
- Object cache: Performs the conversion of relational data into persistent objects.
- API generic: Low-level persistence functionalities.
- High level query engine: Query optimizer.
- Module update: This module is in charge of updating persistent objects. It cannot create new objects but it can modify some attributes such as correlation links between instances.
- Open modules: Built out functionalities.



Figure 2. SAADA Architecture

4. Auto-configuration

One of the most useful properties of SAADA, its auto-configurability, allows astronomers to create their own databases without writing a single line of Java or SQL code. With SAADA, database owners need only set a few rules at configuration time, which specify the mapping between input data and classes. Input data are identified by directory names, filename masks or some inner keywords. The collections in which data must be stored are also specified at configuration time. From this configuration file (XML) and from checking the input data, SAADA is able to build the SQL tables and Java classes which together form the new SAADA-DB.

5. Object-Relation Mapping
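The rule-based mapping described in Section 4 (filename masks selecting a collection and a class) can be illustrated with a minimal sketch. It is not SAADA's implementation: the class names MappingRule, RuleSet and AutoConfigDemo, the glob-style masks and the example collection and class names are all assumptions made for illustration, and the rules are written directly in Java rather than read from the XML configuration file, whose schema is not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical rule: files whose names match a mask go to one collection and class. */
final class MappingRule {
    final Pattern mask;       // filename mask, compiled from a simple glob
    final String collection;  // target collection, e.g. "STARS"
    final String className;   // name of the class to generate (illustrative)

    MappingRule(String glob, String collection, String className) {
        this.mask = Pattern.compile(globToRegex(glob));
        this.collection = collection;
        this.className = className;
    }

    /** Converts a glob using '*' and '?' into a regular expression. */
    static String globToRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return sb.toString();
    }

    /** True when the whole file name matches the mask. */
    boolean accepts(String fileName) {
        return mask.matcher(fileName).matches();
    }
}

/** Ordered list of rules; the first matching rule wins (an assumption of this sketch). */
final class RuleSet {
    private final List<MappingRule> rules = new ArrayList<>();

    RuleSet add(MappingRule rule) {
        rules.add(rule);
        return this;
    }

    Optional<MappingRule> classify(String fileName) {
        return rules.stream().filter(r -> r.accepts(fileName)).findFirst();
    }
}

final class AutoConfigDemo {
    public static void main(String[] args) {
        RuleSet rules = new RuleSet()
            .add(new MappingRule("*_spec.fits", "STARS", "StarSpectrum"))
            .add(new MappingRule("*.fits", "GALAXIES", "GalaxyImage"));
        for (String f : new String[] {"hd1234_spec.fits", "m31.fits", "notes.txt"}) {
            String target = rules.classify(f)
                .map(r -> r.collection + "/" + r.className)
                .orElse("(no rule)");
            System.out.println(f + " -> " + target);
        }
    }
}
```

Ordering the rules from most to least specific keeps the sketch simple; a real configuration could equally combine directory names and header keywords, as the text indicates.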

As all SAADA-DBs are built on top of the same data model, the object mapping is considerably simpler than for a general-purpose object-relational system. This simplicity, together with considerations of low-level functionality and performance, led us to develop our own object mapping layer. The mapping mechanism is classically (Rahayu 2000) based on the use of object identifiers (OIDs). From any OID SAADA is able to:

- determine the table where the object is,
- identify the object class,
- retrieve the instance content.

With a single OID, any data record can be retrieved either in the relational world or in the object jungle (or vice-versa). OIDs are unique within a given SAADA-DB.

6. Performance Functionalities

6.1. Object Cache
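The three OID capabilities listed in Section 5 (finding the table, identifying the class, retrieving the content) can be sketched as follows. The layout of SAADA's OIDs is not given in this paper, so the 64-bit encoding below (class identifier in the top 16 bits, row key in the low 48 bits), the classes Oid, ClassRegistry and OidDemo, and the table and column names are purely hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical 64-bit OID: class identifier in the top 16 bits, row key in the low 48 bits. */
final class Oid {
    private static final int ROW_BITS = 48;
    private static final long ROW_MASK = (1L << ROW_BITS) - 1;

    static long make(int classId, long rowKey) {
        if (classId < 0 || classId > 0xFFFF) throw new IllegalArgumentException("classId out of range");
        if (rowKey < 0 || rowKey > ROW_MASK) throw new IllegalArgumentException("rowKey out of range");
        return ((long) classId << ROW_BITS) | rowKey;
    }

    static int classId(long oid) { return (int) (oid >>> ROW_BITS); }

    static long rowKey(long oid) { return oid & ROW_MASK; }
}

/** Maps class identifiers to a class name and its SQL table (all names are illustrative). */
final class ClassRegistry {
    private final Map<Integer, String[]> entries = new HashMap<>();

    void register(int classId, String className, String tableName) {
        entries.put(classId, new String[] {className, tableName});
    }

    /** Identifies the object class from the OID alone. */
    String classNameOf(long oid) { return lookup(oid)[0]; }

    /** Determines the table holding the record from the OID alone. */
    String tableOf(long oid) { return lookup(oid)[1]; }

    /** Builds the SQL statement that would retrieve the instance content. */
    String selectStatement(long oid) {
        return "SELECT * FROM " + tableOf(oid) + " WHERE oid = " + oid;
    }

    private String[] lookup(long oid) {
        String[] entry = entries.get(Oid.classId(oid));
        if (entry == null) throw new IllegalArgumentException("unknown class in OID " + oid);
        return entry;
    }
}

final class OidDemo {
    public static void main(String[] args) {
        ClassRegistry registry = new ClassRegistry();
        registry.register(3, "StarSpectrum", "stars_spectrum");
        long oid = Oid.make(3, 42L);
        System.out.println(registry.classNameOf(oid));
        System.out.println(registry.selectStatement(oid));
    }
}
```

Packing the class identifier into the OID is one common way to make an identifier self-describing; a lookup table keyed by OID range would serve the same purpose.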

The object cache is the hot spot of the system. It is in charge of transforming table records into Java instances and, above all, of minimizing database accesses. Its setup has a fundamental impact on global performance. Objects are handled by their OIDs and their content is not read initially; the first time an attribute is accessed, the cache is invoked to build the full instance. Objects no longer referenced by the application are removed from the cache by the JVM garbage collector only when the memory heap is full. Complex objects are built in the cache by applying a lazy-loading strategy (Kircher 2001).

6.2. Query Engine
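The cache behaviour described in Section 6.1 (objects known only by OID until an attribute is read, and entries released by the garbage collector under memory pressure) can be approximated with Java soft references. This is a sketch under assumptions, not SAADA's code: the paper does not say which mechanism is used, and the classes RecordContent, ObjectCache, LazyRecord and CacheDemo, as well as the loader function standing in for the SQL access, are hypothetical.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Full content of a record; a generated class would play this role in a real system. */
final class RecordContent {
    final Map<String, Object> attributes;

    RecordContent(Map<String, Object> attributes) { this.attributes = attributes; }
}

/** Cache keyed by OID; soft references let the JVM reclaim entries when memory runs short. */
final class ObjectCache {
    private final Map<Long, SoftReference<RecordContent>> entries = new HashMap<>();
    private final LongFunction<RecordContent> loader; // stands in for the database access
    private int loads = 0;

    ObjectCache(LongFunction<RecordContent> loader) { this.loader = loader; }

    synchronized RecordContent get(long oid) {
        SoftReference<RecordContent> ref = entries.get(oid);
        RecordContent content = (ref == null) ? null : ref.get();
        if (content == null) { // never loaded, or reclaimed by the garbage collector
            content = loader.apply(oid);
            loads++;
            entries.put(oid, new SoftReference<>(content));
        }
        return content;
    }

    /** Number of times the loader was actually called. */
    synchronized int loadCount() { return loads; }
}

/** Lightweight handle that carries only the OID until an attribute is requested. */
final class LazyRecord {
    private final long oid;
    private final ObjectCache cache;

    LazyRecord(long oid, ObjectCache cache) {
        this.oid = oid;
        this.cache = cache;
    }

    long oid() { return oid; }

    /** The first attribute access triggers the load through the cache. */
    Object attribute(String name) {
        return cache.get(oid).attributes.get(name);
    }
}

final class CacheDemo {
    public static void main(String[] args) {
        ObjectCache cache = new ObjectCache(oid -> {
            Map<String, Object> row = new HashMap<>();
            row.put("ra", 10.68); // made-up value for the demonstration
            return new RecordContent(row);
        });
        LazyRecord record = new LazyRecord(7L, cache);
        System.out.println("loads before access: " + cache.loadCount());
        System.out.println("ra = " + record.attribute("ra"));
        System.out.println("loads after access: " + cache.loadCount());
    }
}
```

Soft references are guaranteed to be cleared before the JVM throws an OutOfMemoryError, which is close to the eviction policy stated in the text; a bounded LRU map would be an alternative when a fixed cache size is preferred.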

Queries are processed by a separate module. Query optimization obviously has a significant impact on the global efficiency of a SAADA-DB. User queries are translated into SQL queries, including some built-in functions, taking into account their complexity. Further local computation on SQL query results can be performed before returning the final result. Queries only return sets of OIDs. Object contents can only be delivered by the cache. SAADA systematically implements in SAADA-DBs some specific features necessary to speed up queries. For instance, all data will be referenced on a sky pixel map (e.g. Qbox, Page 2002), and specific indexes are set up for the processing of queries including constraints on correlated data patterns.

7. Development Status

A SAADA-DB is under test. It includes all of the basic functionalities (cache, web interface). This prototype is built by a piece of software hosting the main modules of SAADA (auto-configuration, data loader). The first public distribution will be released in spring 2004. SAADA status can be seen at http://saada.u-strasbg.fr/

Acknowledgments. This thesis project is funded by the Région Alsace (France) and by the Centre National d'Etudes Spatiales (CNES, France).

References

Flanagan, D. 1999, Java Enterprise in a Nutshell, O'Reilly
Kircher, M. 2001, Lazy Acquisition, http://www.cs.wustl.edu/~mk1/LazyAcquisition.pdf
Michel, L., Motch, C., Page, C. G., Watson, M. G. 2003, in ASP Conf. Ser., Vol. 295, ADASS XII, ed. H. E. Payne, R. I. Jedrzejewski, & R. N. Hook (San Francisco: ASP), 291
Michel, L., Motch, C., Pye, J., Watson, M. 2004, "XCAT-DB a Public Interface for the SSC XMM-Newton Catalogue", this volume, 570
Page, C. 2002, Indexing the Sky, http://wiki.astrogrid.org/bin/view/Astrogrid/SkyIndexing
Rahayu, J. W. 2000, A method for transforming inheritance relationships in an object-oriented conceptual model to relational tables, Elsevier, Information and Software Technology 42 (2000) 571-592