Astronomical Data Analysis Software and Systems VII
ASP Conference Series, Vol. 145, 1998
Editors: R. Albrecht, R. N. Hook and H. A. Bushouse
P. Zografou, S. Chary, K. DuPrie, A. Estes, P. Harbo and K. Pak
Smithsonian Astrophysical Observatory, Cambridge, MA 02138
Abstract:
A data archive is near completion at the ASC to store and provide
access to AXAF data. The archive is a distributed Client/Server
system. It consists of a number of different servers which handle flat
data files, relational data, replication across multiple sites and the
interface to the WWW. There is a 4GL client interface for each type of
data server, C++ and Java APIs, and a number of standard clients to
archive and retrieve data. The architecture is scalable and
configurable in order to accommodate future data types and increasing
data volumes. The first release of the system became available in
August 1996 and has been successfully operated since then in support
of the AXAF calibration at MSFC. This paper presents the overall
archive architecture and the design of client and server components as
they were used during ground calibration.
The ASC archive is projected to contain terabytes of data, including
ground and on-orbit raw data and data products. The archive design
follows requirements for data distribution across sites, secure
access, flexible searches, performance, easy administration, recovery
from failures, interfaces to other components of the ASC Data System
and a user interface through the WWW. The architecture is extensible
in order to accommodate new data products, new functions and a growing
number of users.
Data such as event lists and images need to be kept in files as they
are received. They also need to be correlated with engineering and
other ancillary data which arrive as a continuous time stream and to
be associated with a calibration test or an observation ID. A level of
isolation between the data and users is desirable for security,
performance and ease of administration. The following design was
chosen. Files are kept in simple directory structures. Metadata about
the files, extracted from file headers or supplied by the archiving
process, is stored in databases. This allows file searches on the
basis of their contents. Continuous time data extracted from
engineering files is also stored in databases so the correct values
can be easily associated with an image or an event list with defined
time boundaries. In addition to partial or entire file contents,
external characteristics of each file, such as its location in the
archive, size, compression and creation time, are also stored in
databases for internal use by the archive. In addition to databases with contents originating in
files, there are also databases which are updated independently, such
as the AXAF observing catalog and the AXAF users database.
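The association between continuous time data and a file with defined time boundaries can be sketched as a range join. This is only an illustration under stated assumptions: it uses Python's sqlite3 module in place of the Sybase SQL Server, and the table names (archive_file, engineering), column names, mnemonic TEMP1 and sample values are all hypothetical, not the archive's actual schema.

```python
import sqlite3

# In-memory database standing in for the relational server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE archive_file (          -- hypothetical table of file metadata
    file_id     INTEGER PRIMARY KEY,
    name        TEXT,
    location    TEXT,                -- external characteristics
    size_bytes  INTEGER,
    compression TEXT,
    tstart      REAL,                -- time boundaries taken from the header
    tstop       REAL
);
CREATE TABLE engineering (           -- hypothetical continuous time stream
    time     REAL,
    mnemonic TEXT,
    value    REAL
);
""")

conn.execute(
    "INSERT INTO archive_file VALUES (1, 'events_001.fits', '/archive/vol1', "
    "2048, 'gzip', 100.0, 200.0)"
)
conn.executemany(
    "INSERT INTO engineering VALUES (?, ?, ?)",
    [(50.0, "TEMP1", 20.1), (120.0, "TEMP1", 20.4),
     (180.0, "TEMP1", 20.6), (250.0, "TEMP1", 21.0)],
)

def engineering_for_file(conn, file_name, mnemonic):
    """Return (time, value) samples that fall inside the file's time range."""
    return conn.execute(
        """
        SELECT e.time, e.value
        FROM engineering AS e
        JOIN archive_file AS f ON e.time BETWEEN f.tstart AND f.tstop
        WHERE f.name = ? AND e.mnemonic = ?
        ORDER BY e.time
        """,
        (file_name, mnemonic),
    ).fetchall()

samples = engineering_for_file(conn, "events_001.fits", "TEMP1")
print(samples)
```

Only the two samples taken between the file's start and stop times are returned; samples outside that window are ignored.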
Figure 1: Data Design.
A simplified example of the data design is shown in
Figure 1. The archive contains a number of
proposal files submitted by users. It also contains a number of event
files, products of observed data processing. A table in a database
contains the characteristics of each file. The proposal table contains
a record for each proposal which points to the associated file in the
file table. The observation table contains a record for each observed
target and has a pointer to the associated proposal. An observation is
also associated with files in the file table which contain observed
events. Related to the proposal is the AXAF user who submitted it and
for whom there is an entry in the AXAF user table. An AXAF user may
have a subscription to the AXAF newsletter.
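The relationships in this example can be written down as a small relational schema. The sketch below is an assumption-laden illustration, again using sqlite3 rather than Sybase; every table name, column name and sample row is hypothetical, and the link table between observations and files is one possible way to model "an observation is associated with files", not necessarily the one used in the archive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE axaf_user (
    user_id    INTEGER PRIMARY KEY,
    name       TEXT,
    newsletter INTEGER              -- 1 if subscribed to the newsletter
);
CREATE TABLE file (
    file_id INTEGER PRIMARY KEY,
    name    TEXT,
    kind    TEXT                    -- 'proposal' or 'events'
);
CREATE TABLE proposal (
    proposal_id INTEGER PRIMARY KEY,
    user_id     INTEGER REFERENCES axaf_user(user_id),
    file_id     INTEGER REFERENCES file(file_id)
);
CREATE TABLE observation (
    obs_id      INTEGER PRIMARY KEY,
    target      TEXT,
    proposal_id INTEGER REFERENCES proposal(proposal_id)
);
CREATE TABLE observation_file (     -- one observation, many event files
    obs_id  INTEGER REFERENCES observation(obs_id),
    file_id INTEGER REFERENCES file(file_id)
);
""")

conn.execute("INSERT INTO axaf_user VALUES (1, 'A. Observer', 1)")
conn.executemany("INSERT INTO file VALUES (?, ?, ?)", [
    (10, "proposal_100.txt", "proposal"),
    (11, "evt_a.fits", "events"),
    (12, "evt_b.fits", "events"),
])
conn.execute("INSERT INTO proposal VALUES (100, 1, 10)")
conn.execute("INSERT INTO observation VALUES (500, 'Target A', 100)")
conn.executemany("INSERT INTO observation_file VALUES (?, ?)",
                 [(500, 11), (500, 12)])

def event_files_for_user(conn, user_name):
    """Follow user -> proposal -> observation -> file and list event files."""
    rows = conn.execute(
        """
        SELECT f.name
        FROM axaf_user AS u
        JOIN proposal AS p          ON p.user_id = u.user_id
        JOIN observation AS o       ON o.proposal_id = p.proposal_id
        JOIN observation_file AS x  ON x.obs_id = o.obs_id
        JOIN file AS f              ON f.file_id = x.file_id
        WHERE u.name = ?
        ORDER BY f.name
        """,
        (user_name,),
    ).fetchall()
    return [name for (name,) in rows]

user_files = event_files_for_user(conn, "A. Observer")
print(user_files)
```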
The data is managed by a number of different servers. A Relational
Database server stores and manages the databases. It is implemented
using the Sybase SQL Server. An archive server was developed to
manage the data files. The archive server organizes files on devices
and directories. It keeps track of their location, size, compression
and other external characteristics by inserting information in a table
in the SQL Server when the file is ingested. It also has data-specific
modules which parse incoming files and store their contents, or
information about their contents, in databases. The server supports file
browse and retrieve operations. A browse or retrieve request may
specify a filename or supply values for a number of supported keywords
such as observation or test ID, instrument, level of processing, and start
and stop time of the contained data. Browse searches the database and
returns a list of files, their size and date. Retrieve uses the same
method to locate the files in the server's storage area and return a
copy to the client. The archive server responds to language commands
and remote procedure calls. Language commands are used by interactive
users or processes in order to archive or retrieve data. A custom 4GL
was developed in the form of a ``keyword = value'' template which is
sent by clients and is interpreted at the server. The remote procedure
call capability is used for automated file transfer between two remote
servers.
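A browse request expressed as a ``keyword = value'' template might be handled along the following lines. The paper does not give the template syntax or keyword names, so everything here is hypothetical: the keywords (operation, obsid, instrument, level, tstart, tstop), the one-pair-per-line layout, the file table and its sample rows. sqlite3 stands in for the SQL Server.

```python
import sqlite3

# Hypothetical set of keywords the server would accept.
SUPPORTED = {"operation", "filename", "obsid", "instrument",
             "level", "tstart", "tstop"}

def parse_template(text):
    """Parse 'keyword = value' lines into a dict, rejecting unknown keywords."""
    request = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        key = key.strip().lower()
        if not sep or key not in SUPPORTED:
            raise ValueError(f"unsupported template line: {line!r}")
        request[key] = value.strip()
    return request

def browse(conn, request):
    """Return (filename, size, date) for files matching the request."""
    clauses, params = [], []
    for key in ("filename", "obsid", "instrument", "level"):
        if key in request:
            clauses.append(f"{key} = ?")   # key comes from the fixed tuple above
            params.append(request[key])
    # Keep files whose time range overlaps the requested interval.
    if "tstart" in request:
        clauses.append("tstop >= ?")
        params.append(float(request["tstart"]))
    if "tstop" in request:
        clauses.append("tstart <= ?")
        params.append(float(request["tstop"]))
    where = " AND ".join(clauses) or "1"
    sql = ("SELECT filename, size_bytes, created FROM file "
           f"WHERE {where} ORDER BY filename")
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE file (
    filename TEXT, obsid TEXT, instrument TEXT, level TEXT,
    tstart REAL, tstop REAL, size_bytes INTEGER, created TEXT)""")
conn.executemany("INSERT INTO file VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    ("evt1.fits", "1001", "ACIS", "1", 0.0, 100.0, 2048, "1997-02-01"),
    ("evt2.fits", "1001", "HRC", "1", 100.0, 200.0, 4096, "1997-02-02"),
    ("img1.fits", "1002", "ACIS", "2", 0.0, 50.0, 1024, "1997-02-03"),
])

template = """
operation = browse
obsid = 1001
instrument = ACIS
"""
result = browse(conn, parse_template(template))
print(result)
```

A retrieve operation could reuse the same search to locate the files and then return copies instead of a listing.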
Figure 2: Server Configuration for XRCF Calibration.
The server infrastructure uses the Sybase Open Server libraries which
support communications, multi-threading, different types of events and
call-backs and communications with the SQL server. A C++ class layer
was developed to interface the libraries with the rest of the system
(Zografou 1997). File transfer uses the same communications protocol
as the SQL Server, which is optimized for data transfer and integrates
with other server features such as security.
A third type of server was needed in order to automatically maintain
copies of the data at two different locations. The Sybase
Replication Server is used to replicate designated databases. Triggers
in the database at the target site notify the local archive server
to connect to its mirror archive server at the source site
and transfer the files. Queuing and recovery from system or network
down-time is handled entirely by the Replication Server.
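The trigger-driven file transfer can be illustrated with a toy model. This is a sketch under heavy assumptions and not the Sybase mechanism itself: an sqlite3 trigger calls a Python function registered on the connection, two dictionaries stand in for the file stores at the source and target sites, and all names (notify_archive_server, file_replicated, apply_replicated_row, transfer_pending) are hypothetical.

```python
import sqlite3
from collections import deque

# Files the target archive server still has to fetch from its mirror.
pending = deque()

def notify_archive_server(filename):
    """Called from the database trigger; queues the file for transfer."""
    pending.append(filename)
    return 0

# Database at the target site (in-memory stand-in).
target_db = sqlite3.connect(":memory:")
target_db.create_function("notify_archive_server", 1, notify_archive_server)
target_db.executescript("""
CREATE TABLE file (name TEXT PRIMARY KEY, size_bytes INTEGER);
CREATE TRIGGER file_replicated AFTER INSERT ON file
BEGIN
    SELECT notify_archive_server(NEW.name);
END;
""")

# Toy file stores for the source and target archive servers.
source_store = {"evt1.fits": b"event data", "evt2.fits": b"more events"}
target_store = {}

def apply_replicated_row(name, size_bytes):
    """Stand-in for the replication step delivering a row to the target."""
    target_db.execute("INSERT INTO file VALUES (?, ?)", (name, size_bytes))

def transfer_pending():
    """Target archive server pulls each queued file from the source site."""
    while pending:
        name = pending.popleft()
        target_store[name] = source_store[name]

for name, data in source_store.items():
    apply_replicated_row(name, len(data))
transfer_pending()
print(sorted(target_store))
```

In this model the database rows arrive first and the files follow, so a file transfer is always driven by a replicated metadata record.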
Client applications use the Sybase Open Client libraries with a custom
C++ interface (Zografou 1997). The same client libraries are used by
client applications of either the SQL Server or the archive server.
During ground calibration at the X-Ray Calibration Facility (XRCF) at MSFC
two archive installations were operating, one at the operations site
at XRCF and a second at the ASC. Communications across sites were via
a T1 line. Each installation consisted of a SQL Server and an archive
server. A set of Replication Servers was set up to replicate all
databases, which in turn triggered replication of all files. The system layout
is shown in Figure 2. Data in the form of files
entered the system at XRCF, which was the primary site, and was
replicated at SAO. With some tuning to adjust to unexpectedly high
data rates the system kept up with ingestion, replication and
retrievals by processing pipelines at XRCF and users at the ASC. There
were no incidents of data corruption or loss and the overall system
was very successful.
At the end of the XRCF calibration the system was adapted to support
ASC operations at the SAO and AXAF OCC sites connected with a T3
line. In the new configuration only critical data is being
replicated. All other data is distributed according to user access. A
new server component, the Sybase JConnect Server, and a new Java/JDBC
client interface have been added to support WWW access (Chary & Zografou
1997). The second release of the system, including the WWW interface,
is currently operational in support of proposal submission.
References:
Zografou, P. 1997, Sybase Server Journal, 1st Quarter 1997, 9
Chary, S., & Zografou, P. 1997, this volume
© Copyright 1998 Astronomical Society of the Pacific, 390 Ashton Avenue, San Francisco, California 94112, USA