Status of the BIMA Image Pipeline
David M. Mehringer & Raymond L. Plante
National Center for Supercomputing Applications
Abstract:
The BIMA Image Pipeline is nearing production mode, in which it will automatically process data that have recently been transferred from the telescope. Its products will be calibrated uv datasets, calibration tables, FITS images, and other data products that will be ingested into the BIMA Data Archive and be retrievable by astronomers.
The BIMA Image Pipeline processes data from the BIMA Array using the AIPS++ astronomical data processing package. Currently, processing is initiated manually, but soon this process will start automatically after data from observing tracks have been ingested into the archive. Only single tracks can be processed at present, but in the future multiple tracks (e.g., from different telescope configurations) from the same project will be combined and processed. Processing jobs normally consist of multiple stages (e.g., serial processing for filling and calibration and parallel processing for image deconvolution). After the processing is complete, the data products are re-ingested into the BIMA Data Archive, where they are available for users to download.
Figure 1 is a block diagram of the BIMA Image Pipeline. The major
components are discussed below.
Figure 1: Architecture of the BIMA Image Pipeline
Raw BIMA data are automatically transferred over the Internet from the telescope at Hat Creek, California, to NCSA, where they are ingested into the BIMA Data Archive by the Ingest Engine. The Ingest Engine notifies the Event Server that new data have been archived and are ready for processing. The Event Server determines whether the new data should be processed and, if so, creates a SHOT document: an XML document containing the metadata for the data collection (e.g., an observing track) to be processed, wrapped in a root SHOT node. The Event Server places the SHOT document in an area where the Script Generator can locate it.
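The SHOT format itself is not reproduced here, but a minimal sketch of such a document, assuming purely illustrative element names rather than the actual BIMA archive schema, might look like the following:

<!-- Hypothetical SHOT document; element names are illustrative only. -->
<SHOT>
  <project>                          <!-- project-level metadata -->
    <source>1733-130</source>        <!-- sources observed in the track -->
    <source>3c84</source>
    <source>sgb2n</source>
    <source>venus</source>
  </project>
  <track>                            <!-- track-level metadata -->
    <configuration>B</configuration> <!-- telescope configuration (illustrative) -->
  </track>
  <processing type="fill-calibrate-image"/>  <!-- requested processing type -->
</SHOT>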
The Script Generator is responsible for creating the various scripts (Glish/AIPS++, shell/MIRIAD, and other shell scripts used for moving data) that are ultimately used to process the data collection. It does this by using XSLT to transform the SHOT document created by the Event Server into the various scripts as well as into XML metadata describing the products of the processing.
Generally, a processing job is composed of several sub-stages. For
example, a processing job to fill, calibrate, and image data is
composed of a sub-stage that uses a shell script to copy the data to be
processed to the work
area, a sub-stage that uses a Glish script for filling and calibration
in AIPS++ on a serial machine, a sub-stage that uses a Glish script for
imaging and deconvolution on a parallel machine such as a Linux
cluster, and a sub-stage that uses a shell script to archive the data
products. The type of processing to be done (e.g., filling only; filling and calibration only; or filling, calibration, and imaging) is encoded in the XML metadata of the collection. Each type of processing has an associated XSL stylesheet that is used to transform a SHOT document into the scripts appropriate for that type of processing. Upon successful generation of these files, the Script Generator notifies the Queue Manager that a collection is ready to be processed. This notification tells the Queue Manager the order in which the newly generated scripts should be run, as well as the queue in which each of these sub-stages should be placed.
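The stylesheets themselves are not shown in the paper; as a sketch only, and assuming the hypothetical SHOT elements used above, an XSL template that emits the project-dependent header of a top-level Glish script might look like this:

<!-- Hypothetical XSLT fragment; the real stylesheets and SHOT schema differ. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/SHOT">
    <!-- write the source list as a Glish vector of strings -->
    <xsl:text>sources := [</xsl:text>
    <xsl:for-each select="project/source">
      <xsl:text>'</xsl:text><xsl:value-of select="."/><xsl:text>'</xsl:text>
      <xsl:if test="position() != last()"><xsl:text>,</xsl:text></xsl:if>
    </xsl:for-each>
    <xsl:text>];&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>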
The Queue Manager is responsible for scheduling, initializing, and
monitoring jobs and sub-stages. The Queue Manager manages several
queues. In general, each processing queue is configured to run a
maximum number of jobs at any given time.
A job starts its life in the pending queue. When room becomes available in the queue in which the job's first sub-stage will run, the Queue Manager moves the job to the running queue. The first sub-stage is placed into the queue responsible for that type of processing (e.g., the unix queue for shell scripts, the serial queue for AIPS++ jobs on serial machines, or the parallel queue for AIPS++ jobs on parallel platforms). If that sub-stage ends successfully (generally determined by the exit status of the script it runs), the next sub-stage is started and placed in the appropriate queue. If all sub-stages complete successfully, the job finishes and is moved to the success queue. The data products (filled and calibrated
uv data in AIPS++ Measurement Set 2 format, AIPS++ calibration tables,
images in FITS
format, etc.) of successful processing runs are ingested into the
archive where they can be retrieved by users.
If any sub-stage fails, the job is terminated and moved to the error queue, and none of the data products are ingested into the archive.
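The form in which the Script Generator hands a job to the Queue Manager is not specified in the paper; purely as an illustration, the fill, calibrate, and image job described above might be represented as an ordered list of sub-stages, each tagged with its target queue:

<!-- Hypothetical job description; element names and script names are invented. -->
<job>
  <substage order="1" queue="unix"     script="copy_to_work_area.sh"/>
  <substage order="2" queue="serial"   script="fill_and_calibrate.g"/>
  <substage order="3" queue="parallel" script="image_and_deconvolve.g"/>
  <substage order="4" queue="unix"     script="archive_products.sh"/>
</job>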
The primary means by which both users and administrators monitor the states of the various queues is a set of web interfaces. These interfaces are powered by Apache Tomcat, which uses Java Servlets to transform XML documents describing the current queue state into HTML. The information provided includes the start and end times of all the sub-stages associated with a job, the script used to run each stage, and so on. The administrator view adds the ability to purge jobs from queues, move jobs to other queues, etc.
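The queue-state documents themselves are not reproduced in the paper; the fragment below is a sketch, with invented element names and placeholder values, of the kind of per-job information (the queue, the script run, and sub-stage start and end times) that the servlets render as HTML:

<!-- Hypothetical queue-state fragment; names and timestamps are placeholders. -->
<queue name="running">
  <job>
    <substage queue="serial"   script="fill_and_calibrate.g"
              start="2003-01-15T03:10:00" end="2003-01-15T03:42:17"/>
    <substage queue="parallel" script="image_and_deconvolve.g"
              start="2003-01-15T03:42:30"/>  <!-- still running; no end time yet -->
  </job>
</queue>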
The scripts that are used for processing come from a recipe
library. Each recipe is written to do a single phase of the
processing. For example, there are standard recipes for filling,
calibration, and imaging. Jobs (and their sub-stages) generally call
numerous recipes. The recipes needed by a sub-stage are called by a
top-level Glish script. Below is an example of a top-level Glish script that fills, calibrates, and images data from a single track. This script is invoked by a single sub-stage of a processing job (in this case, the sub-stage is run on a serial platform, though normally the deconvolution portion of imaging will be run on a parallel platform to take advantage of AIPS++'s ability to utilize such systems).
# set some project dependent parameters
sources := ['1733-130','3c84','sgb2n','venus'];
targets := ['sgb2n'];
phcals := ['1733-130'];
# include individual recipes
include 'pipelineinit.g'; # initialization
include 'filling.g'; # recipe for filling
include 'flagging.g'; # recipe for flagging
include 'calibration.g'; # recipe for calibration
bip.numprocs := 1;            # number of processors
include 'imaging.g'; # recipe for imaging
if(bip.imaging()) # main imaging function that returns
# true if successful
note('Imaging completed successfully',priority='NORMAL');
else
note('Imaging failed',priority='SEVERE');
include 'pipelinefinalize.g'; # finalization
note('Pipeline processing successful',priority=bip.note.NORMAL);
exit 0;
The BIMA Image Pipeline processes BIMA data and makes the products of
this processing available to users by sending them to the BIMA Data
Archive to be ingested. There are several components to the
pipeline. The Event Server receives processing events and notifies the
Script Generator that a data collection is awaiting processing. The
Script Generator generates the various scripts that will be used to
process the data as well as XML files that contain some of the
metadata for the data products. The Script Generator informs the
Queue Manager of the collection that is waiting to be processed, telling it how many sub-stages the job has and what type of processing each sub-stage requires. The Queue Manager manages the
various sub-stages, and if all are successful, alerts the BIMA Data
Archive of the new data products to ingest.