CASDA Users Guide
Version 1.1 March 2016
Welcome to the CSIRO ASKAP Science Data Archive (CASDA) Users Guide. This Guide is intended to help astronomers get started with finding and making use of data products from the Australian Square Kilometre Array Pathfinder (ASKAP).
The first ASKAP data products have been produced from science commissioning observations taken with the Boolardy Engineering Test Array (BETA). These have been selected to show the potential and unique wide-field fast survey capabilities of ASKAP and to provide some demonstration data sets to the astronomy community.
For information on the initial data release see the BETA Data Release Notes.
For a reference document that preceded the construction of CASDA see the pdf document:
CSIRO ASKAP Science Data Archive: Overview, Requirements and Use Cases (Chapman et al. 2014)
Contents
1 CASDA Overview
- 1.1 About CASDA
- 1.2 ASKAP data products
- 1.3 CASDA services on the CSIRO Data Access Portal
- 1.4 CASDA Virtual Observatory services
- 1.5 User Authentication and OPAL registration
- 1.6 Restricted CASDA tasks and project roles
- 1.7 Pawsey Supercomputing Centre user accounts
- 1.8 Getting help
2 Using CASDA with the CSIRO Data Access Portal
- 2.1 Login to the DAP using an OPAL or NEXUS account
- 2.2 Find information about ASKAP Data Collections
- 2.3 Find a persistent link to a data collection
- 2.4 Search for data products using the DAP search form
- 2.5 Carry out a cone search using the CASDA Observation Search form
- 2.6 Download data products to external locations
3 Using CASDA with Virtual Observatory Services for Catalogues
- 3.1 Install TOPCAT
- 3.2 Find and download catalogues using the VO Table Access Protocol (TAP)
- 3.3 Run simple queries in Astronomical Data Query Language (ADQL)
- 3.4 Try out some TOPCAT Features
- 3.5 Create sky plots showing positions of radio detections
- 3.6 Create a new column and add to existing table
- 3.7 Carry out cone searches on a catalogue using the VO Cone Search Protocol
- 3.8 Search for image and catalogue data products programmatically using user scripts
- 3.9 Plot positions of radio detections on an optical image covering the sky region using TOPCAT with Aladin
- 3.10 Cross-match information from a CASDA catalogue with a catalogue obtained from VizieR and generate a merged catalogue
4 Using CASDA Virtual Observatory Services for Images and Image Cubes
- 4.1 Find images and image cubes and see the access services
- 4.2 Use the VO Simple Image Access Protocol for programmatic data access
- 4.3 Generate cut outs from images and image cubes using a script
5 Restricted CASDA Services
- 5.1 Download data to a location in the Pawsey Supercomputing Centre
- 5.2 Set data validation flags and information
- 5.3 See the list of individuals assigned to a Science Team in CASDA
- 5.4 Manage roles for team members
- 5.5 Add an individual to a Science Team
- 5.6 Publish a Science Team catalogue: General advice
- 5.7 Submit a VO catalogue for CASDA publication
- 5.8 Approve and release a 'level 7' Science Team catalogue
6 Publications and Acknowledgements
7 Links to external documentation
8 Document versions
1 CASDA Overview
1.1 About CASDA
The CSIRO ASKAP Science Data Archive (CASDA) provides long-term storage for ASKAP data products and the hardware and software facilities that enable astronomers to make use of these. Data products are stored at the Pawsey Supercomputing Centre in Perth, Western Australia.
ASKAP is a data-driven facility with extremely high data rates. In full operations, the ASKAP data rates arriving at the Pawsey Centre will reach around 75 Petabytes (PB) per year. This is beyond the current ability to archive data, so raw data are not archived. Instead, ASKAP data are processed by automated pipelines that produce 'data products' and associated metadata. These are stored and made available through the science archive, which can be thought of as the end stage of the full system.
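To give a feel for what 75 PB per year means as a sustained rate, the short Python calculation below converts it to an average number of bytes per second. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and a 365.25-day year; these are assumptions made for the arithmetic, not figures taken from ASKAP specifications.

```python
# Rough conversion of an annual data volume into an average (sustained) data rate.
# Assumptions for illustration only: 1 PB = 1e15 bytes, 1 GB = 1e9 bytes,
# and 1 year = 365.25 days.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 seconds


def sustained_rate_gb_per_s(petabytes_per_year: float) -> float:
    """Return the average rate in GB/s for a yearly data volume given in PB."""
    bytes_per_year = petabytes_per_year * 1e15
    return bytes_per_year / SECONDS_PER_YEAR / 1e9


if __name__ == "__main__":
    rate = sustained_rate_gb_per_s(75)
    print(f"75 PB/year is roughly {rate:.1f} GB/s on average")
```

Under these assumptions the average works out at about 2.4 GB/s, arriving continuously throughout the year.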
ASKAP Early Science observations are due to begin in 2016. So that users can gain experience and provide feedback to the CASDA team, CASDA will provide access to demonstration data products obtained during science commissioning and pilot observing programs that have been carried out with a small number of ASKAP antennas.
1.2 CASDA data products
CASDA provides three types of data products:
- Calibrated visibility files
Calibrated visibility files are stored for total intensity and polarisation continuum observations. During Early Science, calibrated visibility data files for spectral line observations will be archived on a best efforts basis.
- Images and image cubes
Radio continuum and spectral line images and image cubes are stored in FITS format.
- Catalogues
As part of the data processing pipelines, detection algorithms are used to search images and image cubes for source detections. Detection parameters such as positions and flux densities are captured in catalogues. The CASDA catalogue registry includes source detection catalogues with parameters determined from continuum, polarisation and spectral line images or image cubes.
1.3 CASDA services on the CSIRO Data Access Portal
CASDA provides data services in two ways: using search interfaces and tools that are accessed through the CSIRO Data Access Portal (DAP), and using Virtual Observatory Services.
The CSIRO DAP provides access to many archival data products managed by CSIRO across the organisation, spanning a wide range of science areas including astronomy. Many customised tools have been added to the DAP to support CASDA. See section 2 for instructions on how to use the CASDA DAP services.
1.4 CASDA Virtual Observatory services
The Virtual Observatory (VO) uses standard protocols to enable catalogue or image data obtained from one facility to be easily compared with similar data from other facilities. For example, a user might wish to compare source detections from a radio survey with detections from optical or infrared surveys, or to plot a set of radio source positions on top of an optical image.
For catalogues, we strongly recommend using TOPCAT (Tool for OPerations on Catalogues and Tables). TOPCAT is a freely available program that provides an interface to Virtual Observatory compatible data products. TOPCAT is like a 'browser' that provides access to VO supported data products with many tools for working with these.
TOPCAT is a richly tooled program with many features that support working with catalogues. In this guide we give some start-up instructions for using CASDA with TOPCAT. For full tutorial and reference documentation please see the TOPCAT website.
For images and image cubes, CASDA currently provides an initial implementation of the VO 'Simple Image Access Protocol (v2)' with user access through a web browser. This allows users to access images programmatically using user scripts, and to make use of an image/image cube cut-out service.
1.5 User Authentication and OPAL registration
CASDA allows users to search for data and to see what is available in the archive, without any authentication. However, user authentication is required to download data files including calibrated visibilities, images and image cubes.
For science users CASDA supports two types of user authentication as follows:
- OPAL: CASDA uses the OPAL proposal application user account system.
For general use, we recommend using OPAL accounts. To register with OPAL, go to the OPAL Home Page and click on the link to 'Register'. Enter your email address, name, affiliation and a password. The OPAL application will register you straight away and will then open a screen for you to login.
OPAL user accounts are self-managed. Please keep your account details up to date. To change user-registration details, or to request a new OPAL password, use the links to 'Update your details' and 'Change your password'. If you have forgotten your password you may request that a new one be sent via email.
- CSIRO Nexus authentication is available for individuals who have CSIRO Nexus accounts. This is needed for some administration-level tasks.
For general use we recommend that all CASDA users have an OPAL account.
1.6 Restricted CASDA tasks and project roles
In addition to general user authentication, a small number of CASDA tasks are restricted to archive administrators and/or members of the Survey Science teams. These include, for example, setting data validation flags and accessing project data prior to public release. Tasks with restricted access are described in section 5.
CASDA recognises special roles for members of Survey Science teams. These allow team members to find and access data products for their survey(s) prior to general release, and to carry out data validation tasks. The CASDA roles for Survey Science Team members are as follows:
- Project administrators can assign roles to other team members, enter data validation information and access unreleased data products for their projects. (Note: At present the administration-level tasks are restricted to CASDA staff administrators. This will later be modified so that Survey Science team members can carry out project-level administration.)
- 'Validators' enter data validation information and access unreleased data products for their projects.
- Other team members are given access to unreleased data products for their projects but do not have permission to enter data validation information.
1.7 Pawsey Supercomputing Centre user accounts
For users who have user accounts at the Pawsey Supercomputing Centre, CASDA provides fast downloads to the Pawsey Galaxy and/or Magnus supercomputers.
Pawsey accounts are restricted to individuals who are part of science teams that have been granted access to Pawsey facilities through competitive merit allocation processes. For further information please refer to the Pawsey website or contact the CASDA helpdesk.
1.8 Getting Help
CASDA provides help through documentation and a user support service in several ways:
- This guide is intended to provide an overview of the CASDA data services and to help new users get started. To get going we recommend trying some of the items described in sections 2 and 3.
- The CASDA application pages on the CSIRO Data Access Portal provide some 'tooltips'. These are shown as yellow question marks. Click on a question mark to read the tip.
- Online documentation is available with the DAP. Click on the 'help' link at the top of the page.
- For enquiries and staff support relating to all CSIRO radio astronomy data archives, including CASDA, please send an email to atnf-datasup@csiro.au. You will receive an automated email from our helpdesk to acknowledge that your request has been logged. A CASS staff member will reply soon afterwards. We aim to send an initial reply to user queries within four business hours.
Suggestions for improvements to CASDA tools and to this Guide are always welcome. Please send these to the helpdesk.
2 Using CASDA with the CSIRO Data Access Portal
The information in this and the following section is presented as a set of 'how-to' instructions using specific data sets as examples.
| 2.1 Login to the DAP using an OPAL or NEXUS account |
|---|
|
| Notes |
| 2.2 Find information about ASKAP Data Collections |
|---|
|
| Notes The CSIRO DAP holds data for thousands of collections. You might like to explore the DAP using different keyword searches. Some example keywords are 'pulsars', 'mosquitoes' and 'climate change', or try your own. AS031 is the project code used for ASKAP BETA science commissioning observations. This is the only project code in use at present. In general terms, a data collection is a group of similar data files. For CASDA, each ASKAP project has two collections. One collection holds the data catalogues and the other the data product files for images, image cubes and visibilities. Lists of collections can be sorted using the sort options that are shown under the large blue search button. |
| 2.3 Find a persistent link to a data collection |
|---|
|
| Notes CASDA collections remain 'open' for the duration of the project. The persistent link provides one way of finding the collection. CASDA also provides Digital Object Identifiers (DOIs) for data collections. A DOI provides a 'data citation' that allows a user to obtain a set of data corresponding to the date and time the DOI was issued (or 'minted'). For CASDA the DOIs will be updated at intervals of around six months from March 2016 onwards. DOIs will be included with the collections information. |
| 2.4 Search for data products using the DAP search form |
|---|
|
| Notes The search form will return information for all collections and data products that match the criteria. Only tabs that have associated data files are shown. Selecting a data tab and then clicking on the project number will open the collections information that is relevant to that tab. 'Unreleased' data are only available to team members. Data are released following a data validation process. |
| 2.5 Carry out a cone search using the CASDA Observation Search form |
|---|
|
This example finds data products for a region observed in Tucana.
|
| Notes Only one data tab is shown because CASDA holds no corresponding visibilities or catalogues for this region. |
| 2.6 Download data products to external locations |
|---|
In this example, an image covering 150 square degrees of the Tucana region, created by Ian Heywood from data combined over four sub-bands and three epochs, is downloaded to your local computer. This approach can be used to download any available data products, including images, visibilities and catalogues.
|
| Notes For this example, the image file size is 113 MB, which may take some time to download across the network. The Scheduling Block ID (1206) corresponds to the first of four scheduling blocks used for this data product: when more than one scheduling block is used to create a data product, CASDA associates the data file with the first one. An additional file is provided with checksum information. This can be used to check that the download completed without corruption. |
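The checksum file can be compared against a digest computed locally. This guide does not say which checksum algorithm CASDA uses, so the Python sketch below takes the algorithm name as a parameter: 'crc32' (via zlib) or any hashlib name such as 'md5', 'sha1' or 'sha256'. Which one matches, and where the expected value sits in the checksum file, are assumptions you need to confirm by inspecting that file. The function names are hypothetical.

```python
import hashlib
import zlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the lowercase hex digest of a file, reading it in 1 MiB chunks.

    'algorithm' is either 'crc32' (computed with zlib) or any name accepted by
    hashlib.new. Which algorithm the CASDA checksum file uses is an assumption.
    """
    path = Path(path)
    if algorithm.lower() == "crc32":
        crc = 0
        with path.open("rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                crc = zlib.crc32(chunk, crc)  # running CRC across chunks
        return f"{crc & 0xFFFFFFFF:08x}"
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected_hex, algorithm="sha256"):
    """True if the file's digest matches the expected hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat, which matters for image cubes that can be far larger than the 113 MB example.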
3 Using CASDA with Virtual Observatory Services for Catalogues
The notes given in this section provide an introduction to finding and working with CASDA VO tables, mostly together with TOPCAT. These are intended to be sufficient to get started but do not cover many of the available tools and features.
The notes given here correspond to TOPCAT version 4.3.
| 3.1 Install TOPCAT |
|---|
|
| Notes TOPCAT and Java are freely available. To see which version of TOPCAT you are using open TOPCAT then click on Help > About TOPCAT. |
| 3.2 Find and download CSIRO astronomy catalogues using the VO Table Access Protocol (TAP) |
|---|
|
In this example, you will find and download a list of VO catalogues supported by CSIRO and then select a catalogue, casda.continuum_island. This contains parameters for radio astronomy 'islands' detected by source-finding algorithms in an image of the Tucana region.
|
| Notes TOPCAT write options include a range of table formats including VO-table, html, FITS, ascii and csv. To write out a VO table use the format option = 'votable-tabledata'. To navigate to a selected location on your computer use the 'Filestore Browser'. |
| 3.3 Run simple queries in Astronomical Data Query Language (ADQL) |
|---|
Here are three simple examples using ADQL to query a catalogue.
|
|
| Notes The asterisk in examples 1 and 3 indicates that all columns are selected. Line breaks are optional in ADQL queries. |
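As an illustration of common ADQL query shapes, the Python sketch below composes three query strings: the first rows of a table with all columns, selected columns filtered with a WHERE clause, and a circular sky region using the ADQL geometry functions POINT, CIRCLE and CONTAINS. The column names ra_deg_cont, dec_deg_cont and flux_peak are assumptions for illustration only; check the actual column names in TOPCAT's column metadata before use. The function names are hypothetical, and the resulting strings can be pasted into TOPCAT's TAP query panel.

```python
def select_top(table, n=10):
    """First n rows of a table, all columns (the asterisk selects every column)."""
    return f"SELECT TOP {n} * FROM {table}"


def select_columns_where(table, columns, condition):
    """Chosen columns only, filtered by a WHERE condition."""
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {condition}"


def select_in_circle(table, ra_col, dec_col, ra, dec, radius_deg):
    """All columns for rows whose position lies inside a circle (all values in degrees)."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', {ra_col}, {dec_col}), "
        f"CIRCLE('ICRS', {ra}, {dec}, {radius_deg}))"
    )


if __name__ == "__main__":
    # Column names below are assumed, not confirmed CASDA column names.
    print(select_top("casda.continuum_island", 5))
    print(select_columns_where("casda.continuum_component",
                               ["ra_deg_cont", "dec_deg_cont", "flux_peak"],
                               "flux_peak > 100"))
    print(select_in_circle("casda.continuum_island", "ra_deg_cont", "dec_deg_cont",
                           344.52, -36.43, 0.5))
```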
| 3.4 Try out some TOPCAT Table Icons |
|---|
|
| Notes Hovering the mouse over an icon brings up a description of it. Note that the icons are arranged in groups. This group of five icons provides different displays of the table data and metadata. |
| 3.5 Create sky plots showing positions of radio detections |
|---|
|
This example shows how to do a sky plot for a set of source positions.
|
| Notes TOPCAT provides powerful tools for plotting table data and allows a considerable degree of customisation. Note the plot controls given below the plot. For full tutorial and reference documentation, see the TOPCAT user manual. |
| 3.6 Create a new column and add to existing table |
|---|
| In this example an extra column is added to a table, corresponding to the logarithm of the peak flux density.
|
| Notes Unified Content Descriptors (UCDs) are part of a formal, controlled vocabulary for astronomical data maintained by the International Virtual Observatory Alliance (IVOA). They are intended to facilitate sharing information between different applications. For further information and a list of valid UCDs, see the UCD1+ controlled vocabulary. |
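In TOPCAT the new column is defined by an expression such as log10(flux_peak), where flux_peak is an assumed column name for the peak flux density. For readers who want the same calculation outside TOPCAT, the Python sketch below applies it to rows held as dictionaries. The function and output column name are hypothetical. Non-positive or missing values have no real logarithm, so the sketch stores NaN for them rather than raising an error.

```python
import math


def add_log_column(rows, source="flux_peak", target="log_flux_peak"):
    """Add log10(row[source]) to each row dict as row[target].

    'flux_peak' is an assumed column name. Missing or non-positive values
    are stored as NaN because their logarithm is undefined.
    """
    for row in rows:
        value = row.get(source)
        if value is None or value <= 0:
            row[target] = math.nan
        else:
            row[target] = math.log10(value)
    return rows
```

Storing NaN keeps every row in the table, so the new column stays aligned with the existing ones.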
| 3.7 Carry out cone searches on a catalogue using the VO Cone Search Protocol |
|---|
This example searches the CASDA table casda.continuum_island for sources within a given radius of a position and writes the results into another VO table.
|
| Notes CASDA currently restricts cone searches to a maximum radius of five degrees. If this is a problem for you, please let us know. |
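TOPCAT builds the cone-search request for you. To issue one from a script instead: the VO Simple Cone Search standard defines a request as an HTTP GET with RA, DEC and SR (search radius) in decimal degrees. The Python sketch below builds such a URL and rejects radii above the five-degree limit noted for CASDA. The base URL is a placeholder. This guide does not give the CASDA cone-search endpoint for casda.continuum_island, so take it from TOPCAT's cone search window. The function name is hypothetical.

```python
from urllib.parse import urlencode

MAX_RADIUS_DEG = 5.0  # CASDA's stated maximum cone-search radius


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Simple Cone Search URL with RA, DEC and SR in decimal degrees."""
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("Dec must be in [-90, 90] degrees")
    if not 0.0 < radius_deg <= MAX_RADIUS_DEG:
        raise ValueError(f"radius must be in (0, {MAX_RADIUS_DEG}] degrees")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Append with '&' if the base URL already carries query parameters.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"
```

Checking the radius on the client side gives a clear error message up front, rather than leaving it to however the service responds to an over-large request.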
| 3.8 Search for image and catalogue data products programmatically using user scripts |
|---|
|
The CASDA TAP service can be used together with user scripts. This can be helpful, for example, to facilitate access to large numbers of files.
In this example we provide a sample python script, tapquery.py. This connects to the CSIRO TAP Service and finds a catalogue, casda.continuum_component. A simple ADQL query is used to search the catalogue for radio components with peak flux densities below 600 mJy. The output is written in xml/VO-format to a file data.xml in the local folder.
To run the script:
|
| Notes
For this script the astropy module is used to write the output to a VO-format table. A similar script, tapquery_csv.py, can be run without astropy; it writes the output to a csv file. The sample script can be modified to connect to different catalogues and to carry out different ADQL commands. For additional documentation see the internal comments in the script. |
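The Python sketch below shows the general shape of a synchronous TAP query using only the standard library, returning CSV in the spirit of tapquery_csv.py. It is not the distributed script. It assumes that the TAP base URL is https://casda.csiro.au/casda_vo_tools/tap (confirm this against current CASDA documentation), that the service accepts the TAP 1.0 'FORMAT' parameter (later TAP versions name it RESPONSEFORMAT), and that peak flux density is stored in a column named flux_peak in mJy. The function names are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed CASDA TAP base URL; confirm before use.
TAP_URL = "https://casda.csiro.au/casda_vo_tools/tap"


def build_sync_request(tap_url, adql, fmt="csv"):
    """Build a synchronous TAP query: an HTTP POST to <tap_url>/sync."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": fmt,  # TAP 1.0 name; newer services use RESPONSEFORMAT
        "QUERY": adql,
    }
    data = urlencode(params).encode("utf-8")
    return Request(tap_url.rstrip("/") + "/sync", data=data, method="POST")


def parse_csv(text):
    """Turn a CSV response body into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def run_query(adql, tap_url=TAP_URL, timeout=60):
    """Send the query and return the rows as dicts (needs network access)."""
    with urlopen(build_sync_request(tap_url, adql), timeout=timeout) as resp:
        return parse_csv(resp.read().decode("utf-8"))
```

For example, `run_query("SELECT * FROM casda.continuum_component WHERE flux_peak < 600")` would mirror the sample query, subject to the column-name assumption. Values arrive as strings, so convert numeric columns before doing arithmetic on them.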
| 3.9 Plot positions of radio detections on an optical image covering the sky region using TOPCAT (version 4.2) with Aladin (version 9) |
|---|
|
| Notes Aladin is an interactive sky atlas from the Centre de Données astronomiques de Strasbourg (CDS). Note the other surveys provided by Aladin Desktop. The same approach can be used with any of these. |
| 3.10 Cross-match information from a CASDA catalogue with a catalogue obtained from VizieR and generate a merged catalogue |
|---|
|
| Notes TOPCAT provides several options for joining tables and matching positions. For full details please see the TOPCAT documentation. |
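TOPCAT's pair-matching tools are the recommended route. To show the idea behind a simple sky cross-match, the Python sketch below pairs each position in one list with its nearest neighbour in a second list within a fixed tolerance, using the haversine formula for angular separation. It is a brute-force loop, adequate for small tables only. It keeps just the best match for each source in the first list, and a source in the second list may be matched more than once. The function names and the tolerance are illustrative assumptions.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula, inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def best_matches(cat_a, cat_b, max_sep_arcsec):
    """For each (ra, dec) in cat_a, find the nearest (ra, dec) in cat_b.

    Returns a list of (index_a, index_b, separation_arcsec) for pairs closer
    than max_sep_arcsec. Brute force: O(len(cat_a) * len(cat_b)).
    """
    matches = []
    for i, (ra_a, dec_a) in enumerate(cat_a):
        best = None
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b) * 3600.0
            if sep <= max_sep_arcsec and (best is None or sep < best[2]):
                best = (i, j, sep)
        if best is not None:
            matches.append(best)
    return matches
```

The haversine form is used because it stays numerically accurate for the arcsecond-scale separations typical of cross-matching.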
4 Using CASDA Virtual Observatory Services for Images and Image Cubes
For images and image cubes, the Virtual Observatory provides two protocols. SIAP (Simple Image Access Protocol) is used to 'discover' relevant files, whilst SODA (Server-side Operations for Data Access) provides the data access.
These services are primarily intended for use with scripts written by science teams in python or other scripting languages. This approach allows a high degree of customisation together with the ability to handle many files at a time.
These protocols can be somewhat difficult to use, and some knowledge of python is needed to develop scripts. We are happy to help users get started and would welcome any feedback to help us improve the features described in this section. Please send any comments to ATNF Data Support.
| 4.1 Find images and image cubes and see the access services |
|---|
| This item describes how to navigate to a list of image and image cube services. When an image or cube is selected, several data access services are shown. Sections 4.2 and 4.3 provide examples of accessing files and generating cut-outs using python scripts.
|
| Notes The query string 'BAND = 0.25 0.30' refers to a wavelength range, with units given in metres. |
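Because BAND is given as a wavelength range in metres, a frequency range has to be converted before it can be used. The Python sketch below performs the conversion with λ = c/ν. By this conversion, 'BAND = 0.25 0.30' corresponds to roughly 999 to 1199 MHz. The 'min max' string layout follows the query string above; the function names are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def band_from_frequency_mhz(f_low_mhz, f_high_mhz):
    """Return a BAND value 'lambda_min lambda_max' in metres for a frequency range in MHz.

    Higher frequency means shorter wavelength, so the ends of the range swap.
    """
    if not 0 < f_low_mhz < f_high_mhz:
        raise ValueError("need 0 < f_low_mhz < f_high_mhz")
    lam_min = SPEED_OF_LIGHT / (f_high_mhz * 1e6)
    lam_max = SPEED_OF_LIGHT / (f_low_mhz * 1e6)
    return f"{lam_min:.6g} {lam_max:.6g}"


def frequency_range_mhz(band):
    """Inverse: convert a 'lambda_min lambda_max' string (metres) to (f_low, f_high) in MHz."""
    lam_min, lam_max = (float(x) for x in band.split())
    return SPEED_OF_LIGHT / lam_max / 1e6, SPEED_OF_LIGHT / lam_min / 1e6
```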
| 4.2 Use the VO Simple Image Access Protocol for programmatic data access |
|---|
| Please note this item is documented here but has not yet been fully tested by the CASDA team.
In this example we provide a sample python script, siap.py, that can be used to download images and image cubes overlapping a particular sky region. Supported formats for right ascension are 22:58:04.88, 22h58m4.88s and 344.52039, and for declination -36:25:39.4, -36d25m39.4s and -36.42758. The script queries the CASDA SIAP service for images that overlap a circle of radius 0.1 degree centred on the coordinates provided. All matching images and image cubes are then downloaded in full. To run this script:
As an example, the following command will download HI images of the galaxy group IC1459. The output files are written into a folder 'output'. For a 0.1 degree radius, three images are produced. >> python siap.py OPAL_username OPAL_password 344.52039 -36.42758 output | Notes This script provides an example that can be customised for your own purposes. Extensive internal documentation is included. To provide more detail, the script:
4. Creates an asynchronous task to access the images and/or image cubes. 5. Adds the authorised id for each image and image cube. 6. Starts the task to access the images/image cubes. 7. Waits until the job has completed. 8. Downloads all images/image cubes. To access files, the script uses OPAL authentication and this must be provided for all downloads. You will only be able to retrieve images which have been openly released or that you have access to as a member of a science team. Please note that the OPAL_password should be surrounded by single quotes on a Mac or Linux command line if there are any non-alphanumeric characters in the password. |
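The coordinate formats accepted by siap.py are equivalent ways of writing one position; the sexagesimal and decimal examples agree to within a fraction of an arcsecond. The Python sketch below is not the code in siap.py. It shows one way a script could normalise the three formats to decimal degrees, treating sexagesimal RA as hours, and then build the circle for the SIAv2 POS parameter, written as 'CIRCLE ra dec radius' in degrees. The function names are hypothetical.

```python
def _sexagesimal_to_float(text):
    """Parse 'AA:BB:CC.c', 'AAhBBmCC.cs' or 'AAdBBmCC.cs' into a signed decimal value."""
    s = text.strip().lower()
    sign = -1.0 if s.startswith("-") else 1.0  # keeps the sign for values like -00:30:00
    s = s.lstrip("+-")
    for sep in "hdms:":
        s = s.replace(sep, " ")
    parts = [float(p) for p in s.split()]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"cannot parse {text!r}")
    value = sum(part / scale for scale, part in zip((1.0, 60.0, 3600.0), parts))
    return sign * value


def parse_ra(text):
    """RA in degrees: sexagesimal input is in hours; a plain number is taken as degrees."""
    if any(c in text for c in ":hH"):
        return _sexagesimal_to_float(text) * 15.0  # 1 hour of RA = 15 degrees
    return float(text)


def parse_dec(text):
    """Dec in degrees from sexagesimal ('-36:25:39.4', '-36d25m39.4s') or decimal input."""
    if any(c in text for c in ":dD"):
        return _sexagesimal_to_float(text)
    return float(text)


def pos_circle(ra_deg, dec_deg, radius_deg=0.1):
    """SIAv2 POS value for a circular region, e.g. 'CIRCLE 344.52039 -36.42758 0.1'."""
    return f"CIRCLE {ra_deg} {dec_deg} {radius_deg}"
```

The sign is read separately from the digits so that declinations between 0 and -1 degree, such as -00:30:00, are not turned positive.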
| 4.3 Generate cut outs from images and image cubes using a script |
|---|
| Please note this item is documented here but has not yet been fully tested by the CASDA team.
In this example we provide a sample python script, cutouts.py, that generates a set of cut-outs from the images and image cubes produced from observations associated with a scheduling block. The script reads a VO-format catalogue, casda_continuum_component, that lists positions of radio continuum components. For each component with a peak flux density above 150 mJy, it generates a small cut-out from each associated image file in which the detected position falls within the full-size image or cube. The cut-outs are 0.2 x 0.2 degrees in right ascension and declination and contain the same number of planes as the full-size image or cube. Thus a cut-out from a full spectral line image cube may have up to 16,200 channels along the third axis, while a cut-out from a single-plane image is also a single-plane image. To run this script:
As an example, the following command will generate cut outs from a radio continuum image of the Tucana region. The output files are written into a folder 'output'. For a flux-cut-off of 150 mJy, 89 cut-out images are produced. >> python cutouts.py OPAL_username OPAL_password 609 output |
| Notes This script provides an example that can be customised for your own purposes. Extensive internal documentation is included. Note that the script includes a switch to retrieve entire image cubes instead of cutouts. To provide more detail, the script:
To access files, the script uses OPAL authentication and this must be provided for all downloads. You will only be able to retrieve images which have been openly released or that you have access to as a member of a science team. Please note that the OPAL_password should be surrounded by single quotes on a Mac or Linux command line if there are any non-alphanumeric characters in the password. |
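The selection and box-building steps of cutouts.py can be sketched as follows. This is not the script's own code. It keeps components brighter than 150 mJy and builds a SODA POS value of the form 'RANGE ra_min ra_max dec_min dec_max' for a 0.2 degree box. It assumes that flux_peak holds the peak flux density in mJy and that the 0.2 degree width is meant on the sky, so the RA half-width is divided by cos(dec). Boxes crossing RA = 0/360 or a celestial pole are not handled. The function names are hypothetical.

```python
import math

FLUX_CUT_MJY = 150.0   # threshold used in the example above
CUTOUT_SIZE_DEG = 0.2  # full width of each cut-out


def select_bright(components, flux_key="flux_peak", threshold=FLUX_CUT_MJY):
    """Keep components whose peak flux density (assumed mJy) exceeds the threshold."""
    return [c for c in components if c[flux_key] > threshold]


def soda_pos_range(ra_deg, dec_deg, size_deg=CUTOUT_SIZE_DEG):
    """SODA POS value 'RANGE ra_min ra_max dec_min dec_max' for a square box.

    The RA half-width is scaled by 1/cos(dec) so the box is about size_deg wide
    on the sky (an assumption). Wrap-around at RA 0/360 and poles is not handled.
    """
    half = size_deg / 2.0
    half_ra = half / math.cos(math.radians(dec_deg))
    return (f"RANGE {ra_deg - half_ra:.6f} {ra_deg + half_ra:.6f} "
            f"{dec_deg - half:.6f} {dec_deg + half:.6f}")
```

If the script instead treats 0.2 degrees as a coordinate range in RA, drop the cos(dec) scaling.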
5 Restricted CASDA Tools
| 5.1 Download data to a location in the Pawsey Supercomputing Centre (Pawsey user account required) |
|---|
|
In this example CASDA is used to find a set of data files to download. The user then logs into the Pawsey Supercomputing Centre using a Pawsey account. CASDA returns one or more links. These are copied into a .txt file and the files are retrieved to a location in the Pawsey Supercomputing Centre using unix commands. The following steps are carried out from a Nexus or OPAL account:
The following steps use a Pawsey account:
Alternatively, to transfer many files: |
| Notes See section 1.7 for information on Pawsey accounts. For Windows users: to copy and paste text between files in different locations, i) highlight the text to copy, ii) press Ctrl-C to copy it, iii) then right-click in the text file to paste it. |
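Section 5.1 retrieves the links in the .txt file with unix commands. As an alternative sketch in Python, a different technique from the one described there, the code below reads a text file with one link per line (ignoring blank lines and lines starting with '#') and downloads each file into a target directory. It assumes each link can be fetched directly without further authentication, and that the output filename is the last path segment of the URL. If CASDA links carry the filename elsewhere, adjust filename_for. The function names are hypothetical.

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit
from urllib.request import urlretrieve


def read_links(list_file):
    """Return the non-blank lines of a links file, skipping '#' comment lines."""
    links = []
    for line in Path(list_file).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            links.append(line)
    return links


def filename_for(url):
    """Local filename from the last segment of the URL path (an assumption)."""
    name = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
    if not name:
        raise ValueError(f"no filename in URL path: {url}")
    return name


def download_all(list_file, out_dir):
    """Download every link into out_dir, skipping files that already exist.

    A partially downloaded file will also be skipped, so delete it before re-running.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in read_links(list_file):
        target = out / filename_for(url)
        if target.exists():
            continue
        urlretrieve(url, target)
```

Skipping existing files lets an interrupted transfer be restarted without fetching completed files again.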
| 5.2 Set data validation flags and information (Survey Science Team members with validator permissions) |
|---|
| Data validation is carried out by members of the Survey Science teams using the CSIRO DAP or Survey Science Projects. This task can only be carried out by individuals with validation-level access to the data products.
The validation process involves setting a flag in the database and adding additional information. Validation is applied at the level of individual data products, so for a set of images created for a project, a validation flag is set for each image. Following validation, data products are 'released'. The steps below describe the process:
|
| Notes For help with the validation process please contact the helpdesk: atnf-datasup@csiro.au. |
| 5.3 See the list of individuals assigned to a science team (CASDA Administrator Task) |
|---|
|
|
| Notes In a later CASDA release, this task will be made available to project-level administrators. |
| 5.4 Manage roles for team members (CASDA Administrator Task) |
|---|
|
| Notes In a later CASDA release, this task will also be made available to project-level administrators. |
| 5.5 Add an individual to a science team (CASDA Administrator Task) |
|---|
|
| Notes In a later CASDA release, this task will be made available to project-level administrators. As a starting point, CASDA generates a list of team members from the PI and co-authors on the associated OPAL project proposals. This list is then managed through CASDA. |
| 5.6 Publish a Science Team catalogue: General advice (for Survey Science Teams) |
|---|
|
For some ASKAP surveys, science teams will generate catalogues with final science results. These are referred to here as 'Science Team catalogues'. As an example, a catalogue might contain a set of polarisation properties for a list of detected sources. Another catalogue could contain properties derived from HI spectra for nearby galaxies, together with optical identifications. Compiling such catalogues is the responsibility of Science Teams.
CASDA provides a service so that Science Team catalogues can be published as part of the ASKAP science archive. If your team is interested in publishing 'level 7' Science Team catalogues, please note the following advice.
|
| Notes CASDA catalogues are published together with digital object identifiers (DOIs). These can be cited in journal publications. |
| 5.7 Submit a catalogue for CASDA publication (Nexus account required) |
|---|
|
| Notes Improvements to this task are planned. In later releases the above steps should be easier to use. For now, you may need some assistance for catalogue deposits. Please see the help tips (yellow question marks) and/or contact ATNF data support. |
| 5.8 Approve a Science Team catalogue (Approvers only) |
|---|
|
| Notes |
6 Publications and Acknowledgements
See CASS publications and acknowledgments.
7 Links to external documentation
8 Document versions
| Author | Version | Date | TOPCAT Version | Notes |
|---|---|---|---|---|
| J Chapman | 1.0 | 5 November 2015 | 4.2 | Initial release |
| J Chapman | 1.1 | 24 March 2016 | 4.3 | Updates and new content in sections 1 to 5. |
