Preparing and Submitting Tabular Data
The CDS and other astronomical data centers store and distribute
astronomical data to promote their use, primarily by professional
astronomers.
In order to ensure the scientific quality of the data,
we therefore require that the data are related to a publication
in a refereed journal, either as tables or catalogues actually published,
or as a paper describing the data and
their context.
In order to facilitate the usability of the data, and to allow their processing
by the data centers, we require that:
- the data are described accurately enough to allow
an unambiguous interpretation of the data, as well as a
comprehension of the context in which the data were acquired
and/or processed;
a single ascii file, named ReadMe, is designed for this role.
- the data are in a format which allows their use
by the tools currently in use in our discipline, normally
flat ascii files; other formats can be accepted, but
are converted into flat files.
A full description of the standard conventions used for the documentation
of the catalogues is available at URL
http://vizier.u-strasbg.fr/doc/catstd.htx .
The present document answers some frequently asked questions
about how to prepare the data for their inclusion in the
Data Center documents. The following topics are covered:
1 How to prepare the Data files
It is assumed that each component of the data set is stored in a file;
each file can represent a table, a spectrum (1-D data),
or an image (2-D data).
As a general rule, plain ascii data files (also called
flat files) are preferred, simply because such files can
always be processed.
More explicitly, the following formats can be used:
- for tables and catalogues: ascii (simple flat files),
with their structure (description of columns)
detailed in the ReadMe file.
Some other data formats can be accepted, but are
converted into flat files:
LaTeX, FITS, or TSV / CSV.
TSV (tab-separated values) and CSV
(character-separated values) are formats
in which a dedicated character (the tab in TSV, or
a punctuation mark in CSV, typically the semi-colon) is used
as a column separator; this is one of the formats
available for the output of spreadsheets.
What cannot be used: postscript, or the internal document
formats of word processors and spreadsheets (word/excel).
- for spectra (1-D data): either FITS file(s),
or 2-column ascii tables.
What cannot be used: postscript, word/excel documents,
GIF or JPEG images.
- for images (2-D data): FITS is the preferred format;
for images of the sky, the inclusion of the
FITS-WCS (World Coordinate System) parameters
describing the conversion between celestial coordinates
and pixel position is strongly encouraged.
What cannot be used: postscript, word/excel documents.
Therefore: never send postscript files. Postscript is a language
designed for printers, not for storing scientific data!
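As an illustration of what "converted into flat files" means for a CSV table, here is a minimal sketch that pads each column to a fixed width so that every field starts at the same byte position on every line. The function name `csv_to_flat` and the sample rows are hypothetical, and the semi-colon separator is only the typical case mentioned above; this is not the conversion tool used by the data centers.

```python
import csv
import io


def csv_to_flat(csv_text, delimiter=";"):
    """Convert character-separated text into fixed-width (flat) lines.

    Each column is padded to the width of its longest value, so a field
    occupies the same byte range on every line.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    rows = [row for row in reader if row]
    ncols = max(len(row) for row in rows)
    # Strip cells and pad short rows so every row has ncols fields.
    rows = [[cell.strip() for cell in row] + [""] * (ncols - len(row))
            for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]
    lines = []
    for row in rows:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        # One blank between columns; trailing blanks are dropped.
        lines.append(" ".join(cells).rstrip())
    return lines


# Hypothetical sample data, for illustration only.
sample = "HD 1234;12.5;G2V\nHD 99;7.25;K0III\n"
for line in csv_to_flat(sample):
    print(line)
```

All values are left-aligned here for simplicity; the resulting column positions are what the ReadMe file would then have to describe.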
A short word about file naming conventions:
according to the ISO 9660 standard, file names are
restricted to 8 + 3 characters: up to 8 characters in the
set [a-z0-9_-], followed by a dot and an extension made of 3
characters with the following conventions:
.dat for data files, .fit for FITS files,
.tex for TeX/LaTeX files, and
.txt for text files (ascii files containing only printable text).
Full details about the file and directory structures
can be found in the Adopted Standards for Catalogues document.
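The 8 + 3 naming rule can be checked mechanically before a deposit. The sketch below is only an illustration: the helper name `is_valid_name` is hypothetical, it assumes that "8 characters" means at most 8, and it assumes that the extension uses lower-case letters or digits, which the text above does not state. It applies to data files, not to the ReadMe file itself.

```python
import re

# 1 to 8 characters from [a-z0-9_-], a dot, then a 3-character extension.
# The extension character set [a-z0-9] is an assumption.
NAME_PATTERN = re.compile(r"[a-z0-9_-]{1,8}\.[a-z0-9]{3}")


def is_valid_name(filename):
    """Return True if filename follows the 8 + 3 convention sketched above."""
    return NAME_PATTERN.fullmatch(filename) is not None


for name in ["table1.dat", "refs.dat", "MyTable.dat", "verylongname.dat"]:
    print(name, is_valid_name(name))
```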
2 How to fill the ReadMe description file
This file is aimed at describing all data files stored in a catalogued
data set, and at providing the necessary explanations and references to
the stored material.
All catalogues available at CDS and in associated astronomical data centers
have such an associated file, and numerous
examples can be found on the FTP directories at CDS.
A full description of the conventions used in this ReadMe file can be
found in the
Standards for Astronomical Catalogues,
and a template is
readily accessible for
A&A tables.
A typical example is
J/A+A/382/389/ReadMe. Short explanations about how
to fill the ReadMe file:
- the Keywords: part lists the following keywords:
- ADC_Keywords: introduces the list of
data-related keywords, out of a
controlled set (see also
examples at ADC).
- Keywords: introduces the list of keywords
as in the printed publication.
Unlike the Keywords: set,
which is generally related to the scientific goal of
a paper, the ADC_Keywords are strictly related
to the tabular material collected in the paper.
- the Description: section is expected to describe the
context of the data, like the instrumentation used
or the observing conditions
-- it therefore differs from the
Abstract which tends to describe the scientific results
that the author derived from the data.
- the File Summary: section describes the files making up the set:
for each file are specified its filename, the length of the
longest line (lrecl), the number of records (number of lines),
and a caption (short title of the file). Lengthy notes
can be added if necessary.
- the Byte-by-byte Description of file: section describes
the structure of each of the data files (files with the
.dat extension). This description is made in a tabular form,
each row describing one field (column) of the data file.
The description contains the following columns:
- the starting column of the data field
- the format of the field as a Fortran-like
format:
An | a character column made of n characters
In | a column containing an integer number of n digits
Fn.d | a column containing a number of width n, with up to d digits in the fractional part
En.d, Dn.d | a number using the exponential notation
- the units
used in the field; the use of SI units is strongly
encouraged, and CGS units should be avoided
(for instance, use mW/m2 instead of
ergs/s/cm2).
- the label (heading) of the field, made of
a single word (no embedded blank);
a few
basic conventions are used for usual parameters
(e.g. positions) and related quantities
(e.g. mean errors).
- the explanations can start with the following
special characters related to some important data
characteristics:
* | (the asterisk) | indicates a lengthy note
[...] | (square brackets) | indicate data ranges
? | (question mark) | indicates a possibility of blank or NULL (unspecified) values
- the References: section contains the necessary references;
the usage of the
bibcode
is strongly encouraged.
For large sets of references, it is suggested to gather
them into a dedicated reference file
named refs.dat .
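To show how a byte-by-byte description drives the reading of a data file, here is a minimal sketch that slices one fixed-width line using a starting column and a Fortran-like format per field. The function `parse_record`, the field list, and the sample line are all hypothetical; blank numeric fields are returned as `None`, which corresponds to the situation flagged by the question mark in the explanations.

```python
import re

# Letter (A, I, F, E or D), width n, and an optional ".d" part.
FORMAT_RE = re.compile(r"([AIFED])(\d+)(?:\.(\d+))?")


def parse_record(line, fields):
    """Extract the fields of one fixed-width line.

    `fields` is a list of (start_byte, fortran_format, label) tuples,
    with start_byte counted from 1.
    """
    record = {}
    for start, fmt, label in fields:
        match = FORMAT_RE.fullmatch(fmt)
        if match is None:
            raise ValueError(f"unsupported format: {fmt}")
        kind, width = match.group(1), int(match.group(2))
        text = line[start - 1:start - 1 + width].strip()
        if kind == "A":
            value = text
        elif text == "":
            value = None  # blank (unspecified) numeric value
        elif kind == "I":
            value = int(text)
        else:
            # F, E and D formats; Fortran "D" exponents become "E".
            value = float(text.replace("D", "E").replace("d", "e"))
        record[label] = value
    return record


# Hypothetical description and data line, for illustration only.
fields = [(1, "A7", "Name"), (9, "F5.2", "Vmag"), (15, "I3", "Nobs")]
print(parse_record("HD 1234 12.50  17", fields))
```

Running the same function over every line of a .dat file is a quick way to confirm that the description and the data agree before submission.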
3 How to deposit the data
If not too bulky,
data files with their ReadMe file can be uploaded from
http://cdsweb.u-strasbg.fr/cgi-bin/Submit
where some basic checks on the ReadMe and data files are performed.
The checking procedure is also available as the
anafile package,
which can be installed with the standard configure and make
procedures on Linux
(see its man page).
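Before uploading, it can also help to confirm locally that the values given in the File Summary section of the ReadMe (length of the longest line and number of records) match each data file. The sketch below is a hypothetical helper written for this purpose; it is not the anafile procedure and does not reproduce the checks made at upload.

```python
def summarize(lines):
    """Return (lrecl, records): longest line length and number of lines."""
    stripped = [line.rstrip("\r\n") for line in lines]
    lrecl = max((len(line) for line in stripped), default=0)
    return lrecl, len(stripped)


def summarize_file(path):
    """Apply summarize() to an ascii file on disk."""
    with open(path, encoding="ascii") as handle:
        return summarize(handle)


# Hypothetical in-memory example: three records, longest is 6 characters.
print(summarize(["abc\n", "abcdef\n", "a\n"]))
```

The two numbers returned for a file such as table1.dat (a hypothetical name) are the ones to compare with its line in the File Summary.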
Alternatively, you can:
- upload the files with their ReadMe via ftp
(recommended for large files)
at the node
cdsarc.u-strasbg.fr (130.79.128.5)
in a subdirectory of the incoming directory.
Use anonymous as userid, and your e-mail address
as password, and move to the incoming directory with the command
cd incoming
Don't worry if the answer to the dir command afterwards is
550 No files found.
This directory is protected so that the file names
cannot be listed.
There, create a directory with a name of your choice (e.g. your name, or the
A&A reference, but without blanks or special characters) with the command:
mkdir my_choice
(NOTICE: remember your choice; it cannot be listed later!)
Then move to the directory you just created with the command:
cd my_choice
Then deposit your files with put or mput commands.
Finally, SEND AN E-MAIL saying where you have placed the files to:
cats(at)simbad.u-strasbg.fr
- e-mail your files to the address
cats(at)simbad.u-strasbg.fr
if these are not too bulky (less than a few megabytes).
- or mail the data (CD or DVD) to the attention of
Dr François Ochsenbein
Centre de Données astronomiques
11, rue de l'Université
67000 STRASBOURG, France
francois(at)astro.u-strasbg.fr
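The manual FTP steps described above (anonymous login, cd incoming, mkdir, cd, put) can be scripted with Python's standard ftplib module. This is a minimal sketch, assuming the server accepts anonymous FTP exactly as described; the function name `deposit`, the e-mail address, and the file names are hypothetical. The notification e-mail to cats(at)simbad.u-strasbg.fr still has to be sent separately.

```python
import os
from ftplib import FTP


def deposit(ftp, email, directory, paths):
    """Replay the manual FTP steps on an already-connected ftplib.FTP object."""
    ftp.login("anonymous", email)  # anonymous userid, e-mail as password
    ftp.cwd("incoming")            # listing this directory is not possible
    ftp.mkd(directory)             # remember this name, it cannot be listed
    ftp.cwd(directory)
    for path in paths:
        with open(path, "rb") as handle:
            ftp.storbinary(f"STOR {os.path.basename(path)}", handle)


# Usage (not run here, it needs network access and real files):
# with FTP("cdsarc.u-strasbg.fr") as ftp:
#     deposit(ftp, "me@example.org", "my_choice", ["ReadMe", "table1.dat"])
```

Passing the connection in as an argument keeps the function easy to try out against a stand-in object before touching the real server.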
4 What happens to your data
At the CDS, some checking procedures are executed to verify
the compatibility between the data files and their description.
This can lead to interactions with the authors, but we are trying
to minimize the level of interaction.
Once the data are public, they are accessible as plain files
in FTP directories at CDS and other
participating data centers (e.g. at CfA/Harvard (USA),
CADC (Canada), or
NAOJ/ADAC (Japan)).
The data are also added to the VizieR
service, with mirrors at
CfA/Harvard (USA),
CADC (Canada),
NAOJ/ADAC (Japan),
Cambridge (UK),
IUCAA (India),
BAO (China).
5 Contacts
For any question related to the preparation of the data, for
problems related to non-standard data formatting, or for any other
difficulty in the management or the transfer of the electronic tables,
contact
François Ochsenbein (francois(at)astro.u-strasbg.fr) directly.