After an analysis effort of this type, and on a project where such an analysis
is carried out for the first time, it is always useful to list the shortcomings
of the telemetry and of the tools used to analyse that telemetry. While one
always plans and hopes that such an exercise will never be needed, it often
proves to be a very useful process to have undertaken, as it points out what
needs to be better documented and recorded, and what tools and software are
really needed for events like this. In our analyses we kept notes of the most
serious problems we encountered and of how to fix them, so that, should
something like this occur again, we will be in a better position to reach
sound conclusions.
Our recommendations are, in no particular order:
- There was/is little quantitative data in the night logs, which is not very
surprising given their nature (i.e., a verbal description of the night's
activities). We are in the process of placing more quantitative data in them,
such as seeing measurements, on-sky time, and the like, and of formatting this
data in a way that will allow automatic parsing for metric generation (a
sketch of one possible format is given at the end of this list).
- The TCC logging was not turned on during this dark run. As was noted
above, the TCC took a rather severe performance hit when we performed
spectroscopic guiding. We could probably use a faster machine on the TCC side
(we are currently running on a DEC 3000), but the cost and time of porting
the code (it uses many VAX extensions in the FORTRAN) and the communications
protocols may be too high. In that case, we may simply not log data while
spectroscopy is being done (not a very good solution, but perhaps the only
viable one we have at present). Kinney has briefly explored the idea of using
EPICS for logging the TCC messages, and this may be a promising avenue to
explore: a VMS extension to EPICS is available that allows the capture of VMS
broadcast messages, which is just what the TCC emits.
- TPM logging is a spotty proposition at best; it needs to be made either
automatic or constant. As it is now, we have very large files being written
to disks which are too small, and the TPM logs are not being automatically
archived in a consistent fashion. We also run into problems getting the
logging restarted after an MCP reboot or crash, as this must currently be
done by hand. We should be archiving these logs automatically at APO every
day, so that if disk space is needed we can delete local copies confident
that a backup exists. The watcher needs to monitor /mcptmp on sdsshost and
give warnings when that disk is more than 70% full (a minimal sketch of such
a check is given at the end of this list); a change request has been filed in
the GNATS database. Data are logged to these files at the rather astounding
rate of once per 0.05 s. This would appear to be overkill for the type of
analysis we are doing right now, but it may be necessary for analysing
tracking accuracy. The TCC values for the axes are either not logged or
logged in very odd and peculiar units; we need the TPM documentation updated
to explain the units used in the log files for the various parameters. There
is also no documentation for the one piece of code available for analysing
these logs (``log2asc''). The code itself is very slow and has very few
options (i.e., one cannot choose which logged variable to extract; it is all
or nothing). It is also not portable: we were forced to run it on either
sdsshost or sdss-commish, both of which are at least one CPU generation old,
and we could literally have saved hours by being able to run it on one of the
Observers' dual-processor machines. We need tools that are more portable,
have more flexible options, and have clearer documentation, and we need to
revisit what is logged with input from the Observers. The efficiency of
log2asc must also be improved: extracting 3 hours of TPM data took over 7
hours of wall-clock time. Clearly, if the TPM logs are to be a useful tool,
we have to do something about this (a sketch of a selective, decimating
extractor is given at the end of this list).
- The focus sweep data were very important and useful, but the way in which
they are archived to tape, the slowness of getting the data off of tape, the
lack of disk space in which to dump the contents of a tape and search through
it for the FITS files of interest, and problems with some of the FITS
keywords (e.g., the FOCUS keyword is not being populated at all in the
science headers) leave a lot to be desired. Well-documented tools that let us
peruse a tape for its contents, without having to search through many tens of
files located in various non-obvious directories, would be the best thing the
software group could provide (a sketch of a simple header scanner is given at
the end of this list). Jen Adelman and Brian Yanny of FNAL are working with
the Observers to finalise the development tools that EAG uses to do this. We
should invest in the hardware and tools to make accessing the data
straightforward. The script roboRead will enable us to get data directly from
the tape robot at FNAL, but currently it can only be run from sdss.fnal.gov;
Jen is working on making this script portable and runnable from remote sites.
Until then we are dependent on the backup tapes here at APO to retrieve data.
- The use and storage of backup tapes on site (i.e., at APO) is now a
critical problem. We, as a project, need to decide how long to keep backup
tapes on site, and how to store and handle them. Currently they sit in piles
in a storage cabinet in the computer room, so that as it stands the archive
is next to useless: every time a tape is needed, it takes quite a few hours
to find it. The sooner we decide what to do about backup tapes and how to
handle them, the sooner we can get out of this morass.
- The Astrologs and the Murmur logs were not useful diagnostic tools for our
purposes. The focus-<MJD>.par files, which record any changes in focus, do
not carry time stamps. There are run numbers and frame numbers, but these are
not easily converted into a straightforward timebase to use for plots. It
took the better part of two days to create a single plot of commanded focus
vs. time. This type of plot is useful for comparing against the secondary
mirror MIG values, and allows the observer to see if the mirror is responding
properly to focus commands. We would like to see the timestamp problem
corrected before we take on-sky data again. It must also be a requirement
that all log data carry a time stamp on every line that is added to a log;
this is a basic piece of information that will allow us to use the logs for
diagnostic purposes. It will save a lot of time and frustration if the survey
agrees on a standard timebase for all the log data. Converting between Unix
system time, Mountain Time, and Modified Julian Date with Frame and Run
numbers is currently a time-consuming process that could be eliminated with a
little bit of effort now (a sketch of the basic conversions is given at the
end of this list). We also need a web site documenting the use and content of
all the log and parameter files. All the appropriate logging will be
meaningless unless it is properly documented and configured.
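As an illustration of the machine-parseable night-log format proposed in the
first recommendation, the following sketch assumes a simple key=value
convention in which a ``QUANT'' tag separates quantitative entries from
free-form prose; the tag and field names (mjd, seeing, onsky_hours) are
hypothetical, not an adopted standard:

    import re

    # Hypothetical structured night-log line, e.g.:
    #   QUANT mjd=51506 seeing=1.3 onsky_hours=6.2
    QUANT_RE = re.compile(r"^QUANT\s+(.*)$")
    PAIR_RE = re.compile(r"(\w+)=(\S+)")

    def parse_night_log(lines):
        """Yield a dict of key/value pairs for each quantitative entry."""
        for line in lines:
            m = QUANT_RE.match(line)
            if m:
                yield dict(PAIR_RE.findall(m.group(1)))

    log = ["Opened at sunset, thin cirrus.",
           "QUANT mjd=51506 seeing=1.3 onsky_hours=6.2"]
    for entry in parse_night_log(log):
        print(entry)  # {'mjd': '51506', 'seeing': '1.3', 'onsky_hours': '6.2'}

Any convention would do; the point is that a fixed tag and key=value pairs
make metric generation a one-pass parse rather than a reading exercise.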
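For the /mcptmp monitoring recommended in the TPM item, a minimal sketch of
the 70%-full check is shown below. It assumes the watcher can run a periodic
Python check on sdsshost and that a printed warning is an adequate stand-in
for whatever alarm mechanism the watcher actually uses:

    import shutil

    THRESHOLD = 0.70  # warn when the disk is more than 70% full

    def check_disk(path="/mcptmp"):
        """Return a warning string if `path` is over threshold, else None."""
        usage = shutil.disk_usage(path)
        fraction = usage.used / usage.total
        if fraction > THRESHOLD:
            return (f"WARNING: {path} is {fraction:.0%} full "
                    f"({usage.free // 2**20} MB free)")
        return None

    msg = check_disk()
    if msg:
        print(msg)  # forward to the watcher's alarm system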
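The selective extractor asked for in the TPM item could look like the sketch
below. Since the binary TPM log format is undocumented, this operates on the
ASCII output of log2asc and assumes whitespace-delimited columns with a
header line naming each variable; both assumptions, and the example column
names, are hypothetical. It extracts only the requested columns and decimates
the 0.05 s (20 Hz) sampling down to 1 Hz:

    import sys

    def extract_columns(infile, wanted, decimate=20):
        """Print selected columns from an ASCII TPM dump, keeping every
        `decimate`-th sample (20 reduces 20 Hz logging to 1 Hz)."""
        with open(infile) as f:
            header = f.readline().split()  # assumed: first line names columns
            idx = [header.index(name) for name in wanted]
            print(" ".join(wanted))
            for n, line in enumerate(f):
                if n % decimate:
                    continue
                fields = line.split()
                print(" ".join(fields[i] for i in idx))

    if __name__ == "__main__":
        # e.g.: python extract.py tpm.asc az_pos alt_pos
        extract_columns(sys.argv[1], sys.argv[2:])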
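For perusing a tape dump for the FITS files of interest, the sketch below
reads only the primary header of each file (FITS headers are 2880-byte
blocks of 80-character cards, terminated by an END card) and reports whether
the FOCUS keyword is populated; the directory layout and file glob are
assumptions:

    import glob

    BLOCK, CARD = 2880, 80

    def primary_header(path):
        """Return the primary FITS header as a dict of keyword -> raw value."""
        header = {}
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK)
                if len(block) < BLOCK:  # truncated or non-FITS file
                    return header
                for i in range(0, BLOCK, CARD):
                    card = block[i:i + CARD].decode("ascii", "replace")
                    key = card[:8].strip()
                    if key == "END":
                        return header
                    if card[8:10] == "= ":
                        header[key] = card[10:].split("/")[0].strip()

    for path in sorted(glob.glob("dump/**/*.fit*", recursive=True)):  # assumed layout
        hdr = primary_header(path)
        print(path, hdr.get("FOCUS", "FOCUS MISSING"))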
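Finally, the timebase conversions complained about in the last item are
mechanical once written down: MJD 40587.0 corresponds to the Unix epoch
(1970 January 1, 0h UTC), so Unix time and MJD differ only by a scale and an
offset, and Mountain Time is just a zone presentation of the same instant. A
sketch follows, using Python's standard zoneinfo database; mapping Frame and
Run numbers onto this timebase would still require a project-supplied lookup
table:

    from datetime import datetime
    from zoneinfo import ZoneInfo

    MJD_UNIX_EPOCH = 40587.0  # MJD of 1970-01-01 00:00:00 UTC
    MOUNTAIN = ZoneInfo("America/Denver")

    def unix_to_mjd(t):
        return t / 86400.0 + MJD_UNIX_EPOCH

    def mjd_to_unix(mjd):
        return (mjd - MJD_UNIX_EPOCH) * 86400.0

    def unix_to_mountain(t):
        return datetime.fromtimestamp(t, tz=MOUNTAIN)

    # Example: Mountain time and MJD for the same instant
    t = mjd_to_unix(51506.25)  # MJD 51506.25 = 1999-11-24 06:00 UTC
    print(unix_to_mountain(t), unix_to_mjd(t))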