dfoMonitor


dfos = Data Flow Operations System, the common tool set for DFO

The tool manages the daily workflow. Check out under 'Operations' (e.g. here) for information about that workflow.

v4.1.2:
- new cronjob checks for cleanupRawdisp and 'dfosCron -t autoDaily'

v4.2:
- new button 'reference' for calling rawdisp2reference (here)

v4.2.1:
- minor change for PHOENIX (sci_Certif flag supported again)

[ used databases ] sara..transfer_requests (transfer status), observations..data_products; ngas..ngas_files (NGAS access)
[ used dfos tools ] qcdate, ngasClient; checks output from autoDaily, createReport, ingestProducts
[ output ] $DFO_MON_DIR/dfoMonitor.html; tool called by autoDaily and some tools of the daily workflow
[ upload/download ] upload: XDM / ngasWatcher / transferWatcher files to WISQ; HTML output to qcweb
topics: description | installation checks | PHOENIX | XDM | AB number | data transfer | checkboxes: autoDaily | HC monitor | calChecker | last ABs | Links: service user links | output pages | how to use | configuration | status | operational aspects | technical: decision making

dfoMonitor

[ top ] Description

enabled for parallel execution

This tool provides the central GUI for monitoring and managing the QC daily workflow. It scans the active DFO dates, reads and displays their process status, and offers the next reasonable workflow steps. It serves as the standard interface to the whole daily workflow, aiming to offer all needed functionality and interactivity.

With version 4.0, the tool supports both the standard dfos workflow and the PHOENIX workflow. For PHOENIX, the dfoMonitor is mainly a passive monitor and has many functions switched off. In the following, the 'dfos only' functions are marked.

The monitor offers action buttons to launch the next step interactively. Mouse over any of these buttons to get a short description of the offered functionality. Enabling of workflow steps is based on the rule set described below.

At some steps the monitor displays additional information, such as the number of raw files. At other steps, links to related information are offered:

Links are offered to the nightlogs. Data reports are linked to the HTML output generated by createReport.

There is a link to the daily calChecker result pages ("CAL"). They are permanently stored under $DFO_LST_DIR/CALCHECK.

The link 'status' is an extraction from DFO_STATUS for the corresponding date, intended as an overview of the current processing status.

The tool displays filtered files, as detected by filterRaw. If an entry exists in $DFO_LST_DIR/filt_<instr>_<date>.txt, the corresponding box is colored yellow, and a link to the list is offered.
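
A minimal sketch of this check, assuming the file name pattern quoted above; the variable names and the coloring/linking calls are illustrative placeholders, not the tool's actual code:

    # Sketch only: 'color_box' and 'add_link' are hypothetical helpers,
    # $INSTR and $DATE stand for the <instr> and <date> placeholders.
    FILT_LIST=$DFO_LST_DIR/filt_${INSTR}_${DATE}.txt
    if [ -s "$FILT_LIST" ]; then
        color_box "$DATE" yellow            # mark the box yellow
        add_link  "$DATE" "$FILT_LIST"      # offer a link to the filter list
    fi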

The tool displays whether the night had SM or VM (or both) SCIENCE runs. This information is extracted from the data reports.

There is the standard 'refresh' option, plus options to update the load and disk status.


[ top ] Installation checks. The tool monitors the DRS_TYPE (as configured in config.createAB): condor on (CON) or off (anything else). The configured $DFS_RELEASE is displayed and compared to the default dfs setting under /home/flowmgr. If configured, $MIDAS_CHECK compares the default MIDAS version (defined in /home/flowmgr/dfs) to $MIDVERS. Finally, the currently enabled pipeline version is displayed (with the detmon pipeline being filtered out).
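
As an illustration, these checks boil down to comparisons of the kind sketched below; the config file location and layout, and the way the flowmgr defaults are read into DEFAULT_DFS_RELEASE and DEFAULT_MIDVERS, are assumptions, not the tool's actual code:

    # Sketch of the installation checks; defaults and config parsing are assumed.
    DRS_TYPE=$(awk '/^DRS_TYPE/ {print $2}' $DFO_CONFIG_DIR/config.createAB)
    [ "$DRS_TYPE" = "CON" ] && echo "condor: on (CON)" || echo "condor: off"

    if [ "$DFS_RELEASE" != "$DEFAULT_DFS_RELEASE" ]; then
        echo "note: configured DFS_RELEASE ($DFS_RELEASE) differs from default ($DEFAULT_DFS_RELEASE)"
    fi

    if [ -n "$MIDAS_CHECK" ] && [ "$MIDVERS" != "$DEFAULT_MIDVERS" ]; then
        echo "note: MIDAS version $MIDVERS differs from default $DEFAULT_MIDVERS"
    fi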

[ top ] Monitoring of AB number. If the number of ABs in $DFO_AB_DIR is beyond a certain limit, the AB monitor (getStatusAB) becomes slow, and this also slows down autoDaily. There have been cases where autoDaily effectively got stuck. To become aware of potential issues, the total number of ABs in $DFO_AB_DIR is monitored. It scores red if a hard-coded threshold is hit; currently this threshold is 2500.

N_ABs:
530
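
A minimal sketch of this count and threshold check; the threshold value is the one quoted above, while the '*.ab' file pattern and the scoring output are assumptions:

    # Count the ABs and score against the hard-coded limit (2500, see above).
    AB_LIMIT=2500
    N_AB=$(ls $DFO_AB_DIR/*.ab 2>/dev/null | wc -l)   # '*.ab' naming assumed
    if [ "$N_AB" -ge "$AB_LIMIT" ]; then
        SCORE=red      # getStatusAB and autoDaily will slow down
    else
        SCORE=green
    fi
    echo "N_ABs: $N_AB ($SCORE)"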

[ top ] Disk space, XDM. The data disk space is monitored since with the data disk full, no automatic processing is possible. A quick overview is provided:

data disk: 120.5 GB (30%)

It updates in the background (ash mechanism) if clicked.

The XDM (eXtended Disk space Monitor) provides detailed feedback about the disk space usage on the data disk. It monitors the following data disk directories:

disk space on $DATA_DISK (total: 870 GB)
RAW: $DFO_RAW_DIR (updated each time dfoMonitor is called)
CAL: $DFO_CAL_DIR
SCI: $DFO_SCI_DIR
DFS: $DFS_PRODUCT
LST: $DFO_LST_DIR
*HDR: $DFO_HDR_DIR
*PLT: $DFO_PLT_DIR
*LOG: $DFO_LOG_DIR
SUM: sum of all above
OTH: all other data on $DATA_DISK in non-standard folders
FREE: remaining free disk space

The values marked with * are normally read from the DFO_STATUS file and are therefore static. They are updated on demand with [refresh], which takes a couple of seconds, and also when their entries eventually drop out of DFO_STATUS (they are auto-removed once 5000 newer entries have made them outdated).

Disk space used by the directories is listed in GB; the bar also indicates the usage in percent. The disk space score is green if less than 80% of the disk volume is occupied, and red otherwise.

If a quota is defined in the config file (DATA_QUOTA), it is indicated and taken into account.
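
A sketch of how such a score can be derived; the 80% limit and the DATA_QUOTA key are from the text, while the use of df and the variable names are assumptions:

    # Score the data disk usage; 80% is the limit quoted above.
    # DATA_QUOTA (in GB, optional) replaces the physical disk size as reference.
    read TOTAL_GB USED_GB <<< "$(df -BG --output=size,used $DATA_DISK | tail -1 | tr -d 'G')"
    REF_GB=${DATA_QUOTA:-$TOTAL_GB}                 # use the quota if configured
    PCT=$(( 100 * USED_GB / REF_GB ))
    if [ "$PCT" -lt 80 ]; then SCORE=green; else SCORE=red; fi
    echo "data disk: ${USED_GB} GB (${PCT}%) -> $SCORE"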

The XDM is exported to http://www.eso.org/observing/dfo/quality/WISQ/XDM/XDM.html and linked to the WISQ monitor on the navigation bar.


[ top ] PHOENIX notifications

phoenix is the workflow tool for automatic science processing. It is used by the IDP accounts on muc08, muc09 and muc10. Find more information here.

The following section is relevant only for an instrument that has a PHOENIX process set up on muc08/muc09/muc10. For all other instruments this section can be ignored.

For the stream part of a phoenix process, it is desirable to have a signal from the operational account that a certain set of master calibrations has been finished and is available for phoenix processing. The 'batch unit' for phoenix is one month, for purely pragmatic reasons.

To become 'phoenix-enabled', the dfoMonitor needs to be configured: the configuration keys PHOENIX_ENABLED and PHOENIX_ACCOUNT have to be defined properly, see further below.

Here is a quick overview of the workflow. Find more information on the phoenix page.

a) If PHOENIX_ENABLED and PHOENIX_ACCOUNT are set, histoMonitor, when encountering a new month upon being called from 'finishNight', sends a signal (email) to the QC scientist that a new month has started, meaning that a set of certified master calibrations is available for the previous month. A new status flag 'phoenix_Ready' is written into DFO_STATUS, along with the previous month (format YYYY-MM).

b) This flag is caught by dfoMonitor and used to flag that month on the main output page:

PHOENIX: 2013-06

If no new PHOENIX job is on the ToDo list, this field is empty:

PHOENIX: none

c) Then the QC scientist can launch this new PHOENIX job.

d) When this step has been done, the QC scientist can confirm this by pushing the button 'done' on the dfoMonitor. This triggers a dialogue where the user is asked to confirm the execution, and then this month is removed from the DFO_STATUS file. If there is more than one month, all months will be offered for confirmation, one after the other.
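
Behind steps b) and d), the monitor essentially reads and removes 'phoenix_Ready' entries from the DFO_STATUS file. A sketch of that mechanism, assuming a simple 'phoenix_Ready YYYY-MM' line layout (the actual file format may differ, and $DFO_STATUS_FILE is a placeholder path):

    # Sketch only: the real DFO_STATUS layout may differ from the assumed
    # "phoenix_Ready YYYY-MM" lines; $DFO_STATUS_FILE is a placeholder path.

    # b) collect pending PHOENIX months for display
    PENDING=$(awk '/phoenix_Ready/ {print $2}' $DFO_STATUS_FILE | sort)
    [ -z "$PENDING" ] && PENDING=none
    echo "PHOENIX: $PENDING"

    # d) after the 'done' dialogue, remove the confirmed month
    confirm_and_remove () {
        local MONTH=$1
        read -p "PHOENIX job for $MONTH done? [y/n] " ANSWER
        if [ "$ANSWER" = "y" ]; then
            sed -i "/phoenix_Ready.*$MONTH/d" $DFO_STATUS_FILE
        fi
    }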


[ top ] Data transfer links (dfos only). This checkbox has links related to the data transfer system (DTS), plus two rows for status checks of NGAS access ("ngas") and of the health of the transfer process ("transfer"), plus two buttons to launch queries. The ngas status is checked each time the dfoMonitor tool is launched, by launching an ngas download with ngasClient (the file is hard-coded as $TEST_FILE). If an error occurs, its code is displayed. As a timeout mechanism, the monitor waits at most 60 sec for ngasClient and then aborts. The DTS test and the ngas download are done in the background, and the result from the previous execution is displayed. Since dfoMonitor is called by many tools, this result is usually sufficiently up-to-date. The background call is made for performance reasons.
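
A sketch of the background check with timeout; only the 60 sec limit and $TEST_FILE come from the text, while the ngasClient call signature and the status files are illustrative assumptions:

    # Background ngas check with a 60 sec timeout (sketch only; the actual
    # ngasClient arguments and status handling are assumptions).
    (
        if timeout 60 ngasClient "$TEST_FILE" > /dev/null 2> ngas.err; then
            echo "ok" > ngas.status
        else
            echo "nok: $(head -1 ngas.err)" > ngas.status   # display the error code
        fi
    ) &   # dfoMonitor displays the result of the previous execution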

"Transfer" is checked with a query to the sara database which hosts file names and transfer status values. All CALIB files with transfer status < 6 (meaning not yet in the primary archive) are found, if the delay is more than 1 hr and less than 72 hrs. The one with the longest delay is displayed. If none is found, the "transfer" status is ok, otherwise nok. There is also an indication for delays of files of any type, but this is not used for the nok alert. This is motivated by the fact that for incremental processing, and for the closure of the QC loop with Paranal, CALIB files are by far the most important files. To avoid false alerts, delays by less than 1 hour are not evaluated. Delays by more than 72 hours are disregarded either since it is assumed that these might be due to database inconsistencies. This is not always true but the tool cannot decide this.

The complete query result is displayed upon launching the red action button (line labelled as "longest delay"). The green action button launches the inverse query, all archived files with status 6 and their delay values (time between OLAS archiving on Paranal and in the primary archive in Garching).

Finally, the link to the DataTransfer monitor displays the complete information for all files, plus statistics. The 'total' link relates to the Evalso monitor, which runs on Paranal and measures the current Evalso bandwidth from/to Paranal. The link called 'Reuna' measures the bandwidth of the Reuna link (Chile to Europe). Both are useful for monitoring the current DTS bandwidth and for analysing transfer issues.

The Evalso monitor is also displayed in the bottom monitor panel called "system".

Data Transfer:   Monitors: DataTransfer | Evalso: total | Reuna
ngas
transfer: no CALIB file delayed by >1 hr

In case of problems, flags will turn red, e.g.:

Data Transfer:   Monitors: DataTransfer | Evalso: total | Reuna
ngas
transfer: longest CALIB delay: VISIR.2008-11-08T08:13:11.123.fits CALIB (2.5 hrs)
          longest delay (any dpr.catg): VISIR.2008-11-01T08:13:11.123.fits SCIENCE (54.4 hrs)

The ngas and the transfer flags are exported to the web server and embedded in the calChecker and the HC monitor.


[ top ] autoDaily checkbox (dfos only). This checkbox is intended to make the current status of the processing scheme more transparent. It checks for:

autoDaily?
enabled as cronjob
cleanupRD enabled
dfosCron monitoring enabled

This box must be green for dfos installations. The configured cronjob pattern is visible when hovering the mouse.
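
A sketch of how such checks can be read off the crontab; the grep patterns are illustrative, and the actual tool may inspect the crontab or its job files differently:

    # Sketch: verify that the expected cronjob entries exist.
    CRONTAB=$(crontab -l 2>/dev/null)

    check_cron () {
        local PATTERN="$1" LABEL="$2"
        if echo "$CRONTAB" | grep -q "$PATTERN"; then
            echo "$LABEL: enabled (green)"
        else
            echo "$LABEL: missing (red)"
        fi
    }

    check_cron "autoDaily"             "autoDaily enabled as cronjob"
    check_cron "cleanupRawdisp"        "cleanupRD enabled"
    check_cron "dfosCron -t autoDaily" "dfosCron monitoring enabled"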

The activities of autoDaily are displayed in real-time underneath the XDM. If autoDaily is not running, this box displays:

autoDaily: no dates

If there is autoDaily activity, messages will inform about progress. You can follow the workflow by clicking on the 'log' link:

autoDaily: list_data_dates
log autoDaily running!
calling createAB

[ top ] HC monitor checkbox (dfos only). This checkbox monitors the proper update pattern of HC reports. It checks for the existence and proper scheduling of the following jobs:

HC monitor updates?
JOBS_TREND enabled as cronjob
JOBS_HEALTH existing
JOBS_NAVBAR enabled as cronjob

[ top ] calChecker checkbox (dfos only). The first checkbox checks for the existence and the proper scheduling of the calChecker cronjob (to be called every half hour). The second one checks that the FULL mode is called once a day, as a safety mechanism.
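
As an illustration, crontab entries of the kind that would satisfy these two checks; the times, the path handling and the way FULL mode is selected are assumptions, not prescribed values:

    # Illustrative crontab entries only; schedule, path and option syntax assumed.
    # calChecker called every half hour:
    0,30 * * * *   calChecker
    # once a day in FULL mode, as a safety net:
    15 6 * * *     calChecker FULL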