Original document: http://www.atnf.csiro.au/vlbi/dokuwiki/lib/exe/fetch.php/difx/meetings/sydney2012/difxwithsge.pdf (last modified Tue Sep 25 04:15:52 2012)
Running DiFX with SGE/OGE
Max-Planck-Institut für Radioastronomie, Bonn, Germany

Helge Rottmann

DiFX Meeting 24.9. - 28.9.2012 Sydney




SGE = Sun Grid Engine, now Oracle Grid Engine (OGE), but many online resources still refer to it as SGE. OGE is a Distributed Resource Management (DRM) system.
Goal: maximize resource utilization by matching the incoming workload to the available resources. Read more:
http://docs.oracle.com/cd/E24901_01/index.htm
http://www.oracle.com/technetwork/oem/host-server-mgmt/twp-gridengine-beginner-167116.pdf




1. The user submits a job to the master host.
2. The master host schedules the job according to the requested and available resources.
3. The master host assigns the job to one or more execution hosts.
4. The execution hosts execute the job.

Users can:
· specify minimum requirements (e.g. number of nodes, available memory, etc.)
· set the start time of the job
· assign job priorities (requires the appropriate rights)
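The four steps above can be illustrated with a toy, in-memory model of the matching step. It uses a greedy first-fit placement of a job's requested slots and memory onto execution hosts. This is a hypothetical sketch: `Host`, `Job` and `schedule` are invented names. It is not OGE's actual scheduling algorithm, which also weighs priorities, policies and host load.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_slots: int
    free_mem_gb: float


@dataclass
class Job:
    name: str
    slots: int              # total slots (processes) requested
    mem_gb_per_slot: float  # memory requirement per slot


def schedule(job, hosts):
    """Greedy first-fit placement (toy model, not OGE's algorithm).

    Returns {host_name: slots} if the whole request fits right now,
    otherwise None (the job would stay pending). Host resources are
    only reserved when the complete request can be satisfied.
    """
    remaining = job.slots
    placement = {}
    for host in hosts:
        if remaining == 0:
            break
        # Slots this host can take, limited by free slots and free memory.
        if job.mem_gb_per_slot > 0:
            by_mem = int(host.free_mem_gb // job.mem_gb_per_slot)
        else:
            by_mem = host.free_slots
        take = min(remaining, host.free_slots, by_mem)
        if take > 0:
            placement[host.name] = take
            remaining -= take
    if remaining > 0:
        return None
    # Commit the reservation.
    for host in hosts:
        n = placement.get(host.name, 0)
        host.free_slots -= n
        host.free_mem_gb -= n * job.mem_gb_per_slot
    return placement
```

With two hosts offering 4 and 8 free slots, a 6-slot job is split 4 + 2. A later 10-slot request that no longer fits returns `None` and leaves the hosts untouched.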




· There are numerous DRM systems available (e.g. Torque)
· OGE (at least theoretically) meets all DiFX requirements
· OGE is very well documented
· OGE is very simple to install (part of RHEL and many other distributions)
· OGE is freely available






Submit "simple" (non-parallelized) jobs:
qsub myjobscript

Request resources with -l, e.g.:
qsub -l m_core=8 myjobscript
Example myjobscript:

#!/bin/csh
# Mail this address when the job begins (b) and ends (e)
#$ -M user@mpifr-bonn.mpg.de
#$ -m be
# Redirect standard output to flow.out and merge standard error into it
#$ -o flow.out -j y
# Commands to execute
cd TEST
f77 flow.f -o flow
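When many similar job scripts are needed, the `#$` directive header can be generated. The helper below is a hypothetical sketch (`make_job_script` is an invented name). It only builds the script text, which would then be written to a file and passed to `qsub` as shown above. It uses only the directives that appear in this section: -M, -m, -o, -j and -l.

```python
def make_job_script(commands, mail=None, output=None, join=True, resources=None):
    """Return the text of a csh job script with Grid Engine '#$' directives.

    Hypothetical helper: it does not submit anything itself.
    """
    lines = ["#!/bin/csh"]
    if mail:
        lines.append(f"#$ -M {mail}")
        lines.append("#$ -m be")  # mail at job begin (b) and end (e)
    if output:
        # -j y merges standard error into the standard output file
        lines.append(f"#$ -o {output}" + (" -j y" if join else ""))
    for name, value in (resources or {}).items():
        lines.append(f"#$ -l {name}={value}")  # resource request
    lines.extend(commands)
    return "\n".join(lines) + "\n"


script = make_job_script(
    ["cd TEST", "f77 flow.f -o flow"],
    mail="user@mpifr-bonn.mpg.de",
    output="flow.out",
    resources={"m_core": 8},
)
```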




Submit "advanced" (parallel) jobs:

qsub -pe parallel_environment slots myjobscript


The administrator provides a "parallel environment" (PE) to which users submit jobs, and can set various constraints on it (e.g. which nodes to make available, among many others). OGE integrates tightly with the most common parallelization frameworks, e.g. OpenMPI.






qsub -pe difxpe 4 myjobscript
Example myjobscript:

#!/bin/csh
mpirun hostname


OGE chooses 4 execution nodes to start the job on. No machine file is needed.
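Under tight integration, Grid Engine passes the slot allocation to the job through the file named by the `PE_HOSTFILE` environment variable. Each line lists a host name, the number of slots granted on it, the queue instance, and a processor range. The sketch below parses such a file into the per-slot host list that a launcher such as mpirun works from. The function names and the sample content are invented for illustration.

```python
import os


def parse_pe_hostfile(text):
    """Parse Grid Engine $PE_HOSTFILE content into (host, slots) pairs."""
    allocation = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        allocation.append((fields[0], int(fields[1])))
    return allocation


def expand_slots(allocation):
    """One entry per granted slot, in allocation order."""
    return [host for host, slots in allocation for _ in range(slots)]


if __name__ == "__main__" and "PE_HOSTFILE" in os.environ:
    # Inside a real parallel job, read the file Grid Engine provides.
    with open(os.environ["PE_HOSTFILE"]) as f:
        print(expand_slots(parse_pe_hostfile(f.read())))
```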




Maximize cluster utilization: use the cluster for other projects when not correlating. At MPIfR the cluster is regularly used for:
· Pulsar searches
· Numerical simulations of jets
· FPGA routing

Utilization of the Bonn cluster is only 20% most of the time. The pulsar group would like to use DiFX resources whenever they are available.



· Simultaneously run multiple DiFX correlations
· Schedule the execution of multiple correlation runs
· Out-of-the-box suspend/restart facilities
· You may have to correlate at an external computing facility




· OGE must support DiFX threaded operation: never start more than one job per node.
· OGE must obey the special DiFX machine file order (1. head node, 2. datastream nodes, 3. compute nodes).
· Operational requirement: DiFX jobs must be able to start immediately, even if other jobs are running on the nodes.








DiFX typically starts N-1 threads on each node (N = number of cores). To prevent overbooking, OGE must be told never to start more than one DiFX process per node. Non-DiFX jobs should still be allowed to start more than one process per node. OGE lets you define resource quotas for this, e.g.:

limit users {difx,oper} hosts {*} to slots=1

or

limit projects {difx} hosts {*} to slots=1

(The hosts {*} filter applies the limit to each node separately. Without it, the limit would cap the total number of slots across the whole cluster.)
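The effect of the per-node quota can be modelled in a few lines. The sketch below is a toy admission check that mimics `limit users {difx,oper} hosts {*} to slots=1`: each listed user may hold at most one slot on each host, and other users are not limited by this rule. The names `may_start`, `LIMITED_USERS` and `MAX_SLOTS_PER_HOST` are hypothetical. Grid Engine evaluates the real rule set itself.

```python
from collections import Counter

# Assumption: mirrors the quota rule above.
LIMITED_USERS = {"difx", "oper"}
MAX_SLOTS_PER_HOST = 1


def may_start(user, host, running):
    """Toy check: may `user` get one more slot on `host`?

    `running` is a list of (user, host) pairs, one per occupied slot.
    """
    if user not in LIMITED_USERS:
        return True  # non-DiFX jobs may start several processes per node
    used = Counter(running)[(user, host)]
    return used < MAX_SLOTS_PER_HOST
```

On an 8-core node a single DiFX process already uses 7 threads, so a second one would need 14 threads on 8 cores. The check refuses a second `difx` slot on `node21` but still admits further pulsar jobs there.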




OGE's "tight" integration with OpenMPI is convenient, but it does not let the user influence, e.g., node selection. DiFX requires a special node order: the head node first, then the datastream nodes (in a fixed order in the case of Mark5 units), then the compute nodes. OGE provides a hook to loosen the tight integration: a parallel environment can execute a script (its start_proc_args) every time a job in it starts. This script, genmachines.oge, can produce a custom machine file. "Loose" integration is then achieved by passing this custom machine file to OpenMPI.
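The core of such a genmachines.oge script is the reordering step. The Python sketch below assumes two inputs are known: the hosts OGE allocated (e.g. read from $PE_HOSTFILE), and the head and datastream host names for the correlation. The function `order_machines` is hypothetical. It also raises an error when OGE did not allocate a required Mark5 unit; this is a design choice of the sketch, not part of the source.

```python
def order_machines(allocated, head, datastreams):
    """Return the host order DiFX expects in its machine file.

    Order: head node, datastream nodes in the given (fixed) order,
    then the remaining allocated hosts as compute nodes.
    """
    hosts = list(dict.fromkeys(allocated))  # de-duplicate, keep OGE's order
    fixed = [head, *datastreams]
    missing = [h for h in fixed if h not in hosts]
    if missing:
        raise ValueError(f"required hosts not allocated by OGE: {missing}")
    compute = [h for h in hosts if h not in fixed]
    return fixed + compute
```

For an allocation returned by OGE as mark5fx08, node21, fxmanager, node35, mark5fx01, the result is fxmanager, mark5fx01, mark5fx08, node21, node35.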








Master host:
qsub -pe difx 20 startdifx.oge

PE difx: genmachines.oge writes the machine file (machine.job):
fxmanager
mark5fx01
mark5fx08
node21
node35

startdifx.oge:
#!/bin/csh
mpirun -machinefile $TMPDIR/machine runmpifxcorr.d21

...
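The last two steps of this flow can be sketched in Python. The first writes the ordered host list to `$TMPDIR/machine`, where startdifx.oge expects it; Grid Engine sets TMPDIR to a per-job directory. The second builds the mpirun command line. The helper names are hypothetical, and any further arguments to runmpifxcorr.d21 are left open, as in the source.

```python
import os
import tempfile
from pathlib import Path


def write_machine_file(hosts, tmpdir=None):
    """Write one host per line to <tmpdir>/machine and return its path.

    Defaults to $TMPDIR (set per job by Grid Engine), falling back to the
    system temp directory when run outside a job.
    """
    tmpdir = tmpdir or os.environ.get("TMPDIR", tempfile.gettempdir())
    path = Path(tmpdir) / "machine"
    path.write_text("".join(f"{h}\n" for h in hosts))
    return path


def mpirun_command(machine_file, program="runmpifxcorr.d21", args=()):
    """argv for the loose-integration launch used in startdifx.oge."""
    return ["mpirun", "-machinefile", str(machine_file), program, *args]
```

Keeping the machine file in the per-job TMPDIR means concurrent correlations do not overwrite each other's host lists.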




We happily share correlator resources with other projects....
....but when a correlation is scheduled, it must be executed immediately.



OGE provides the concept of hierarchical queues:


· A queue is a group of resources (e.g. nodes) that jobs can be submitted to
· A node can belong to multiple queues
· Queues can be subordinate to other queues
· Jobs in a subordinate queue are suspended automatically if a job is submitted to its master queue


Example on nodes node01 ... node11, with A: master queue and B: subordinate queue.

Scenario 1:
· a 6-process job is submitted to queue B
· a 5-process job is submitted to queue A
=> the jobs can run concurrently

Scenario 2:
· a 6-process job is submitted to queue B
· an 8-process job is submitted to queue A
=> the queue B processes running on nodes 6, 7 and 8 are suspended. When the job in queue A finishes, the suspended processes resume automatically.
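The two scenarios can be reproduced with a toy model. It assumes one process per node in each queue, that queue A fills node01 upward, and that queue B's job runs on node06 to node11. That is the layout implied by the scenarios; the original figure is not reproduced here. A queue B process is suspended on every node that also runs a queue A process. `run_scenario` is a hypothetical name, and real Grid Engine suspends per queue instance according to its subordinate_list settings.

```python
NODES = [f"node{i:02d}" for i in range(1, 12)]  # node01 ... node11


def run_scenario(a_procs, b_procs, a_nodes=NODES, b_nodes=NODES[5:]):
    """Toy model of queue subordination, one process per node.

    Queue B (subordinate) occupies the first b_procs of b_nodes; queue A
    (master) occupies the first a_procs of a_nodes. B processes on nodes
    shared with A are suspended and resume when A's job ends.
    Returns (nodes used by A, nodes used by B, suspended B nodes).
    """
    b_used = b_nodes[:b_procs]
    a_used = a_nodes[:a_procs]
    suspended = [n for n in b_used if n in a_used]
    return a_used, b_used, suspended
```

Scenario 1 (5 processes in A, 6 in B) gives no suspended nodes. Scenario 2 (8 in A) suspends B on node06, node07 and node08.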




A proof of principle that DiFX can be run from within OGE has been done at MPIfR. Various things remain to be explored:
· How do suspended processes on Mark5 units behave?
· Explore queue setups that reflect the workflow on the Bonn cluster
· Review the requirements of non-DiFX cluster users
· ....

Things to be done:
· Write production versions of genmachines.oge and startdifx.oge