Astronomical Data Analysis Software and Systems IV
ASP Conference Series, Vol. 77, 1995
R. A. Shaw, H. E. Payne, and J. J. E. Hayes, eds.
Simulations of Pinhole Imaging for AXAF: Distributed
Processing Using the MPI Standard
D. Nguyen and B. Hillberg
Smithsonian Astrophysical Observatory, 60 Garden St., Cambridge, MA
02138
Abstract. The pinhole simulation program is a computationally and
memory intensive task. A conventional sequential approach would limit
the size and complexity of such a problem when investigated in the framework
of the SAO AXAF simulation system. A parallel version was developed
instead, to enable distributed processing on a cluster of workstations.
The program makes use of the Message Passing Interface (MPI)
standard for parallel processing, implemented as an API (Application
Programming Interface) to the Local Area Multicomputer (LAM) programming
environment developed at the Ohio Supercomputer Center.
1. Introduction
As part of our efforts to support the AXAF program, the SAO AXAF Mission
Support Team has developed a software suite to simulate AXAF images generated
by the flight mirror assembly (Jerius et al. 1995). One of the tasks of this system
is to simulate pinhole imaging of the X-ray source.
2. Pinhole Simulation in Sequential Mode
The task of the pinhole program is to tabulate the weight of the photons detected
through a pinhole. The weight of a photon represents the probability of finding
the photon at a given position. Photons are generated with weights equal to 1
at the source. Weights are reduced at every reflection point along their paths
toward the detector. The condition for a photon to successfully pass through a
pinhole is:
$(x - x_o)^2 + (y - y_o)^2 < r_o^2$   (1)
where (x, y) is the photon position in the plane of the pinhole, relative to the
pinhole of radius $r_o$ centered at $(x_o, y_o)$. In order to simulate a two
dimensional scan at the focal plane for pinholes of various radii, the pinholes are
laid out on a cubic lattice: the pinholes on a rectangular grid simulate a two
dimensional scan, while the stack of rectangular grids represents the pinholes of
various radii.
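For illustration, the acceptance test of equation (1) can be written as a small C
routine (a hypothetical sketch; the variable names are ours, not those of the SAO
code).

#include <stdio.h>

/* Returns 1 if a photon at (x, y) in the pinhole plane falls inside the
 * pinhole of radius r0 centered at (x0, y0), i.e., satisfies equation (1). */
static int passes_pinhole(double x, double y,
                          double x0, double y0, double r0)
{
    double dx = x - x0;
    double dy = y - y0;
    return dx * dx + dy * dy < r0 * r0;
}

int main(void)
{
    /* Example: photon at (0.3, 0.4) against a pinhole of radius 1 at the origin. */
    printf("%d\n", passes_pinhole(0.3, 0.4, 0.0, 0.0, 1.0));   /* prints 1 */
    return 0;
}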
A naive approach in writing the pinhole program would require
$O(N \cdot X \cdot Y \cdot Z)$   (2)
operations to finish. N is the number of incident photons, X and Y are the
number of grid sites along the x and y axes of the rectangular grid, and Z is
the number of radii to be calculated. An efficient program should minimize the
number of times equation (1) has to be executed. The layout of the pinholes
on a cubic lattice is used for this purpose, since in this case the photon stream
only needs to be read once. Further reduction in execution time can be achieved
by realizing that the pinholes of different radii are concentric. In the current
implementation, the program requires
$O(N \cdot \mathrm{Overlap}^2 \cdot \log_2(Z) \cdot (\log_2(X) + \log_2(Y)))$   (3)
operations. N, X, Y, and Z are as defined above, and Overlap is the number
of overlapping pinholes in the x and y directions. The drawback of the cubic
lattice layout approach is that the three dimensional array that holds the photon
weights can be prohibitively large for some classes of problems; e.g., a 4000 by
4000 grid would require 128 MB of memory.
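To see where the count in equation (2) comes from, a naive tabulation would test
equation (1) once for every photon against every lattice site, as in the hypothetical
C sketch below (the function and array names are ours, purely for illustration; the
actual program reduces this cost by exploiting the cubic-lattice layout and the
concentric radii, as described above).

#include <stddef.h>

/* Naive O(N * X * Y * Z) tabulation of photon weights.  weight is a
 * flat array of size X*Y*Z, indexed as [z][iy][ix]; xc and yc hold the
 * grid-site centers and r the pinhole radius of each lattice layer. */
void tabulate_naive(int n_photons,
                    const double *px, const double *py, const double *pw,
                    int X, int Y, int Z,
                    const double *xc, const double *yc, const double *r,
                    double *weight)
{
    for (int n = 0; n < n_photons; n++)
        for (int z = 0; z < Z; z++)
            for (int iy = 0; iy < Y; iy++)
                for (int ix = 0; ix < X; ix++) {
                    double dx = px[n] - xc[ix];
                    double dy = py[n] - yc[iy];
                    if (dx * dx + dy * dy < r[z] * r[z])      /* equation (1) */
                        weight[((size_t)z * Y + iy) * X + ix] += pw[n];
                }
}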
Verification of the software was done as follows: a spatially uniform distribution
of photons was generated with the weight of each photon set to unity.
The photons were traced to the plane of the pinholes. The number of photons
which make it through a pinhole must be equal to the density of the incident
photon beam times the area of the pinhole.
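In other words, for an incident beam of surface density $\rho$ (photons per unit
area; $\rho$ and $W_{\mathrm{pass}}$ are notation used here only for this
illustration), the total weight collected behind a pinhole of radius $r_o$ should
approach
$W_{\mathrm{pass}} \simeq \rho \, \pi r_o^2$,
and any systematic deviation from this value indicates an error in the ray trace
or in the tabulation.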
3. Parallel Processing
These efficiency improvements bring the pinhole simulation program to its optimum
speed on a given machine. This, however, is still not satisfactory given the large
volume of data that needs to be simulated. The next step was therefore to consider
parallel processing across workstations on a local area network (LAN).
Distributed multicomputing on network-connected workstations provides a
cost-effective environment for high performance scientific computing. Software
packages exist that support parallel processing on workstation clusters by
managing the communications and data traffic in a way transparent to the
application. These packages provide sets of Application Programming Interfaces
(APIs) for various languages so that their functions can be called from an
application. Examples of such packages are Express, PARMACS, PVM, and P4.
Although these packages have very similar functionalities, their APIs are very
different. The result of non-standard APIs is that third party software becomes
specific to a given package and cannot be used with other packages.
4. The MPI Standard
A standardization process for a message passing system was initiated at the
"Workshop on Standards for Message Passing in a Distributed Memory Environment,"
held in Williamsburg, Virginia in 1992 November. The Message Passing
Interface (MPI) Forum consists of researchers from government laboratories,
universities, and industry, along with vendors of concurrent computers. The MPI
standard was derived from the best features of its predecessors, rather than
adopting one of the existing systems. The MPI standard includes: point-to-point
communication, collective operations, process groups, communication context,
process topologies, bindings for Fortran 77 and C, environment management, and
inquiry and profiling interfaces.
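To give a flavor of the C binding, a minimal point-to-point exchange might look
like the sketch below (ours, not taken from the pinhole program; it assumes the
job is started with at least two processes).

#include <stdio.h>
#include <mpi.h>

/* Minimal MPI point-to-point example: rank 0 sends one double to rank 1. */
int main(int argc, char **argv)
{
    int rank;
    double value = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 3.14;
        MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %g\n", value);
    }

    MPI_Finalize();
    return 0;
}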

The pinhole program makes use of the MPI standard for parallel processing
implemented as an API to the Local Area Multicomputer (LAM) programming
environment developed at the Ohio Supercomputer Center (Burns 1989). LAM
is a distributed memory MIMD programming and operating environment for
heterogeneous UNIX computers on a network.
5. Pinhole Simulation in Parallel Mode
The pinhole program can be completely parallelized by using data decomposition.
More than one instance of the program can run, each on a different machine,
analyzing a different subset of the data. One process (the master) initiates all the
other processes (the slaves), and, at the end, collects all the results. In the
master-slave computing paradigm, each slave communicates with the master, but
there is no inter-slave communication.
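A schematic of this master-slave pattern in MPI might look as follows (our sketch,
not the SAO implementation: the real slaves would run the pinhole tabulation on
their share of the photon stream, and here all processes are simply started together
under mpirun rather than being spawned by the master).

#include <stdio.h>
#include <mpi.h>

/* Schematic master-slave layout: rank 0 acts as master and collects one
 * partial result from each slave; slaves analyze their own subset of the
 * data (stubbed out here). */
int main(int argc, char **argv)
{
    int rank, nproc;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    if (rank == 0) {                        /* master */
        double total = 0.0, partial;
        MPI_Status status;
        for (int i = 1; i < nproc; i++) {
            MPI_Recv(&partial, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &status);
            total += partial;
        }
        printf("collected results from %d slaves: %g\n", nproc - 1, total);
    } else {                                /* slave */
        /* Placeholder for the real work: tabulate the photon weights
         * for this slave's subset of the data. */
        double partial = (double)rank;
        MPI_Send(&partial, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}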
The expected speedup over the sequential process for p slaves is (Almasi &
Gottlieb 1994):
$\mathrm{speedup} = \frac{p\,r_1}{r_1 + p^2}$   (4)
$r_1 = \frac{T_{\mathrm{sequential}}}{T_{\mathrm{communication},1}}$   (5)
where $T_{\mathrm{sequential}}$ is the time needed on one machine and
$T_{\mathrm{communication},1}$ is the time needed to communicate with one slave.
For a given $r_1$, the maximum speedup occurs at $p = \sqrt{r_1}$. The maximum
speedup is therefore $0.5\,\sqrt{r_1}$.
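This maximum can be checked by setting the derivative of equation (4) with respect
to p to zero:
$\frac{d}{dp}\left[\frac{p\,r_1}{r_1 + p^2}\right] = \frac{r_1\,(r_1 - p^2)}{(r_1 + p^2)^2} = 0 \;\Rightarrow\; p = \sqrt{r_1}$,
at which point the speedup is $\sqrt{r_1}\,r_1/(2\,r_1) = 0.5\,\sqrt{r_1}$.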
The pinhole program was modified and executed on a system of one master
and ten slave processes, each running on a different workstation. The results
were consistent with equations (4) and (5); i.e., $r_1$ was measured to be about
100, so for ten slaves a factor of five in speed was gained.
6. Summary
MPI enables the writing of portable high-performance libraries for distributed-memory
machines, easing the burden for application programmers.
References
Almasi, G., & Gottlieb, A. 1994, Highly Parallel Computing, 2nd ed. (Redwood
City: Benjamin/Cummings)
Burns, G. 1989, in Proceedings of the Fourth Conference on HyperCubes, Concurrent
Computers, and Applications (Los Altos: Golden Gate Enterprises)
Jerius, D., Freeman, M., Gaetz, T., Hughes, J. P., & Podgorski, W. 1995, this
volume, p. ??
Message Passing Interface Forum 1995, MPI: A Message Passing Interface Standard,
Technical Report (Knoxville: University of Tennessee), in press