Programs fail after starting
General
1. Q:
I use MPI_Allreduce, and I get different answers depending on
the number of processes I'm using.
A:
The MPI collective routines may make use of associativity to achieve better
parallelism. For example, an
MPI_Allreduce( &in, &out, 1, MPI_DOUBLE, ... );
might compute
(((((((a + b) + c) + d) + e) + f) + g) + h)
or it might compute
((a + b) + (c + d)) + ((e + f) + (g + h))
where a, b, ..., h are the values of in on each of eight processes. These expressions are equivalent for integers, reals, and other familiar objects from mathematics, but they are not equivalent for the fixed-precision datatypes used in computers. The association that MPI uses depends on the number of processes, so you may not get exactly the same result when you use different numbers of processes. Note that you are not getting a wrong result, just a different one (most programs assume that the arithmetic operations are associative).
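The effect can be reproduced without MPI at all. The following is a minimal sketch in plain C, assuming IEEE 754 double precision with round-to-nearest; the function names sum_sequential and sum_tree are illustrative only and are not part of MPI or MPICH. It adds the same eight values once left to right and once as a balanced tree, the two associations discussed above.

```c
#include <stddef.h>

/* Left-to-right association: (((((((a + b) + c) + d) + e) + f) + g) + h). */
double sum_sequential(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += v[i];
    }
    return s;
}

/* Balanced-tree association: ((a + b) + (c + d)) + ((e + f) + (g + h)).
 * The array is split in half and each half is summed recursively. */
double sum_tree(const double *v, size_t n)
{
    if (n == 0) {
        return 0.0;
    }
    if (n == 1) {
        return v[0];
    }
    size_t half = n / 2;
    return sum_tree(v, half) + sum_tree(v + half, n - half);
}
```

For the input {1e16, 1, 1, 1, 1, 1, 1, 1}, the sequential sum loses every 1.0 to rounding, because adjacent doubles near 1e16 are 2 apart, while the tree sum first combines the small values with each other and keeps most of them. Both results are correctly rounded at every step; they simply round in different places.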
2. Q:
I get the message
No more memory for storing unexpected messages
when running my program.
A: MPICH has been configured to "aggressively" deliver messages. This is appropriate for certain types of parallel programs, and can deliver higher performance. However, it can cause applications to run out of memory when messages are delivered faster than they are processed. The MPICH implementation does attempt to control such memory usage, but this control is not yet complete. As a work-around, you can introduce synchronous sends or barriers into your code. The need for these will be eliminated in a future MPICH release.
3. Q:
My Fortran program fails with a BUS error.
A: The C compiler that MPICH was built with and the Fortran compiler that you are using have different alignment rules for things like DOUBLE PRECISION. For example, the GNU C compiler gcc may assume that all doubles are aligned on eight-byte boundaries, but the Fortran language requires only that DOUBLE PRECISION align with INTEGERs, which may be four-byte aligned.
There is no good fix. Consider rebuilding MPICH with a C compiler that supports weaker data alignment rules. Some Fortran compilers will allow you to force eight-byte alignment for DOUBLE PRECISION (for example, -dalign or -f on some Sun Fortran compilers); note though that this may break some correct Fortran programs that exploit Fortran's storage association rules.
Some versions of gcc may support -munaligned-doubles; MPICH should be rebuilt with this option if you are using gcc, version 2.7 or later.
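To see why a four-byte-aligned DOUBLE PRECISION can trap in C code, it helps to look at the address itself. The sketch below is only an illustration under assumed conditions: the helper names is_aligned, store_double, and load_double are hypothetical and are not MPICH routines, and the eight-byte boundary is the one mentioned above. It checks whether an address meets a given alignment and reads a double through memcpy, which is valid for any address, instead of dereferencing a possibly misaligned pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if address p is a multiple of align bytes, 0 otherwise. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Stores a double at any address by copying its bytes. */
void store_double(void *p, double d)
{
    memcpy(p, &d, sizeof d);
}

/* Loads a double from any address by copying its bytes.
 * Unlike *(const double *)p, this does not require p to be aligned. */
double load_double(const void *p)
{
    double d;
    memcpy(&d, p, sizeof d);
    return d;
}
```

A Fortran layout that places an INTEGER before a DOUBLE PRECISION can put the double at offset 4 of an eight-byte-aligned block; is_aligned reports 0 for that address with align set to 8, which is the situation that can raise a bus error when C code compiled for eight-byte alignment reads the value directly.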
4. Q: I'm using fork to create a new process, or I'm creating a new
thread, and my code fails.
A: The MPICH implementation is not thread safe and does not support either fork or the creation of new processes. Note that the MPI specification is intended to be thread safe, but implementations are not required to be so. At this writing, few implementations are thread safe, primarily because thread safety reduces the performance of the MPI implementation (at a minimum, the implementation must check whether it needs a thread lock; actually acquiring and releasing the lock is even more expensive).
HPUX
1.