Capturing return values from concurrent mpiexecs
Pete Wyckoff
pw at osc.edu
Tue Nov 8 14:15:48 EST 2005
martin.schaffoener at e-technik.uni-magdeburg.de wrote on Tue, 08 Nov 2005 18:09 +0200:
> We have a bunch of perl scripts which help us split the job into small chunks
> which are then distributed to the job's nodes using fork()/exec(). A separate
> "mpiexec -server" process is forked, and individual chunks are started using
> "mpiexec -n 1 -comm none ...". The problem is that even though there are
> multiple concurrent processes spawned through "mpiexec -n 1", none of their
> return codes make it through its respective mpiexec process.
What you describe is how I thought it already worked, but testing
shows it does not. Did it ever work in an earlier version? Are you
using 0.80 now? I'll experiment with it and see if I can work out
why it's not working.
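
To make the expected behaviour concrete, here is a minimal sketch of a driver that starts several chunks concurrently and collects each one's exit status. It is written in Python rather than perl, and the chunk commands are placeholders standing in for the real "mpiexec -n 1 -comm none ..." invocations, so it shows what the calling script expects to see, not how mpiexec itself behaves.

```python
import subprocess
import sys

def run_chunks(commands):
    """Start every command concurrently, then wait for each and
    return the exit statuses in the order the commands were given."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]

if __name__ == "__main__":
    # Placeholder chunks (assumption): small Python one-liners that exit
    # with a chosen status, in place of real mpiexec command lines.
    chunks = [
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
        for code in (0, 3, 0)
    ]
    print(run_chunks(chunks))
```

If each mpiexec client passed its task's exit status through, a driver like this would see the non-zero status of the second chunk instead of a uniform 0.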
> How would it be possible to at least propagate the return value from the
> "master" process of each concurrent parallel chunk, especially if there is
> also the infamous "mpiexec -server" running? Also, would there be any way to
> combine the return values of all processes in a parallel task?
I'd expect "mpiexec -server" to always return 0 unless it died
abnormally; it would never return the exit status of the tasks
it started for any of its clients.
The combined return value of a parallel task is always just the exit
status of task #0, although there will be warning lines on stderr to
report the non-zero exit statuses of other tasks. I couldn't come
up with any better way of reporting the array of exit statuses. Any
suggestions?
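
For illustration only, the policy just described and one possible alternative can be sketched in Python as follows. The function names and the wording of the warning are hypothetical and do not reflect mpiexec's actual code or output; the alternative (first non-zero status) is just one option among several, not something mpiexec is known to implement.

```python
import sys

def task0_status(statuses):
    """Policy described above: the combined result is the exit status of
    task #0; other non-zero statuses are only reported on stderr."""
    for rank, status in enumerate(statuses):
        if rank != 0 and status != 0:
            # Hypothetical message format, not mpiexec's real wording.
            print(f"warning: task {rank} exited with status {status}",
                  file=sys.stderr)
    return statuses[0]

def first_nonzero_status(statuses):
    """Hypothetical alternative: return the first non-zero status,
    or 0 if every task succeeded."""
    return next((s for s in statuses if s != 0), 0)

if __name__ == "__main__":
    statuses = [0, 2, 0, 5]  # made-up statuses for four tasks
    print(task0_status(statuses))          # status of task #0
    print(first_nonzero_status(statuses))  # first failure, if any
```

With the made-up statuses above, the first policy yields 0 even though two tasks failed, while the alternative yields 2. A single integer cannot carry the whole array either way, which is the limitation the question is about.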
-- Pete