Capturing return values from concurrent mpiexecs

Pete Wyckoff pw at osc.edu
Tue Nov 8 14:15:48 EST 2005


martin.schaffoener at e-technik.uni-magdeburg.de wrote on Tue, 08 Nov 2005 18:09 +0200:
> We have a bunch of perl scripts which help us split the job into small chunks 
> which are then distributed to the job's nodes using fork()/exec(). A separate 
> "mpiexec -server" process is forked, and individual chunks are started using 
> "mpiexec -n 1 -comm none ...". The problem is that even though there are 
> multiple concurrent processes spawned through "mpiexec -n 1", none of their 
> return codes make it through their respective mpiexec processes.

What you describe is how I thought it already worked, but testing
shows it does not.  Did it ever work in an earlier version?  Are you
using 0.80 now?  I'll play with it and see if I can understand why
it's not working.

> How would it be possible to at least propagate the return value from the 
> "master" process of each concurrent parallel chunk, especially if there is 
> also the infamous "mpiexec -server" running? Also, would there be any way to 
> combine the return values of all processes in a parallel task?

I'd expect "mpiexec -server" to always return 0 unless it died
abnormally.  It would never return the exit status of tasks it
started for any of its clients.

The combined return value of a parallel task is always just the exit
status of task #0, although there will be warning lines on stderr to
report the non-zero exit statuses of other tasks.  I couldn't come
up with any better way of reporting the array of exit statuses.  Any
suggestions?
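
One option on the script side might be to collect the array of exit
statuses yourself and fold it into a single value.  Below is a minimal
sketch of that idea in Python (the scripts in question are Perl, but
the pattern carries over).  The names run_chunks, combine and stand_in
are made up for illustration, and the stand-in child only exits with a
given code in place of a real "mpiexec -n 1 -comm none ..." command
line.  The "first non-zero status" policy is just one possible
convention, not something mpiexec does, and the whole thing assumes
each child actually passes its task's status through, which is the
part that is not working at the moment.

```python
import subprocess
import sys


def run_chunks(commands):
    """Start every command concurrently, then wait for all of them.

    Returns the exit statuses in the same order as `commands`.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


def combine(statuses):
    """Hypothetical policy: 0 if all succeeded, else the first non-zero status."""
    return next((s for s in statuses if s != 0), 0)


def stand_in(code):
    # Placeholder child that simply exits with `code`.  A real script
    # would put its own per-chunk command line here instead.
    return [sys.executable, "-c", f"import sys; sys.exit({code})"]
```

A caller would pass one command list per chunk to run_chunks(), print
a warning for each non-zero entry, and exit with combine() of the
result.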

		-- Pete

