FW: Mpiexec intermittent problem

Adams, Brian M briadam at sandia.gov
Mon Aug 28 16:30:27 EDT 2006


Pete, 

Thanks for the suggestions.  In this case I am the DAKOTA user (and one
of the newest DAKOTA developers), so I will explore your ideas directly.
A side note: overall the mpiexec tiling capability you introduced for us
in v0.80 has been working wonderfully with DAKOTA and single or
multiprocessor tiled analysis jobs.

I will turn on more verbosity and keep the logfiles around to see if we
can diagnose this any better.

The jobs I'm having DAKOTA launch take from 5 to 500 seconds to execute
(and for a particular DAKOTA run are nearly homogeneous in run time), so
I don't know whether the race condition you're describing is likely.

I ran contests.pl (from your SVN repo) against the installed version of
mpiexec (Version 0.80+20050801, configure options:
'--prefix=/apps/mpiexec-cvs' '--with-pbs=/apps/torque'
'--with-default-comm=ib') and against v0.81, configured the same way.
The results differed slightly, and in the 0.80 case the Perl script
fails to exit.  I'm attaching a tarball with the output of the tests
with and without "-v -v"; maybe you'll see something suspicious that
indicates I should run 0.81 to avoid this problem.

Tbird uses torque-2.0.0p8 by default.

Brian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tbird_contests.tgz
Type: application/x-compressed
Size: 34654 bytes
Desc: tbird_contests.tgz
Url : http://email.osc.edu/pipermail/mpiexec/attachments/20060828/2bd39630/tbird_contests-0001.bin

