FW: Mpiexec intermittent problem
Adams, Brian M
briadam at sandia.gov
Mon Aug 28 16:30:27 EDT 2006
Pete,
Thanks for the suggestions. In this case I am the DAKOTA user (and one
of the newest DAKOTA developers), so I will explore your ideas directly.
A side note: overall, the mpiexec tiling capability you introduced for us
in v0.80 has been working wonderfully with DAKOTA for both single-processor
and multiprocessor tiled analysis jobs.
I will turn on more verbosity and keep the logfiles around to see if we
can diagnose the problem any better.
The jobs I'm having DAKOTA launch take from 5 to 500 seconds to execute
(and within a particular DAKOTA run are nearly homogeneous in run time), so
I'm not sure whether the race condition you're describing is likely.
I ran contests.pl (from your SVN repo) against the installed version of
mpiexec (Version 0.80+20050801, configure options:
'--prefix=/apps/mpiexec-cvs' '--with-pbs=/apps/torque'
'--with-default-comm=ib') and against v0.81, configured the same way. The
results differed slightly, and in the 0.80 case the Perl script
fails to exit. I'm attaching a tarball with the output of the tests
with and without "-v -v" -- maybe you'll see something suspicious that
indicates I should run 0.81 to avoid this problem...
Tbird uses torque-2.0.0p8 by default.
Brian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tbird_contests.tgz
Type: application/x-compressed
Size: 34654 bytes
Desc: tbird_contests.tgz
Url : http://email.osc.edu/pipermail/mpiexec/attachments/20060828/2bd39630/tbird_contests-0001.bin