mpiexec jobs hang up into sleep state

Milos Milosavljevic milos at astro.as.utexas.edu
Fri Jul 25 12:29:18 EDT 2008


Hi,

I am running mpiexec on Red Hat Enterprise Linux 5 with mpich2-64.
The system has four quad-core Xeon X7350 processors (16 cores in
total).  I run the executable in the background as follows (the same
hang occurs when the job is run in the foreground with no redirects):

mpiexec -n 16 executable < /dev/null > screen.out &

After several minutes or hours of uninterrupted execution with good
load balance, the job suddenly hangs in the sleep state (the status
of all associated processes in top changes from 'R' to 'S').
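
For reference, the state shown in top can also be read directly from
/proc, which makes it easier to log when the change happens.  The
following is only a minimal sketch for Linux; proc_state is a
hypothetical helper name, not part of mpiexec or mpich2, and the PIDs
have to be supplied by hand.  Note that a job stopped by a signal would
normally be reported as 'T' (stopped) rather than 'S' (sleeping).

```shell
# Print the kernel-reported state of each PID given as an argument.
# Linux only: relies on /proc/<pid>/status.
# proc_state is a hypothetical helper, not part of mpiexec or mpich2.
proc_state() {
    for pid in "$@"; do
        if [ -r "/proc/$pid/status" ]; then
            # The State: line looks like "S (sleeping)", "R (running)"
            # or "T (stopped)"; strip the label and keep the value.
            state=$(sed -n 's/^State:[[:space:]]*//p' "/proc/$pid/status")
            echo "$pid $state"
        else
            # The process has exited or the PID never existed.
            echo "$pid gone"
        fi
    done
}
```

It could be called as "proc_state 1234 1235 ..." with the PIDs of the
16 ranks, for example from a loop that appends a timestamp and the
output to a log file once a minute.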

I can revive the job simply by running 'fg', pressing Ctrl-Z, and
then running 'bg'.  The job then continues for another few minutes or
hours, until it hangs again.

The output of the calculation is correct and unaffected by the hang
and revival: the final results look accurate regardless of how many
times the job hangs and is revived by hand, so the hang does not seem
to be triggered from within the simulation.  It is also uncorrelated
with the disk write sequence, etc.

Thank you in advance for any help with this problem,

Milos