Propagation of SIGTSTP?

Chris Samuel csamuel at vpac.org
Tue Apr 5 18:37:34 EDT 2005


Hi Pete,

I'm not going to be in the office today, so I can't reply properly until
Thursday, but in the meantime some quick thoughts.

1. From my reading of the Torque code, it's the MOM that needs patching
   rather than qsig.  The MOM sends SIGSTOP when doing its suspend/resume
   handling; sending any other signal will not put the job into the
   suspended state, so its walltime will (probably) keep ticking down.

2. Processes cannot catch SIGSTOP (it can be neither caught, blocked nor
   ignored), so in the MOM context we are limited to SIGTSTP.  However,
   at a guess, there's nothing to stop mpiexec from using SIGSTOP to stop
   the processes it has started.

3. On the Torque list I did mention that the ideal solution would be for
   the MOMs to keep track of all the processes and to signal them
   themselves, but as we know from the SSH-based mpirun that's pretty
   much impossible, and improving on that is the reason we're using
   mpiexec.

   I'll be interested to hear what their thoughts on that will be.
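
To make point 2 a bit more concrete, here is a rough Python sketch of the
idea: a launcher catches SIGTSTP (which is catchable), relays SIGSTOP
(which is not) to the processes it started, and then stops itself.  This
is only an illustration under my own assumptions, not mpiexec's or
Torque's actual code; the function names (can_catch, stop_children,
resume_children, install_relay) are made up for the example.

```python
import os
import signal


def can_catch(signum):
    """Return True if this process is allowed to install a handler for signum."""
    try:
        previous = signal.signal(signum, lambda s, f: None)
    except (OSError, ValueError):
        # The kernel refuses handlers for SIGSTOP (and SIGKILL).
        return False
    if previous is not None:
        signal.signal(signum, previous)  # put the old disposition back
    return True


def stop_children(pids):
    """Send SIGSTOP to every process the launcher started (hypothetical helper)."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGSTOP)
        except ProcessLookupError:
            pass  # child already exited


def resume_children(pids):
    """Send SIGCONT to every process the launcher started (hypothetical helper)."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGCONT)
        except ProcessLookupError:
            pass  # child already exited


def install_relay(pids):
    """Catch SIGTSTP/SIGCONT in the launcher and relay them to its children."""

    def on_tstp(signum, frame):
        stop_children(pids)
        # Having caught SIGTSTP, the launcher must stop itself explicitly.
        os.kill(os.getpid(), signal.SIGSTOP)

    def on_cont(signum, frame):
        resume_children(pids)

    signal.signal(signal.SIGTSTP, on_tstp)
    signal.signal(signal.SIGCONT, on_cont)
```

A launcher would call install_relay() with the PIDs of the processes it
has spawned.  Whether the MOM would then treat the job as suspended for
walltime purposes is a separate question (see point 1).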


cheers!
Chris

