Propagation of SIGTSTP?
Chris Samuel
csamuel at vpac.org
Tue Apr 5 18:37:34 EDT 2005
Hi Pete,
I'm not going to be in the office today, so I can't reply properly until
Thursday, but in the meantime some quick thoughts.
1. From my reading of the Torque code, it's the MOM that needs patching
rather than qsig. The MOM sends SIGSTOP as part of its suspend/resume
handling; other signals will not put the job into the suspended state,
and hence its walltime will (probably) continue to tick down.
2. I don't think processes can catch SIGSTOP, so we are limited to SIGTSTP
in the MOM context. However, at a guess, there's nothing stopping mpiexec
from using SIGSTOP to stop the processes it has started.
3. On the Torque list I did mention that the ideal solution would be for
the MOMs to keep track of all the processes and to signal them
themselves, but as we know from the SSH-based mpirun that's pretty
much impossible, and the reason we're using mpiexec is to
improve all that.
I'll be interested to hear what their thoughts on that will be.
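To make point 2 concrete, here is a rough Python sketch of the idea: a launcher catches SIGTSTP (which can be caught) and uses SIGSTOP (which can't) on the processes it started. This is not how mpiexec or the Torque MOM actually do it; the function names and the list of child PIDs are made up for illustration.

```python
import os
import signal


def stop_children(pids):
    """Send SIGSTOP to every child PID; children cannot catch or ignore it."""
    for pid in pids:
        os.kill(pid, signal.SIGSTOP)


def resume_children(pids):
    """Send SIGCONT to every child PID so stopped children run again."""
    for pid in pids:
        os.kill(pid, signal.SIGCONT)


def install_forwarders(pids):
    """Hypothetical launcher hook: forward SIGTSTP/SIGCONT to the children.

    `pids` is assumed to be the list of processes this launcher started.
    """

    def on_tstp(signum, frame):
        stop_children(pids)
        # Stop the launcher too, so whoever sent SIGTSTP sees it as stopped.
        os.kill(os.getpid(), signal.SIGSTOP)

    def on_cont(signum, frame):
        resume_children(pids)

    signal.signal(signal.SIGTSTP, on_tstp)
    signal.signal(signal.SIGCONT, on_cont)
```

A launcher would call install_forwarders() once after starting its processes. Whether that leaves the job in Torque's suspended state is a separate question, as per point 1.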
cheers!
Chris