Zombie mpiexec processes
Martin Schafföner
martin.schaffoener at e-technik.uni-magdeburg.de
Thu Jan 19 06:54:52 EST 2006
We have a scenario where a list of files is processed in parallel on several
nodes. A master process spawns multiple "mpiexec -n 1 -comm none ..."
processes and waits for them to finish, after which new mpiexec processes can
be spawned until the list is finished. Unfortunately, some of these mpiexec
child processes become zombies and never return, so the job never
finishes.
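
For reference, the spawn-and-wait pattern described above can be sketched roughly as follows. This is only a minimal illustration in Python, not the actual master process: the names run_batch, make_cmd and mpiexec_cmd, the program name "process_file" and the polling interval are all assumptions made up for this example. Only the "mpiexec -n 1 -comm none" part comes from the setup described above. On Unix, a child that has exited stays listed as a zombie until its parent collects its exit status, so the sketch polls every running child explicitly.

```python
import subprocess
import time


def mpiexec_cmd(path):
    # "process_file" is a hypothetical worker program, used for illustration only.
    return ["mpiexec", "-n", "1", "-comm", "none", "process_file", path]


def run_batch(files, max_parallel, make_cmd):
    """Run one child per file, at most max_parallel at once.

    Returns a dict mapping each file to its child's exit code.
    """
    pending = list(files)
    running = {}  # Popen object -> file it is processing
    results = {}
    while pending or running:
        # Start new children while there are free slots.
        while pending and len(running) < max_parallel:
            path = pending.pop(0)
            running[subprocess.Popen(make_cmd(path))] = path
        # poll() collects the exit status of a finished child (reaping it)
        # and returns None while the child is still running.
        for proc in list(running):
            code = proc.poll()
            if code is not None:
                results[running.pop(proc)] = code
        if running:
            time.sleep(0.05)  # arbitrary polling interval (assumption)
    return results
```

Calling run_batch(file_list, 4, mpiexec_cmd) would keep up to four mpiexec children running at a time. A child that never exits at all would keep this loop waiting indefinitely, so a timeout would be needed to detect that case.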
Does anybody have any idea why these processes might go zombie? We use mpiexec
0.80 linked against torque 2.0.0p5.
Regards,
--
Martin Schafföner
Cognitive Systems Group, Institute of Electronics, Signal Processing and
Communication Technologies, Department of Electrical Engineering,
Otto-von-Guericke University Magdeburg
Phone: +49 391 6720063