Zombie mpiexec processes

Martin Schafföner martin.schaffoener at e-technik.uni-magdeburg.de
Thu Jan 19 06:54:52 EST 2006


We have a scenario where a list of files is processed in parallel on several 
nodes. A master process spawns multiple "mpiexec -n 1 -comm none ..." 
processes and waits for them to finish, after which new mpiexec processes are 
spawned until the whole list has been processed. Unfortunately, some of these 
mpiexec child processes turn into zombies and never return, so the job never 
finishes.
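For readers less familiar with how such a master loop is usually structured, here is a minimal Python sketch. It assumes a POSIX system and is not the poster's actual code. The names `run_all`, `max_parallel`, `cmd_prefix` and `poll_interval` are hypothetical, and the real "mpiexec -n 1 -comm none ..." command line is reduced to a generic `cmd_prefix`.

The relevant point is that the master reaps each child as soon as it exits. On POSIX, `Popen.poll()` calls `waitpid()` with `WNOHANG`, and that call removes the finished child's entry from the process table.

```python
import subprocess
import time
from collections import deque


def run_all(files, max_parallel, cmd_prefix, poll_interval=0.05):
    """Run cmd_prefix + [file] for every file, at most max_parallel at a time.

    Hypothetical helper for illustration; returns a dict mapping each
    file name to the exit code of the child that processed it.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    pending = deque(files)
    running = []  # (Popen, file name) pairs for children not yet reaped
    exit_codes = {}
    while pending or running:
        # Fill free slots with new children.
        while pending and len(running) < max_parallel:
            name = pending.popleft()
            running.append((subprocess.Popen(cmd_prefix + [name]), name))
        # Reap every child that has exited; on POSIX, poll() calls
        # waitpid(pid, WNOHANG), so finished children do not linger as zombies.
        still_running = []
        for proc, name in running:
            returncode = proc.poll()
            if returncode is None:
                still_running.append((proc, name))
            else:
                exit_codes[name] = returncode
        # Sleep only if no child finished in this pass, to avoid busy-waiting.
        if still_running and len(still_running) == len(running):
            time.sleep(poll_interval)
        running = still_running
    return exit_codes
```

For example, you could call `run_all(file_list, 4, ["mpiexec", "-n", "1", "-comm", "none", "./worker"])`, where `./worker` stands in for the real (elided) program. A child that `ps` shows as a zombie (state Z) has already exited. It lingers only until its parent collects its status with `wait()` or `waitpid()`.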

Does anybody have any idea why these processes might go zombie? We use mpiexec 
0.80 linked against torque 2.0.0p5.

Regards,
-- 
Martin Schafföner

Cognitive Systems Group, Institute of Electronics, Signal Processing and 
Communication Technologies, Department of Electrical Engineering, 
Otto-von-Guericke University Magdeburg
Phone: +49 391 6720063

