mpiexec - Torque issues
Pete Wyckoff
pw at osc.edu
Mon Nov 28 10:07:24 EST 2005
Prakash.Velayutham at cchmc.org wrote on Thu, 24 Nov 2005 10:47 -0500:
> I installed OpenPBS again and compiled mpiexec 0.8 against that and
> everything works fine again. Would you have some time to get
> Torque-1.2.6 to work with mpiexec? I could help you in whatever way
> possible like testing etc. I would really love to get Torque with
> mpiexec working as newer Torque releases (starting with 2) have
> multi-server support which would be very useful.
Try enabling debugging in mpich, or look at your code to figure out
what it is doing that is causing it to exit. Your problem is this
line:
> >> The error returned is:
> >> p0_7360: p4_error: interrupt SIGx: 15
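For reference, the 15 in that message and the "sig 9" in the pbs_mom log quoted further down are plain signal numbers. A minimal sketch for decoding them, assuming a Linux node with Python available (the helper name signal_name is only for illustration and is not part of mpich, mpiexec or Torque):

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal known to this platform.
        return f"unknown signal {number}"


# 15 is the number in the p4_error message, 9 the one in the pbs_mom log.
for number in (15, 9):
    print(number, signal_name(number))
```

On Linux this reports SIGTERM for 15 and SIGKILL for 9, so the MPI task was asked to terminate rather than crashing by itself.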
You might run mpiexec with "-v -v", but I don't think it will tell
you anything. It starts the jobs and they run for almost four
minutes until one of your MPI tasks dies as above:
> >> pbs_mom;Job;51025.ribosome.cchmc.org;start_process: task started, tid 2, sid 9858, cmd /bin/sh
> >> 11/13/2005 13:38:43;0001; pbs_mom;Job;TMomFinalizeJob3;job 51026.ribosome.cchmc.org started, pid = 9904
> >> 11/13/2005 13:42:10;0008; pbs_mom;Job;51025.ribosome.cchmc.org;kill_task: killing pid 9868 task 2 with sig 9
This is quite unlikely to be a problem with mpiexec, as those tend to
occur during the startup phase, not many minutes later.
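To put a rough number on "many minutes later", the two timestamps visible in the quoted pbs_mom log can be subtracted. This is only a sketch: the format string is an assumption read off the excerpt, the helper name elapsed_seconds is made up for illustration, and the two timestamped records carry different job ids, so the result is an approximate figure rather than an exact run time.

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted log lines (assumed).
PBS_TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two pbs_mom log timestamps."""
    started = datetime.strptime(start, PBS_TIME_FORMAT)
    ended = datetime.strptime(end, PBS_TIME_FORMAT)
    return (ended - started).total_seconds()


# "started" record versus the "kill_task" record in the excerpt.
print(elapsed_seconds("11/13/2005 13:38:43", "11/13/2005 13:42:10"))  # 207.0
```

That is roughly three and a half minutes between the start record and the kill, well past the point where mpiexec has finished launching the tasks.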
-- Pete