mpiexec - Torque issues

Prakash Velayutham prakash.velayutham at cchmc.org
Tue Nov 29 11:02:11 EST 2005


Pete Wyckoff wrote:
> Prakash.Velayutham at cchmc.org wrote on Thu, 24 Nov 2005 10:47 -0500:
>
>> I installed OpenPBS again and compiled mpiexec 0.8 against that and
>> everything works fine again. Would you have some time to get
>> Torque-1.2.6 to work with mpiexec? I could help you in whatever way
>> possible, like testing etc. I would really love to get Torque with
>> mpiexec working as newer Torque releases (starting with 2) have
>> multi-server support which would be very useful.
>
> Try enabling debugging in mpich, or look at your code to figure out
> what it is doing that is causing it to exit.  Your problem is this
> line:
>
>>>> The error returned is:
>>>> p0_7360:  p4_error: interrupt SIGx: 15
>
> You might run mpiexec with "-v -v", but I don't think it will tell
> you anything.  It starts the jobs and they run for almost four
> minutes until one of your MPI tasks dies as above:
>
>>>> pbs_mom;Job;51025.ribosome.cchmc.org;start_process: task started, tid 2,
>>>> sid 9858, cmd /bin/sh
>>>> 11/13/2005 13:38:43;0001;   pbs_mom;Job;TMomFinalizeJob3;job
>>>> 51026.ribosome.cchmc.org started, pid = 9904
>>>> 11/13/2005 13:42:10;0008;
>>>> pbs_mom;Job;51025.ribosome.cchmc.org;kill_task: killing pid 9868 task 2
>>>> with sig 9
>
> This is quite unlikely to be a problem with mpiexec, as those tend to
> occur during the startup phase, not many minutes later.
>
> 		-- Pete

Pete,

I will get on this as soon as possible. I just have one question: how is 
it that everything works fine with OpenPBS but not with Torque? If it 
were a problem with the MPI code, I would expect it to show the same 
symptoms under OpenPBS as well.
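
For reference, the two signal numbers in the quoted logs (15 in the 
p4_error line, 9 in the pbs_mom kill_task line) can be decoded with a 
short script. This is only a sketch, assuming a Linux node with Python 3 
available; signal_name is a hypothetical helper and not part of mpiexec, 
MPICH, or Torque.

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a known signal on this platform.
        return f"unknown signal {number}"


# 15 comes from "p4_error: interrupt SIGx: 15",
# 9 comes from "kill_task: killing pid 9868 task 2 with sig 9".
for number in (15, 9):
    print(number, signal_name(number))
```

On Linux this reports 15 as SIGTERM and 9 as SIGKILL, so the MPI process 
saw a termination request and pbs_mom sent a forced kill. The numbers 
alone do not say who sent the SIGTERM.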

Thanks,
Prakash