get_hosts problem on mpiexec-0.80 with torque-1.2.0p6

Tatsuya Minami taminami at mac.com
Thu Oct 6 21:01:05 EDT 2005


I did not quite understand the PBSDEBUG part. Did you mean that I 
should set PBSDEBUG=yes in the PBS script, in front of mpiexec?

Anyway, I set PBSDEBUG=yes and PBSLOGLEVEL=7 and restarted all the 
daemons. However, their logs showed nothing about the mpiexec error.

My guess is that the cause is that I am running pbs_server and 
pbs_sched on a machine which serves as a NAT gateway. Since pbs_server 
listens on both the WAN side and the LAN side, PBS itself somehow works 
without problems, but I guess that is not the case for mpiexec. I can 
set the server name in the server_name file, but doing so triggers 
other problems; for example, pbs_server and pbs_sched cannot 
communicate with each other. I could not make pbs_sched listen on the 
LAN side at all.
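
To see which server name a client on a given node would pick up, the 
lookup order described in the quoted reply below (PBS_DEFAULT first, 
then <serverhome>/server_name) can be sketched roughly as follows. The 
function name is made up for illustration, the default path is only the 
one quoted below, and a build with different ./configure options will 
use a different location.

```shell
#!/bin/sh
# Sketch only: print the server name a PBS client would most likely use.
# Order assumed from the quoted reply: $PBS_DEFAULT, then
# <serverhome>/server_name. pbs_server_name is a hypothetical helper.
pbs_server_name() {
    # Optional argument: server home; the default is the path quoted below.
    serverhome="${1:-/usr/spool/PBS}"
    if [ -n "${PBS_DEFAULT:-}" ]; then
        # The environment variable takes precedence over the file.
        printf '%s\n' "$PBS_DEFAULT"
    elif [ -r "$serverhome/server_name" ]; then
        # The file is expected to hold the server host name on its first line.
        head -n 1 "$serverhome/server_name"
    else
        echo "no server name found under $serverhome" >&2
        return 1
    fi
}
```

Running pbs_server_name on a compute node and then trying to resolve 
the printed name there (for example with "getent hosts" on a glibc 
system) should show whether the node sees the WAN or the LAN name of 
the gateway.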

What I can try is to run pbs_server and pbs_sched on a machine behind 
NAT instead of on the gateway. Otherwise I can't think of any solution.

Tatsuya

On Wednesday, October 5, 2005, at 05:48  PM, Pete Wyckoff wrote:

> taminami at mac.com wrote on Wed, 05 Oct 2005 16:51 -0400:
>> The job exits immediately and gives me the following error+output:
>>
>> mpiexec: resolve_exe: prefixing dot to executable: "./s3d_dms_fftw".
>> mpiexec: Error: get_hosts: pbs_connect: no error.
>>
>> I have learned that the second line indicates MPI could not resolve 
>> hostnames used in PBS, from the archive of this mailing list. 
>> However, I couldn't get any more information about this.
>>
>> Is there any way for me to know in what machine (in server or mom) 
>> and which hostname MPI tried to resolve and failed?
>
> I know very little about torque, but scanning their code points out
> an environment variable you may be able to use to get more debugging
> information.  Try (in bash-speak):
>
>     PBSDEBUG=yup mpiexec s3d....
>
> But likely your guess is correct.  In which case the traditional
> advice for PBS server name is that it looks first at environment
> variable PBS_DEFAULT.  But most systems use an installed file to
> hold the default server name.  In torque this appears to be
> <serverhome>/server_name.  And <serverhome> defaults to
> /usr/spool/PBS.  There are ./configure variables during the build to
> change these things too.
>
> Let us know what fixes it for you.
>
> 		-- Pete
>
Tatsuya Minami
taminami at mac.com