get_hosts problem on mpiexec-0.80 with torque-1.2.0p6
Tatsuya Minami
taminami at mac.com
Thu Oct 6 21:01:05 EDT 2005
I did not quite understand the PBSDEBUG part. Did you mean that I
should set PBSDEBUG=yes in the PBS script, in front of mpiexec?
Anyway, I set PBSDEBUG=yes and PBSLOGLEVEL=7 and restarted all the
daemons. However, their logs showed nothing about the mpiexec
error.
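For reference, a minimal sketch of the "in front of mpiexec" form, as
it might appear in a job script. The helper name run_with_pbsdebug is
hypothetical and the value "yes" is arbitrary; the point is only that
the variable is set for that one command, not for the daemons. A
stand-in command is used so the snippet runs without PBS installed.

```shell
#!/bin/sh
# run_with_pbsdebug CMD [ARGS...]: run CMD with PBSDEBUG set in the
# environment of that one process only. The helper name is hypothetical.
run_with_pbsdebug() {
    env PBSDEBUG=yes "$@"
}

# In a job script the call would look like:
#   run_with_pbsdebug mpiexec ./s3d_dms_fftw
# Stand-in command, to show that only the child process sees the value:
run_with_pbsdebug sh -c 'echo "child sees PBSDEBUG=${PBSDEBUG:-unset}"'
echo "script sees PBSDEBUG=${PBSDEBUG:-unset}"
```

Exporting the variable in the environment that starts the daemons is a
different thing: that affects the daemons, not the mpiexec process
started later by the job.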
My guess is that the problem arises because I am running pbs_server
and pbs_sched on a machine that serves as a NAT gateway. Since
pbs_server listens on both the WAN side and the LAN side, PBS itself
works without problems. But I guess that is not the case for mpiexec.
I can set the server name in the server_name file, but doing so
triggers other problems; for example, pbs_server and pbs_sched can no
longer communicate with each other. I could not make pbs_sched listen
on the LAN side at all.
What I can try is to run pbs_server and pbs_sched on a machine behind
NAT instead of on the gateway. Otherwise I can't think of any solution.
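To check which server name a client would pick up on a given node,
the lookup order described in the quoted reply (PBS_DEFAULT first,
then the server_name file) can be sketched as a shell function. This
is an illustration of that description, not torque's actual code:
pbs_server_name and PBS_SERVER_HOME are hypothetical names used only
here, and /usr/spool/PBS is the default quoted in the reply.

```shell
#!/bin/sh
# Print the server name a client would use, following the order given
# in the quoted reply:
#   1. the PBS_DEFAULT environment variable, if set and non-empty
#   2. otherwise the first line of <serverhome>/server_name
# PBS_SERVER_HOME is a hypothetical override for this sketch only.
pbs_server_name() {
    serverhome="${PBS_SERVER_HOME:-/usr/spool/PBS}"
    if [ -n "${PBS_DEFAULT:-}" ]; then
        printf '%s\n' "$PBS_DEFAULT"
    elif [ -r "$serverhome/server_name" ]; then
        head -n 1 "$serverhome/server_name"
    else
        echo "no server name configured" >&2
        return 1
    fi
}
```

Running pbs_server_name on the gateway and on a compute node, and
comparing the results, would show whether the two sides disagree about
which name to contact.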
Tatsuya
On Wednesday, October 5, 2005, at 05:48 PM, Pete Wyckoff wrote:
> taminami at mac.com wrote on Wed, 05 Oct 2005 16:51 -0400:
>> The job exits immediately and gives me the following error+output:
>>
>> mpiexec: resolve_exe: prefixing dot to executable: "./s3d_dms_fftw".
>> mpiexec: Error: get_hosts: pbs_connect: no error.
>>
>> I have learned that the second line indicates MPI could not resolve
>> hostnames used in PBS, from the archive of this mailing list.
>> However, I couldn't get any more information about this.
>>
>> Is there any way for me to know in what machine (in server or mom)
>> and which hostname MPI tried to resolve and failed?
>
> I know very little about torque, but scanning their code points out
> an environment variable you may be able to use to get more debugging
> information. Try (in bash-speak):
>
> PBSDEBUG=yup mpiexec s3d....
>
> But likely your guess is correct. In which case the traditional
> advice for PBS server name is that it looks first at environment
> variable PBS_DEFAULT. But most systems use an installed file to
> hold the default server name. In torque this appears to be
> <serverhome>/server_name. And <serverhome> defaults to
> /usr/spool/PBS. There are ./configure variables during the build to
> change these things too.
>
> Let us know what fixes it for you.
>
> -- Pete
>