pbs_connect error when running mpiexec jobs with PBS
Piotr Siwczak
psiwczak at man.poznan.pl
Mon Feb 7 10:18:42 EST 2005
Hi,
First of all, thank you for the fast response.
The problem was indeed an incorrect server_name. Unfortunately, the openPBS
manual lists the server_name file as a "server parameter", which is why I
had trouble figuring it out. I also examined the strace output from my qsub,
and it showed mpiexec accessing the server_name file.
Cheers,
Piotr
On Mon, 7 Feb 2005, Pete Wyckoff wrote:
> psiwczak at man.poznan.pl wrote on Mon, 07 Feb 2005 09:20 +0100:
>> Recently I've been seeing strange behaviour from my 'pbs-enabled'
>> mpiexec. All MPI jobs quit with the following message:
>>
>> mpiexec: Error: get_hosts: pbs_connect: Access from host not allowed, or
>> unknown host
>>
>> However, in the logs I can see that the PBS scheduler accepts the
>> submitted job and sends it to a mom on one of my cluster nodes. After
>> being processed by pbs_mom, the job exits with error status=1.
>
> The compute node that is trying to run mpiexec cannot talk to the PBS
> server. Most likely the name did not resolve (the name in the server_name
> file in the PBS /var/... directory) on the compute node. You might fix
> the server_name file or edit /etc/hosts to have an entry for the server.
>
> You might type "qstat" in your batch job on the compute node and see if
> it has the same problem as does mpiexec.
>
>> On the other hand, jobs submitted with mpirun (mpich2) outside pbs work
>> perfectly.
>
> One major difference: mpirun doesn't talk to PBS from the compute node.
>
> -- Pete
>
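
For anyone hitting the same error, here is a minimal sketch of the check
Pete describes: read the host name from the server_name file and see whether
it resolves on the compute node. It assumes the file holds a single host
name on its first non-empty line. The function names read_server_name and
check_server_name are made up for illustration, and the path to the file is
site-specific, so it is passed in rather than hard-coded.

```python
import socket
import sys


def read_server_name(path):
    """Return the first non-empty line of a server_name file, stripped."""
    with open(path) as fh:
        for line in fh:
            name = line.strip()
            if name:
                return name
    raise ValueError(f"{path} contains no host name")


def check_server_name(path, resolver=socket.getaddrinfo):
    """Return (name, addresses); addresses is empty if the name does not resolve.

    The resolver argument exists only so the lookup can be replaced in tests.
    """
    name = read_server_name(path)
    try:
        infos = resolver(name, None)
    except socket.gaierror:
        return name, []
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return name, addresses


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_server_name.py <path to your server_name file>
    name, addresses = check_server_name(sys.argv[1])
    if addresses:
        print(f"{name} resolves to {', '.join(addresses)}")
    else:
        print(f"{name} does not resolve on this host")
        sys.exit(1)
```

Running this inside a batch job on the compute node should show the same
view of name resolution that mpiexec gets there. A name that resolves is
only a necessary condition; the server can still refuse the connection for
other reasons.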