tm_init: tm: not connected, protocol version 2?
Lisa Thurston
lnthurston at ucdavis.edu
Thu Jun 5 19:57:44 EDT 2003
Thanks very much for the hint. I recompiled and now do not have that
problem. Unfortunately I seem to have just traded up for a different
connection problem. I've tried to resolve it myself but once again must
admit defeat. I would greatly appreciate any advice you have to offer.
Lisa Thurston
The output of the job looks like this...
This jobs runs on the following processors:
node5 node5 node4 node4 node3 node3 node2 node2
Process 4485 attached
pbs_iff: cannot connect to host
Process 4485 detached
mpiexec: Error: get_hosts: pbs_connect: Unauthorized Request .
The mom_log on node5 is not very informative...
06/05/2003 11:40:44;0100; pbs_mom;Req;;Type 1 request received from PBS_Server@myriad.ucdavis.edu, sock=10
06/05/2003 11:40:44;0100; pbs_mom;Req;;Type 3 request received from PBS_Server@myriad.ucdavis.edu, sock=10
06/05/2003 11:40:44;0100; pbs_mom;Req;;Type 4 request received from PBS_Server@myriad.ucdavis.edu, sock=10
06/05/2003 11:40:44;0100; pbs_mom;Req;;Type 5 request received from PBS_Server@myriad.ucdavis.edu, sock=10
06/05/2003 11:40:44;0100; pbs_mom;Req;;Type 19 request received from PBS_Server@myriad.ucdavis.edu, sock=10
06/05/2003 11:40:44;0008; pbs_mom;Job;3731.myriad.ucdavis.edu;Started, pid = 4443
06/05/2003 11:40:44;0080; pbs_mom;Job;3731.myriad.ucdavis.edu;task 1 terminated
06/05/2003 11:40:44;0008; pbs_mom;Job;3731.myriad.ucdavis.edu;Terminated
06/05/2003 11:40:44;0008; pbs_mom;Job;3731.myriad.ucdavis.edu;kill_job
06/05/2003 11:40:44;0080; pbs_mom;Job;3731.myriad.ucdavis.edu;Obit sent
06/05/2003 11:40:44;0100; pbs_mom;Req;;Type 6 request received from PBS_Server@myriad.ucdavis.edu, sock=10
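For anyone who wants to sift a longer mom_log than the excerpt above, here is a minimal Python sketch that splits one record into its semicolon-separated fields. The field names (timestamp, event_code, daemon, obj_class, obj_name, message) and the choice to read the second field as a hexadecimal event mask are assumptions made for illustration, not taken from the PBS documentation. It expects one complete record per line, so any lines wrapped by a mail client have to be rejoined first.

```python
from typing import NamedTuple


class PbsLogRecord(NamedTuple):
    # Field names are illustrative assumptions, not official PBS terms.
    timestamp: str
    event_code: int
    daemon: str
    obj_class: str
    obj_name: str
    message: str


def parse_pbs_log_line(line: str) -> PbsLogRecord:
    """Split one unwrapped PBS log record into its six fields."""
    # maxsplit=5 keeps any ';' that appears inside the message intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        raise ValueError(f"expected 6 ';'-separated fields, got {len(parts)}")
    ts, code, daemon, obj_class, obj_name, message = (p.strip() for p in parts)
    # Assumption: the event code is printed as a hexadecimal bit mask.
    return PbsLogRecord(ts, int(code, 16), daemon, obj_class, obj_name, message)


if __name__ == "__main__":
    sample = ("06/05/2003 11:40:44;0008; pbs_mom;Job;"
              "3731.myriad.ucdavis.edu;Started, pid = 4443")
    print(parse_pbs_log_line(sample))
```

Filtering the parsed records on obj_name would then pull out everything logged for a single job such as 3731.myriad.ucdavis.edu.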
The end of the strace shows...
4485 11:40:44 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
4485 11:40:44 bind(4, {sin_family=AF_INET, sin_port=htons(1023), sin_addr=inet_addr("0.0.0.0")}}, 16) = -1 EACCES (Permission denied)
4485 11:40:44 close(4) = 0
4485 11:40:44 write(2, "pbs_iff: cannot connect to host\n", 32) = 32
4485 11:40:44 _exit(4) = ?
4484 11:40:44 <... read resumed> "", 4) = 0
4484 11:40:44 close(3) = 0
4484 11:40:44 write(2, "mpiexec: Error: ", 16) = 16
4484 11:40:44 --- SIGCHLD (Child exited) ---
4484 11:40:44 write(2, "get_hosts: pbs_connect", 22) = 22
4484 11:40:44 write(2, ": Unauthorized Request .\n", 25) = 25
4484 11:40:44 _exit(1) = ?
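The failing call in the trace above is the bind() to port 1023, which returns EACCES. On most Unix-like systems, binding a port below 1024 requires elevated privileges. The following minimal Python sketch repeats the same kind of bind so the behaviour can be checked for a given account on a given node. The helper name try_bind is hypothetical, and the sketch only shows whether the current user may bind the port; it is not a diagnosis of the pbs_iff installation.

```python
import errno
import socket


def try_bind(port: int) -> str:
    """Return "ok" if binding 0.0.0.0:port succeeds, else the errno name."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("0.0.0.0", port))
        return "ok"
    except OSError as exc:
        # Map the numeric errno (e.g. 13) to its symbolic name (e.g. EACCES).
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    # 1023 is the port seen in the strace; 0 asks the kernel for any free port.
    for port in (1023, 0):
        print(port, try_bind(port))
```

Running it once as the job's user and once as root on node5 would show whether the EACCES depends on the privileges of the calling process.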
And, sadly, in the server log...
06/05/2003 11:40:44;0040;PBS_Server;Svr;myriad.ucdavis.edu;Scheduler sent command 4
06/05/2003 11:40:44;0100;PBS_Server;Req;;Type 56 request received from pbs_mom@node5, sock=9
06/05/2003 11:40:44;0010;PBS_Server;Job;3731.myriad.ucdavis.edu;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=312kb resources_used.vmem=1396kb resources_used.walltime=00:00:00
06/05/2003 11:40:44;0100;PBS_Server;Job;3731.myriad.ucdavis.edu;dequeuing from internalq, state 5
Running the command (on node5)
pbs_iff -t myriad.ucdavis.edu 15001
is successful.
On Thu, 2003-05-29 at 17:29, Pete Wyckoff wrote:
> lnthurston at ucdavis.edu said on Thu, 29 May 2003 15:41 -0700:
> > I have installed mpiexec 0.74 on a cluster that is running OpenPBS
> > 2.3.16. I was using PBSPro and was able to get mpiexec to run, but was
> > not happy with the inability to feed standard input to the parallel
> > processes. I therefore decided to try OpenPBS. Unfortunately I have
> > been unable to get mpiexec to run under OpenPBS.
> >
> > In the mom_log on the node there is the following error...
> >
> > 05/29/2003 11:18:49;0001; pbs_mom;Svr;pbs_mom;Success (0) in
> > tm_request, bad protocol version 2
>
> A crucial bit of information there. This is one of the pbs_mom
> processes on a compute node saying that something tried to talk to it
> using the wrong TM version. My openpbs 2.3.16-ish source tree says that
> the version should be 1, not 2, hence it appears that you may be running
> the OpenPBS mom.
>
> But, could you have linked mpiexec against -lpbs from your PBSPro
> distribution instead of from your OpenPBS version? If you don't tell
> configure specifically, it tries to use /usr/local/pbs/lib/libpbs.a.
> Add "--with-pbs=/where/ever" to the configure invocation line to change
> that.
>
> -- Pete
--
Lisa Thurston <lnthurston at ucdavis.edu>