tm_init: tm: not connected, protocol version 2?
Lisa Thurston
lnthurston at ucdavis.edu
Thu May 29 18:41:28 EDT 2003
Hello,
I have installed mpiexec 0.74 on a cluster that is running OpenPBS
4.3.16. I was using PBSPro and was able to get mpiexec to run, but was
not happy with the inability to feed standard input to the parallel
processes. I therefore decided to try OpenPBS. Unfortunately I have
been unable to get mpiexec to run under OpenPBS.
I have read through the list archives and the FAQ in the README, but
have found nothing helpful. I have tried adding and changing various
name resolution files and modifying the clienthosts in the
mom_priv/config file but to no avail.
Any helpful hints at this point would be much appreciated. Various
output follows.
Thanks,
Lisa Thurston
The error message that appears in the job error file is...
mpiexec: Error: get_hosts: tm_init: tm: not connected.
In the mom_log on the node there is the following error...
05/29/2003 11:18:49;0001; pbs_mom;Svr;pbs_mom;Success (0) in tm_request, bad protocol version 2
I attached strace to my mpiexec process, but I can't see much in the
output. Here are the only error messages...
4974 11:18:49 open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
4974 11:18:49 connect(3, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ENOENT (No such file or directory)
4974 11:18:49 open("/var/nis/NIS_COLD_START", O_RDONLY) = -1 ENOENT (No such file or directory)
The end of the strace on mpiexec looks like this...
4974 11:18:49 connect(3, {sin_family=AF_INET, sin_port=htons(15003), sin_addr=inet_addr("127.0.0.1")}}, 16) = 0
4974 11:18:49 write(3, "+2+22+22249.myriad.ucdavis.edu2+3248BBD60D21816362573D0916189861453+100+1+1", 75) = 75
4974 11:18:49 select(1024, [3], NULL, NULL, {2592000, 0}) = 1 (in [3], left {2592000, 0})
4974 11:18:49 read(3, "", 1024) = 0
4974 11:18:49 close(3) = 0
4974 11:18:49 write(2, "mpiexec: Error: ", 16) = 16
4974 11:18:49 write(2, "get_hosts: tm_init", 18) = 18
4974 11:18:49 write(2, ": tm: ", 6) = 6
4974 11:18:49 write(2, "not connected.\n", 15) = 15
4974 11:18:49 _exit(1) = ?
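
For readers trying to interpret the 75-byte request in the write() call above, here is a minimal Python sketch that splits it into fields. It assumes the string uses a counted encoding in which a signed integer is preceded by its digit count and a string is preceded by its length; that is an assumption about the wire format, not something confirmed in this message. The field labels (protocol, version, job_id, cookie, command, event, task) are hypothetical names chosen for illustration.

```python
# Request string copied verbatim from the strace write() call.
REQUEST = "+2+22+22249.myriad.ucdavis.edu2+3248BBD60D21816362573D0916189861453+100+1+1"


def read_int(buf, pos):
    """Decode one counted integer at buf[pos]; return (value, next_pos).

    Assumed format: a sign followed by `count` digits, where count starts
    at 1 and is replaced by any unsigned digit run that precedes the sign.
    """
    count = 1
    while True:
        ch = buf[pos]
        if ch in "+-":
            value = int(buf[pos + 1 : pos + 1 + count])
            return (-value if ch == "-" else value), pos + 1 + count
        # No sign yet: these digits give the length of the next digit run.
        count, pos = int(buf[pos : pos + count]), pos + count


def read_str(buf, pos):
    """Decode a length-prefixed string; return (text, next_pos)."""
    length, pos = read_int(buf, pos)
    return buf[pos : pos + length], pos + length


def decode_request(buf):
    """Split the request into fields. The labels are guesses, not official names."""
    layout = [
        ("protocol", read_int),
        ("version", read_int),
        ("job_id", read_str),
        ("cookie", read_str),
        ("command", read_int),
        ("event", read_int),
        ("task", read_int),
    ]
    fields, pos = {}, 0
    for name, reader in layout:
        fields[name], pos = reader(buf, pos)
    fields["bytes_consumed"] = pos
    return fields


for key, value in decode_request(REQUEST).items():
    print(f"{key}: {value}")
```

If this reading of the format is right, the second field decodes to 2, which would line up with the "bad protocol version 2" message in the mom_log, and the read() returning 0 bytes would be pbs_mom closing the connection after rejecting the request.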
--
_____________________
Lisa Thurston
Systems Administrator
Evolution & Ecology
University of California
1 Shields Ave.
Davis, CA 95616
(530)752-3097