Trying to compile
Denis
denismpa at gmail.com
Wed Mar 19 11:15:19 EDT 2008
2008/3/17, Pete Wyckoff <pw at osc.edu>:
> denismpa at gmail.com wrote on Mon, 17 Mar 2008 09:30 -0300:
> > -bash-3.00$ qsub -I
> > qsub: waiting for job 3459.cromo.ufabc.edu.br to start
> > qsub: job 3459.cromo.ufabc.edu.br ready
> > -bash-3.00$ cd downloads/mpiexec-0.83
> > -bash-3.00$ ./mpiexec -n 1 hello
> > mpiexec: Error: get_hosts: pbs_connect: No server specified.
> > -bash-3.00$ qstat
>
> Okay, that works. Easy first check.
>
> Make sure you linked with the right libraries:
>
> ldd $(which qstat)
> ldd ./mpiexec
>
> Then try torque debugging:
>
> PBSDEBUG=yup ./mpiexec -n 1 hello
>
> Then try the big hammer:
>
> strace -vFf -s 200 -o /tmp/mp-st.out ./mpiexec -n 1 hello
>
> And see if you find anything suspicious in there.
>
> If you happen to know your server name, you should be able to do:
>
> PBS_DEFAULT=cromo.ufabc.edu.br ./mpiexec -n 1 hello
>
> But I'm still guessing that your server default file is missing.
> Look in the strace for paths it opens. The one ending in
> "/server_name" is what you want to investigate.
>
> -- Pete
>
First of all: Pete, thank you very much for your time.
I did this step, and I think that I do not have the correct
libraries linked.
Let me show you. Here is my ldd output for qstat:
-bash-3.00$ ldd $(which qstat)
libtorque.so.0 => /opt/torque/lib64/libtorque.so.0 (0x0000002a95557000)
libtkx8.3.so => /usr/lib64/libtkx8.3.so (0x000000333aa00000)
libtk8.4.so => /usr/lib64/libtk8.4.so (0x000000333a600000)
libX11.so.6 => /usr/X11R6/lib64/libX11.so.6 (0x000000333a800000)
libtclx8.3.so => /usr/lib64/libtclx8.3.so (0x000000333a400000)
libtcl8.4.so => /usr/lib64/libtcl8.4.so (0x000000333a200000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003339e00000)
libm.so.6 => /lib64/tls/libm.so.6 (0x000000333a000000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003339b00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003339900000)
And here is my ldd output for mpiexec:
-bash-3.00$ ldd ./mpiexec
libm.so.6 => /lib64/tls/libm.so.6 (0x000000333a000000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000333ac00000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003339b00000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003339e00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003339900000)
I had compiled mpiexec with these options:
Version 0.83, configure options: '--with-default-comm=mpich-p4'
'--with-pbs=/usr/local' 'CC=/opt/intel/cce/9.1.047/bin/icc'
That did not work, so I then tried these configure options:
Version 0.83, configure options: '--with-default-comm=mpich-p4'
'--with-pbs=/opt/torque' 'CC=/opt/intel/cce/9.1.047/bin/icc'
'LDFLAGS=-L/opt/torque/lib64 -L/usr/lib64 -L/lib64'
but the ldd output has not changed.
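Since ldd only reports shared-library dependencies, a missing libtorque line may simply mean the PBS library was linked statically; that is an assumption about this build, not something the output confirms. A minimal sketch for comparing the two binaries (the function name links_libtorque is made up for illustration):

```shell
# Reads ldd output on stdin; succeeds if a libtorque shared object is listed.
# links_libtorque is a hypothetical helper name, not part of Torque or mpiexec.
links_libtorque() {
    grep -q 'libtorque\.so'
}

# Example with the binaries from this thread:
#   ldd "$(which qstat)" | links_libtorque && echo "qstat: libtorque is a shared dependency"
#   ldd ./mpiexec | links_libtorque || echo "mpiexec: libtorque not listed by ldd"
```

If the second command prints its message, the library is either linked statically or not linked at all; ldd alone cannot tell the two cases apart.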
I ran the strace and saw mpiexec looking for a file called
/var/spool/torque/server_name, although the PBS qstat looks for
/opt/torque/mom_priv/default.
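A rough sketch of how the "/server_name" paths can be pulled out of the strace log, assuming the log was written with -o /tmp/mp-st.out as Pete suggested (the function name server_name_paths is made up for illustration):

```shell
# Print the distinct quoted paths ending in "/server_name" found in a strace log.
# $1 is the strace output file; server_name_paths is a hypothetical helper name.
server_name_paths() {
    grep -o '"[^"]*/server_name"' "$1" | tr -d '"' | sort -u
}

# Example:
#   server_name_paths /tmp/mp-st.out
```

Each path printed is a location the traced binary tried to read the default server name from.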
OK. When I ran the
PBSDEBUG=yup ./mpiexec -n 1 hello
command before creating the server_name file, I was getting this error:
-bash-3.00$ PBSDEBUG=yup ./mpiexec -n 1 hello
ALERT: PBS_get_server() failed
mpiexec: Error: get_hosts: pbs_connect: No server specified.
After creating that file (server_name), containing just one line with the
server's name:
cromo.local
I am getting this error:
-bash-3.00$ PBSDEBUG=yup ./mpiexec -n 1 hello
ALERT: cannot verify file '2T', errno=2 (No such file or directory)
ERROR: cannot authenticate connection, errno=2 (No such file or directory)
mpiexec: Error: get_hosts: pbs_connect: Unauthorized Request .
I have noticed that the strace output contains a lot of
"missing file" errors, such as:
22850 open("/usr/local/lib/tls/x86_64/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
22850 stat("/usr/local/lib/tls/x86_64", 0x7fbfffe520) = -1 ENOENT (No such file or directory)
22850 open("/usr/local/lib/tls/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
22850 stat("/usr/local/lib/tls", 0x7fbfffe520) = -1 ENOENT (No such file or directory)
22850 open("/usr/local/lib/x86_64/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
...(and a lot more)
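Failed lookups like these are what the dynamic loader normally produces while walking its library search path, so they may be noise rather than the cause. A minimal sketch for hiding them and keeping only the other failed open() calls, again assuming the log is in /tmp/mp-st.out (the function name failed_non_library_opens is made up for illustration):

```shell
# Show failed open() calls from a strace log, skipping shared-library probes.
# $1 is the strace output file; failed_non_library_opens is a hypothetical helper name.
failed_non_library_opens() {
    grep 'open(' "$1" | grep 'ENOENT' | grep -v '\.so[."]'
}

# Example:
#   failed_non_library_opens /tmp/mp-st.out
```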
Do you have any other hints?
I just realized that I forgot to say I am using the Maui scheduler. Could
that be a problem?
Thank you again.
My best regards,
Denis Anjos.
--
Denis Anjos.
Cisco Certified Network Associate.
Universidade Federal do ABC
Santo André - SP - BR