Trying to compile

Denis denismpa at gmail.com
Wed Mar 19 11:15:19 EDT 2008


2008/3/17, Pete Wyckoff <pw at osc.edu>:
> denismpa at gmail.com wrote on Mon, 17 Mar 2008 09:30 -0300:
> > -bash-3.00$ qsub -I
> > qsub: waiting for job 3459.cromo.ufabc.edu.br to start
> > qsub: job 3459.cromo.ufabc.edu.br ready
> > -bash-3.00$ cd downloads/mpiexec-0.83
> > -bash-3.00$ ./mpiexec  -n 1 hello
> > mpiexec: Error: get_hosts: pbs_connect: No server specified.
> > -bash-3.00$ qstat
>
> Okay, that works.  Easy first check.
>
> Make sure you linked with the right libraries:
>
>    ldd $(which qstat)
>    ldd ./mpiexec
>
> Then try torque debugging:
>
>    PBSDEBUG=yup ./mpiexec -n 1 hello
>
> Then try the big hammer:
>
>    strace -vFf -s 200 -o /tmp/mp-st.out ./mpiexec -n 1 hello
>
> And see if you find anything suspicious in there.
>
> If you happen to know your server name, you should be able to do:
>
>    PBS_DEFAULT=cromo.ufabc.edu.br ./mpiexec -n 1 hello
>
> But I'm still guessing that your server default file is missing.
> Look in the strace for paths it opens.  The one ending in
> "/server_name" is what you want to investigate.
>
>                -- Pete
>

First of all: Pete, thank you very much for your time.

I did this step, and I suspect that I did not link against the correct
libraries. Let me show:

Here is the ldd output for qstat:

-bash-3.00$ ldd $(which qstat)
        libtorque.so.0 => /opt/torque/lib64/libtorque.so.0 (0x0000002a95557000)
        libtkx8.3.so => /usr/lib64/libtkx8.3.so (0x000000333aa00000)
        libtk8.4.so => /usr/lib64/libtk8.4.so (0x000000333a600000)
        libX11.so.6 => /usr/X11R6/lib64/libX11.so.6 (0x000000333a800000)
        libtclx8.3.so => /usr/lib64/libtclx8.3.so (0x000000333a400000)
        libtcl8.4.so => /usr/lib64/libtcl8.4.so (0x000000333a200000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003339e00000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x000000333a000000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003339b00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003339900000)

And here is the ldd output for mpiexec:

-bash-3.00$ ldd ./mpiexec
        libm.so.6 => /lib64/tls/libm.so.6 (0x000000333a000000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000333ac00000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003339b00000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003339e00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003339900000)

I originally compiled mpiexec with these options:
Version 0.83, configure options:  '--with-default-comm=mpich-p4'
'--with-pbs=/usr/local' 'CC=/opt/intel/cce/9.1.047/bin/icc'

That did not work, so I tried these configure options:

Version 0.83, configure options:  '--with-default-comm=mpich-p4'
'--with-pbs=/opt/torque' 'CC=/opt/intel/cce/9.1.047/bin/icc'
'LDFLAGS=-L/opt/torque/lib64 -L/usr/lib64 -L/lib64'

but the ldd output did not change.
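
Note that ldd lists only shared-library dependencies, so a library that
was linked in statically would not appear in its output whichever
configure options were used. A minimal sketch for checking whether a
given library shows up in ldd output follows; the helper name
lists_library is hypothetical, and the sample lines are copied from the
qstat output above:

```shell
# Read `ldd` output on stdin and succeed if any listed library name
# starts with the prefix given as $1 (for example "libtorque").
lists_library() {
    awk -v lib="$1" 'index($1, lib) == 1 { found = 1 } END { exit !found }'
}

# Sample input: two lines copied from the qstat ldd output.
sample='        libtorque.so.0 => /opt/torque/lib64/libtorque.so.0 (0x0000002a95557000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003339b00000)'

if printf '%s\n' "$sample" | lists_library libtorque; then
    echo "libtorque is listed"
else
    echo "libtorque is not listed"
fi
```

Against a real binary the same helper would be fed from ldd directly,
for example: ldd ./mpiexec | lists_library libtorque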

I ran the strace and saw mpiexec looking for a file called
/var/spool/torque/server_name, whereas the PBS qstat looks for
/opt/torque/mom_priv/default.
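
A quick way to compare what each of those locations holds is sketched
below. The helper name report_server_name is hypothetical, and listing
PBS_DEFAULT first reflects an assumption that the environment variable
overrides the file, as the PBS_DEFAULT suggestion quoted above implies:

```shell
# Print where a default server name could come from: the PBS_DEFAULT
# environment variable (if set), then each candidate file given as an
# argument, with its first line or a note that it cannot be read.
report_server_name() {
    if [ -n "${PBS_DEFAULT:-}" ]; then
        echo "PBS_DEFAULT=$PBS_DEFAULT"
    fi
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "$f: $(head -n 1 "$f")"
        else
            echo "$f: missing or unreadable"
        fi
    done
}

# The two paths seen in the strace output for mpiexec and qstat.
report_server_name /var/spool/torque/server_name /opt/torque/mom_priv/default
```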

Running the command
PBSDEBUG=yup ./mpiexec -n 1 hello
before creating the server_name file, I got this error:

-bash-3.00$ PBSDEBUG=yup ./mpiexec -n 1 hello
ALERT:  PBS_get_server() failed
mpiexec: Error: get_hosts: pbs_connect: No server specified.

After creating that file (server_name), containing just one line with
the server's name:
cromo.local

I get this error instead:

-bash-3.00$ PBSDEBUG=yup ./mpiexec -n 1 hello
ALERT:  cannot verify file '2T', errno=2 (No such file or directory)
ERROR:  cannot authenticate connection, errno=2 (No such file or directory)
mpiexec: Error: get_hosts: pbs_connect: Unauthorized Request .


I also noticed that the strace output contains many "missing file"
errors, such as:
22850 open("/usr/local/lib/tls/x86_64/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
22850 stat("/usr/local/lib/tls/x86_64", 0x7fbfffe520) = -1 ENOENT (No such file or directory)
22850 open("/usr/local/lib/tls/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
22850 stat("/usr/local/lib/tls", 0x7fbfffe520) = -1 ENOENT (No such file or directory)
22850 open("/usr/local/lib/x86_64/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
...(and a lot more)
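
ENOENT results like these for libm.so.6 are the dynamic loader trying
each directory on its search path in turn; the ldd output above shows
the library is eventually found under /lib64/tls. A minimal sketch for
hiding that noise when reading the strace log follows. The helper name
successful_opens is hypothetical, and the second sample line is made up
for illustration rather than taken from the real log:

```shell
# Read an strace log on stdin and print only the open() calls that did
# not fail with ENOENT, hiding the loader's search-path probing.
successful_opens() {
    grep 'open(' | grep -v 'ENOENT'
}

# Sample input: the first line is from the message above; the second is
# a made-up successful open, included only to show what passes the filter.
successful_opens <<'EOF'
22850 open("/usr/local/lib/tls/libm.so.6", O_RDONLY) = -1 ENOENT (No such file or directory)
22850 open("/var/spool/torque/server_name", O_RDONLY) = 3
EOF
```

On the real log this would be run as: successful_opens < /tmp/mp-st.out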

Do you have any other hints?
I just realized that I forgot to mention that I am using the Maui
scheduler. Could that be a problem?


Thank you again.

My best regards,


Denis Anjos.


-- 
Denis Anjos.
Cisco Certified Network Associate.
Universidade Federal do ABC
Santo André - SP - BR

