Problem with pbs node configuration
Pete Wyckoff
pw at osc.edu
Mon Aug 4 10:46:07 EDT 2008
gpadoan at inogs.it wrote on Mon, 04 Aug 2008 15:04 +0200:
> I am testing mpiexec 0.83 on my little cluster (10 nodes with Intel Quad-Core plus a front-end Intel Dual-Core).
>
> On it I installed:
>
> Torque-2.4.0-snap.200804241119
> Maui-3.2.6p20-snap.1182974819
> Mpich2-1.0.7
>
> and, now, they work fine.
>
>
>
> I have compiled mpiexec with:
>
> ./configure --prefix=/opt/mpich/mpiexec-0.83 --with-pbs=/opt/torque/2.4.0/ --disable-mpich-gm --with-default-comm=mpich2 \
> --disable-mpich-ib --disable-mpich-rai --disable-lam --disable-shmem --disable-emp --disable-portals \
> --with-mpicc=/opt/mpich/2-1.0.7/bin/mpicc --with-mpif77=/opt/mpich/2-1.0.7/bin/mpif77 --with-sed=/bin/sed
>
>
> If I try for mpiexec (Mpiexec-0.83) I get the following.
>
>
> When I run the job:
>
> #PBS -l nodes=2:ppn=2
> qsub -q batch ./go_mpiexec
>
> where script "go_mpiexec" is:
>
> /opt/mpich/mpiexec-0.83/bin/mpiexec -verbose --comm=mpich2 /home/giorgio/job/test/wave_mpi/Bstagj2mpi
>
>
> Output file of Mpich2 logs:
>
> node 0: name <my-name>, cpu avail 1
> ATT: must use at least 2 CPUs

This is the code apparently complaining that MPI_Comm_size()
returned 1. Mpiexec should have started 4 according to your PBS
submission.

First thing to check is to look at the mpiexec logs. That
"-verbose" should have said something. You can add one more "-v" to
get even more information. We need to make sure it knows there are
four available processors.

Also, do "qstat -f $PBS_JOBID" inside the go_mpiexec script to
make sure the job really was given two nodes with two processors
each.
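
As a rough sketch of that check, something like the following could go
near the top of go_mpiexec. It assumes Torque has set $PBS_NODEFILE and
$PBS_JOBID in the job environment and that the node file has one line
per allocated processor, so nodes=2:ppn=2 should give four lines. The
function names count_slots and slots_per_host are made up for this
example.

```shell
#!/bin/sh
# Hypothetical helper: count the processor slots listed in a PBS node file.
# Assumption: the file has one line per allocated processor.
count_slots() {
    # Count non-empty lines; each one is taken to be a processor slot.
    grep -c . "$1"
}

# Hypothetical helper: per-host breakdown, one "<count> <hostname>" per line.
slots_per_host() {
    sort "$1" | uniq -c
}

# Only run the checks inside a PBS job, where these variables are set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "slots allocated: $(count_slots "$PBS_NODEFILE")"
    slots_per_host "$PBS_NODEFILE"
    # Full job status as the server sees it.
    qstat -f "$PBS_JOBID"
fi
```

If the slot count comes out as 1 rather than 4, the problem is in the
node allocation rather than in mpiexec itself.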

And, make sure your Bstagj2mpi really was compiled using
mpich2-1.0.7, and that your $LD_LIBRARY_PATH isn't picking up some
different libmpich.so.
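
One way to look at the library question, sketched under the assumption
that the binary is dynamically linked (if MPICH2 was built static, ldd
will show no libmpich at all and this tells you nothing). The helper
name mpi_libs is made up; the binary path is the one from your job
script.

```shell
#!/bin/sh
# Hypothetical helper: keep only MPI-looking lines from ldd output on stdin.
mpi_libs() {
    # "|| true" keeps the exit status clean when nothing matches.
    grep -i 'mpi' || true
}

# Path taken from the job script in this thread; override BIN to test another.
BIN=${BIN:-/home/giorgio/job/test/wave_mpi/Bstagj2mpi}

if [ -x "$BIN" ]; then
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
    # Show which MPI shared libraries the loader would resolve for the binary.
    ldd "$BIN" | mpi_libs
fi
```

Run it inside the job script rather than on the front-end, since the
environment on the compute nodes is what matters. Any libmpich path
outside /opt/mpich/2-1.0.7 would be suspicious.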
-- Pete