Problem with pbs node configuration

Pete Wyckoff pw at osc.edu
Mon Aug 4 10:46:07 EDT 2008


gpadoan at inogs.it wrote on Mon, 04 Aug 2008 15:04 +0200:
> I am testing mpiexec 0.83 on my little cluster (10 nodes with Intel Quad-Core plus a front-end Intel Dual-Core).
>
> On it I installed:
>
> Torque-2.4.0-snap.200804241119
> Maui-3.2.6p20-snap.1182974819
> Mpich2-1.0.7
>
> and, now, they work fine.
>
>
>
> I have compiled mpiexec with:
>
> ./configure --prefix=/opt/mpich/mpiexec-0.83  --with-pbs=/opt/torque/2.4.0/ --disable-mpich-gm  --with-default-comm=mpich2 \
> --disable-mpich-ib --disable-mpich-rai  --disable-lam  --disable-shmem --disable-emp  --disable-portals   \
> --with-mpicc=/opt/mpich/2-1.0.7/bin/mpicc --with-mpif77=/opt/mpich/2-1.0.7/bin/mpif77  --with-sed=/bin/sed
>
>
> When I try mpiexec (Mpiexec-0.83), I get the following.
>
>
> When I run the job:
>
> #PBS -l nodes=2:ppn=2
> qsub -q batch   ./go_mpiexec
>
> where script "go_mpiexec" is:
>
> /opt/mpich/mpiexec-0.83/bin/mpiexec -verbose --comm=mpich2  /home/giorgio/job/test/wave_mpi/Bstagj2mpi
>
>
> Output file of Mpich2 logs:
>
> node  0: name <my-name>, cpu avail 1
>  ATT: must use at least 2 CPUs

This is apparently your application code complaining that
MPI_Comm_size() returned 1.  Mpiexec should have started 4 processes
according to your PBS submission (nodes=2:ppn=2).

The first thing to check is the mpiexec output.  That "-verbose"
should have printed something.  You can add one more "-v" to get even
more information.  We need to make sure mpiexec knows there are four
available processors.
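
As a cross-check on what PBS itself allocated, something like the
following could go near the top of go_mpiexec.  This is only a sketch:
the function name count_slots is made up for illustration, and it
relies on Torque writing one line per allocated processor slot to the
file named by $PBS_NODEFILE.  Mpiexec gets its node list from PBS
directly rather than from this file, so the number is a sanity check,
not what mpiexec reads.

```shell
# Hypothetical helper: count the processor slots PBS gave this job.
# Assumes one non-empty line per slot in the file named by $PBS_NODEFILE.
count_slots() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # grep -c . counts non-empty lines, even without a trailing newline
        grep -c . "$PBS_NODEFILE"
    else
        # Not inside a PBS job, or the node file is unreadable
        echo 0
    fi
}

echo "slots allocated by PBS: $(count_slots)"
```

For a job submitted with nodes=2:ppn=2 the count should be 4.  If it
is, and mpiexec still starts only one process, the problem is more
likely on the mpiexec or MPI library side than in the PBS allocation.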

Also, do "qstat -f $PBS_JOBID" inside the go_mpiexec script to
make sure the job really was given two nodes with two processors each.
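
A guarded way to put that check into the script is sketched below.
The function name show_job_status is hypothetical, and the sketch
assumes the lowercase "-f" (full status) form of the Torque qstat
option.  It prints a short message instead of failing when run outside
a job or on a machine without qstat.

```shell
# Hypothetical helper: print the full PBS status of the current job.
show_job_status() {
    if [ -z "${PBS_JOBID:-}" ]; then
        echo "PBS_JOBID is not set; not running inside a PBS job"
        return 0
    fi
    if ! command -v qstat >/dev/null 2>&1; then
        echo "qstat not found in PATH"
        return 0
    fi
    # Full job status, including the requested and assigned resources
    qstat -f "$PBS_JOBID"
}

show_job_status
```

In the output, the Resource_List.nodes line should show the 2:ppn=2
request, and exec_host should list four slots spread over two nodes.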

And, make sure your Bstagj2mpi really was compiled using
mpich2-1.0.7, and that your $LD_LIBRARY_PATH isn't picking up some
different libmpich.so.
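
One way to see which MPI library the binary would load at run time is
to filter the ldd output, as in this sketch.  The function name
filter_mpi_libs is made up for illustration, and the check assumes
the binary is dynamically linked.  A static binary shows no libmpich
line at all, which is not by itself a sign of trouble.

```shell
# Hypothetical helper: keep only the MPI library lines from ldd output.
filter_mpi_libs() {
    # "|| true" keeps the pipeline from failing when nothing matches
    grep -i 'libmpi' || true
}

# Path taken from the job script above
bin=/home/giorgio/job/test/wave_mpi/Bstagj2mpi

if [ -x "$bin" ]; then
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}"
    ldd "$bin" | filter_mpi_libs
else
    echo "cannot find executable: $bin"
fi
```

Any libmpich line should resolve to a path under
/opt/mpich/2-1.0.7.  Run this from inside the job script as well as
on the front-end, since $LD_LIBRARY_PATH may differ on the compute
nodes.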

		-- Pete

