mpiexec & PBS Professional 7.1: "PBS reports fewer hosts than TM"

Pete Wyckoff pw at osc.edu
Fri Mar 31 16:01:46 EST 2006


thomas.zeiser at rrze.uni-erlangen.de wrote on Fri, 31 Mar 2006 21:29 +0200:
> since upgrading from PBS Professional 7.0 to 7.1 we get the
> following error message when starting jobs with mpiexec
> 
>   /opt/mpiexec-0.80/bin/mpiexec -n 2 -comm none hostname
>   mpiexec: Error: get_hosts: PBS reports fewer hosts 1 than TM 2.
> 
> Recompiling mpiexec with the updated PBS libraries / include files
> does not help. The machine is an SGI Altix (IA64) with SuSE
> SLES9SP3/ProPack4. With both PBS Professional versions we use the
> pbs_mom with cpusets.

This is something we were just tracking down for someone else.
PBS Pro changed the meaning of the entries in the nodelist[]
returned by tm_nodeinfo():  each entry is no longer a "node", but
rather a "CPU".  Rather annoying to switch it on us like this.
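
To make the mismatch concrete, here is a minimal Python sketch of the
counting problem.  It is not the mpiexec code (that is C, in
get_hosts.c), and the function name and the hostname "altix1" are made
up for illustration.  It assumes each TM entry has already been
resolved to a hostname, so a 2-CPU host shows up twice on the TM side
while PBS lists it once:

```python
from collections import OrderedDict


def collapse_cpu_entries(per_cpu_hosts):
    """Collapse a per-CPU host list into (hostname, cpu_count) pairs.

    Hypothetical helper: the input has one hostname per TM entry, so a
    host with N CPUs appears N times.  First-seen order is preserved.
    """
    counts = OrderedDict()
    for host in per_cpu_hosts:
        counts[host] = counts.get(host, 0) + 1
    return list(counts.items())


# Made-up data matching the error above: TM reports 2 entries, PBS 1 host.
tm_entries = ["altix1", "altix1"]
print(collapse_cpu_entries(tm_entries))  # [('altix1', 2)]
```

Counting TM entries per host like this, rather than treating each
entry as its own host, is one way the two views could be reconciled.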

Can you try http://www.osc.edu/~pw/mpiexec/mpiexec-0.81-pre3.tgz ?
It was tested on a PBSPro cluster environment, but not on an SMP
like yours.  If it doesn't work, please step through the while
loop in get_hosts() (in get_hosts.c) and see if you can spot what
is going on.  If it works for you, maybe I'll just spin a release
soon in case anyone else is testing.

The "qstat -f" output for the job may shed more light when looking
at the mpiexec code.

There's another new PBSpro-only feature you may want to take a look
at if you do not have standard IO redirection working.  As a test,
"mpiexec --comm=none hostname > /dev/null" should produce no output.
Try to ./configure with "--enable-pbspro-helper" sometime, but only
after you get the above problem fixed.

		-- Pete

