mpiexec & PBS Professional 7.1: "PBS reports fewer hosts than TM"

Ralf Eichmann Eichmann at altair.de
Mon Apr 3 04:15:46 EDT 2006


Pete,

thank you for your efforts to make mpiexec work with PBS Pro!

Are you in contact with our developers at AGT? I think it makes sense
for them to provide you with a change log, so I have sent them a request.

Best regards
Ralf

-- 
Dr. Ralf Eichmann                          Tel. +49-7031-6208-39
Technical Manager Enterprise Computing     Fax  +49-7031-6208-99
Manager Systems Administration                eichmann at altair.de
Altair Engineering GmbH            www.altair.de, www.pbspro.com 

> -----Original Message-----
> From: Pete Wyckoff [mailto:pw at osc.edu] 
> Sent: Friday, March 31, 2006 11:02 PM
> To: Thomas Zeiser
> Cc: mpiexec at osc.edu; Stefan Dieterich; Ralf Eichmann
> Subject: Re: mpiexec & PBS Professional 7.1: "PBS reports 
> fewer hosts than TM"
> 
> thomas.zeiser at rrze.uni-erlangen.de wrote on Fri, 31 Mar 2006 
> 21:29 +0200:
> > since upgrading from PBS Professional 7.0 to 7.1 we get the
> > following error message when starting jobs with mpiexec
> > 
> >   /opt/mpiexec-0.80/bin/mpiexec -n 2 -comm none hostname
> >   mpiexec: Error: get_hosts: PBS reports fewer hosts 1 than TM 2.
> > 
> > Recompiling mpiexec with the updated PBS libraries / include files
> > does not help. The machine is an SGI Altix (IA64) with SuSE
> > SLES9SP3/ProPack4. With both PBS Professional versions we use the
> > pbs_mom with cpusets.
> 
> This is something we were just tracking down for someone else.
> They changed the meaning of the entries in nodelist[] returned
> by tm_nodeinfo().  No longer is it "nodes", but rather "CPUs".
> Rather annoying to switch it on us like this.
> 
> Can you try http://www.osc.edu/~pw/mpiexec/mpiexec-0.81-pre3.tgz ?
> It was tested on a PBSPro cluster environment, but not on an SMP
> like yours.  If it doesn't work, please walk through the while
> loop in get_hosts() (in get_hosts.c) and see if you can spot what
> is going on.  If you say it's fine maybe I'll just spin a release
> soon in case anyone else is testing.
> 
> "qstat -f" info may shed more light when looking at the mpiexec
> code.
> 
> There's another new PBSpro-only feature you may want to take a look
> at if you do not have standard IO redirection working, i.e. "mpiexec
> --comm=none hostname > /dev/null" should produce no output.  Try to
> ./configure "--enable-pbspro-helper" sometime, but only after you
> get the above problem fixed.
> 
> 		-- Pete
> 
> 

