mpiexec & PBS Professional 7.1: "PBS reports fewer hosts than TM"
Ralf Eichmann
Eichmann at altair.de
Mon Apr 3 04:15:46 EDT 2006
Pete,
thank you for your efforts to make mpiexec work with PBS Pro!
Are you in contact with our developers at AGT? I think it makes sense
for them to provide you with a change log, so I have sent them a request.
Best regards
Ralf
--
Dr. Ralf Eichmann Tel. +49-7031-6208-39
Technical Manager Enterprise Computing Fax +49-7031-6208-99
Manager Systems Administration eichmann at altair.de
Altair Engineering GmbH www.altair.de, www.pbspro.com
> -----Original Message-----
> From: Pete Wyckoff [mailto:pw at osc.edu]
> Sent: Friday, March 31, 2006 11:02 PM
> To: Thomas Zeiser
> Cc: mpiexec at osc.edu; Stefan Dieterich; Ralf Eichmann
> Subject: Re: mpiexec & PBS Professional 7.1: "PBS reports
> fewer hosts than TM"
>
> thomas.zeiser at rrze.uni-erlangen.de wrote on Fri, 31 Mar 2006
> 21:29 +0200:
> > since upgrading from PBS Professional 7.0 to 7.1 we get the
> > following error message when starting jobs with mpiexec
> >
> > /opt/mpiexec-0.80/bin/mpiexec -n 2 -comm none hostname
> > mpiexec: Error: get_hosts: PBS reports fewer hosts 1 than TM 2.
> >
> > Recompiling mpiexec with the updated PBS libraries and include files
> > does not help. The machine is an SGI Altix (IA64) with SuSE
> > SLES9SP3/ProPack4. With both PBS Professional versions we use the
> > pbs_mom with cpusets.
>
> This is something we were just tracking down for someone else.
> They changed the meaning of the entries in nodelist[] returned
> by tm_nodeinfo(). No longer is it "nodes", but rather "CPUs".
> Rather annoying to switch it on us like this.
>
> Can you try http://www.osc.edu/~pw/mpiexec/mpiexec-0.81-pre3.tgz ?
> It was tested on a PBSPro cluster environment, but not on an SMP
> like yours. If it doesn't work, please walk through the while
> loop in get_hosts() (in get_hosts.c) and see if you can spot what
> is going on. If you say it's fine maybe I'll just spin a release
> soon in case anyone else is testing.
>
> "qstat -f" info may shed more light when looking at the mpiexec
> code.
>
> There's another new PBSPro-only feature you may want to look at if
> standard I/O redirection is not working for you. When redirection
> works, "mpiexec --comm=none hostname > /dev/null" should produce no
> output. Try running ./configure with "--enable-pbspro-helper"
> sometime, but only after you get the above problem fixed.
>
> -- Pete
>
>
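To illustrate the nodelist[] change described in the quoted message
(entries now stand for CPUs rather than nodes), here is a minimal sketch
of collapsing a per-CPU list back into one record per host. It is not
the code from get_hosts.c and it does not call the TM library. The
struct host_count type and the collapse_cpu_entries() function are
hypothetical names, and the sketch assumes each entry has already been
mapped to a host name by some other means.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one host and how many CPU entries named it. */
struct host_count {
    const char *name;
    int ncpu;
};

/*
 * Collapse a per-CPU list of host names into one record per distinct host.
 * names[i] is the host owning entry i (assumed to be resolved already);
 * out must have room for n records.
 * Returns the number of distinct hosts written to out.
 */
size_t collapse_cpu_entries(const char *const *names, size_t n,
                            struct host_count *out)
{
    size_t nhosts = 0;

    assert(names != NULL || n == 0);
    assert(out != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* Linear search keeps the sketch short. */
        for (j = 0; j < nhosts; j++) {
            if (strcmp(out[j].name, names[i]) == 0) {
                out[j].ncpu++;          /* another CPU on a known host */
                break;
            }
        }
        if (j == nhosts) {              /* first time this host appears */
            out[nhosts].name = names[i];
            out[nhosts].ncpu = 1;
            nhosts++;
        }
    }
    return nhosts;
}
```

With two entries that both name the same host, the function returns one
host with ncpu == 2. That is the kind of one-versus-two mismatch the
error message reports if per-CPU entries are counted as hosts.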