pbspro/mpiexec

Pete Wyckoff pw at osc.edu
Thu Feb 27 09:19:46 EST 2003


stefan.friedel at iwr.uni-heidelberg.de said on Thu, 27 Feb 2003 13:14 +0100:
> we just changed to PBSpro and I got the following error message immediately after the job is started:
> 
> mpiexec: Error: get_hosts: tm_init: tm: not connected.
> 
> I found the mails from Stefan Parnell in the archive. On our nodes the moms can use pbs_iff/pbs_demux (actually I had connection problems with the pbspro version, so I linked the Openpbs version...), and a job started with mpirun works fine.
> Any hint?

Only guesses here, since we don't have the non-free pbspro.  I see a
couple of other people in the archives who are using pbspro without the
problem you quote (12 Nov 2002 and 12 Dec 2002), so it seems to be
possible.

Can you take a look at the mom log for the main node in the job and see
if it has anything suspicious?  The tm_init call that produces the
complaint above is the very first call mpiexec makes into the PBS
library.  In the openpbs source, that error message looks like it
happens only when there are major problems with TM initialization.
Perhaps you linked mpiexec with -ltm from openpbs, not pbspro, and
there's a version mismatch?
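
If it helps with the mom log, here is a rough sketch of a filter that
pulls out the lines worth reading.  It is only an illustration: the
function name, the keyword list, and the idea that the job id appears
verbatim in the relevant lines are all assumptions on my part, not
anything taken from pbspro itself.

```python
import re
from typing import Iterable, List

# Hypothetical keyword list; adjust it to whatever your mom really logs.
SUSPICIOUS = re.compile(
    r"error|fail|refused|denied|not connected", re.IGNORECASE
)


def suspicious_mom_lines(lines: Iterable[str], job_id: str) -> List[str]:
    """Return log lines that mention job_id or match a suspicious keyword.

    Assumes (without checking) that the mom writes one event per line and
    that the job id shows up as plain text in lines about that job.
    """
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if job_id in line or SUSPICIOUS.search(line):
            hits.append(line)
    return hits
```

To use it, open the mom log on the main node of the job (the path
depends on your installation), pass the file object and the job id to
suspicious_mom_lines, and print what comes back.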

		-- Pete


