TR: mpiexec questions ...
Pete Wyckoff
pw at osc.edu
Tue May 14 22:22:38 EDT 2002
guillaume.alleon at airbus.aeromatra.com said:
> I have downloaded the latest version of mpiexec in order to
> use it on a new PC cluster we have. It is running Linux with
> Myrinet on dual-CPU nodes.
> It compiles OK, but I ran into trouble when trying to use the hello
> example.
> It looks like the Myrinet IDs are not correct.
> Here is my error:
> ERROR (mapping issue): mpi node 0 gm_id 2 (node 1)doesn't
> know host node 1 (mpid 0)
> Process aborting
> ....
>
>
> On my machine I am using a ~/.gmpi/conf file that looks like
> this:
> 16
> node0.clustal.com 2
> node0.clustal.com 4
> ...
>
> Where can I specify the gm_id 2 & 4 in order to mimic
> the conf file? ... or maybe the problem is elsewhere?
That error is produced by the ch_gm layer in MPICH, which complains
that one machine doesn't have the GM routing information it needs to
talk to another node. You might want to read the documentation on how
to run the mapper, then run some tests like gm_allsize to ensure that
the basics of GM are okay.
Mpiexec builds its own equivalent of a ~/.gmpi/conf file, which is
left behind as, e.g., ~/.gmpiconf.30302 if you invoke mpiexec with
the --verbose argument. As long as PBS knows the proper names of the
nodes in your cluster, and the map produced by GM uses the same
hostnames, this file should be fine. Inspect it to make sure.
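
A short script can help with that inspection. The sketch below assumes
the layout shown in the quoted conf file above: a first line giving the
number of entries, followed by one "hostname number" pair per line. The
function name parse_gmpi_conf is hypothetical and is not part of mpiexec
or GM; treat this as an illustration, not a description of how mpiexec
itself reads or writes the file.

```python
def parse_gmpi_conf(text):
    """Parse conf-style text: an entry count, then 'hostname number' lines.

    The layout is an assumption taken from the example quoted in this thread.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty conf file")

    declared = int(lines[0])  # first line: number of entries
    entries = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) < 2:
            raise ValueError(f"malformed line: {ln!r}")
        entries.append((fields[0], int(fields[1])))

    # A count that disagrees with the body suggests a truncated or bad file.
    if declared != len(entries):
        raise ValueError(
            f"header says {declared} entries, found {len(entries)}"
        )
    return entries
```

To use it, read the file left behind by mpiexec --verbose (for example
~/.gmpiconf.30302) and pass its contents to parse_gmpi_conf, then check
by eye that every hostname is spelled the way PBS and GM spell it.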
Perhaps PBS thinks the node names are _not_ fully qualified, but
GM does, e.g. "node0" vs "node0.clustal.com"? The two must agree.
What we do is use only the short names for the internal cluster
nodes.
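
A disagreement of this kind can be spotted mechanically. The sketch
below is only an illustration: it assumes you have already collected
the node names as PBS reports them (for instance from the file named by
$PBS_NODEFILE inside a job) and the names that appear in the GM map,
however you extract them. The helper names short_name and
find_name_mismatches are hypothetical.

```python
def short_name(host):
    """Return the part of a hostname before the first dot, lowercased."""
    return host.split(".", 1)[0].lower()


def find_name_mismatches(pbs_names, gm_names):
    """List (pbs_name, gm_name) pairs that share a short name but differ.

    For example, "node0" in PBS and "node0.clustal.com" in GM refer to
    the same machine but are spelled differently.
    """
    # Index the GM names by their short form.
    gm_by_short = {}
    for name in gm_names:
        gm_by_short.setdefault(short_name(name), set()).add(name)

    mismatches = []
    for pbs in pbs_names:
        for gm in sorted(gm_by_short.get(short_name(pbs), ())):
            if gm != pbs:
                mismatches.append((pbs, gm))
    return mismatches
```

An empty result means every name that both lists share is spelled
identically; any pair returned shows a node whose name needs to be made
consistent on one side or the other.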
Hopefully you built mpiexec using something like:
    ./configure --with-smp-size=2
so that it knows you are using SMP nodes. Failing to do this would
not cause the error you are seeing, though.
-- Pete