MPICH-p4 programs on Myrinet
Brent Clements
bclem at rice.edu
Mon Oct 20 08:15:25 EDT 2003
You have to reconfigure MPICH with the option --with-comm=shared;
otherwise it won't work. The configure script is wrong: shared
communication is not actually enabled by default. We debugged this and
found that you have to force it on.
This is not an mpiexec problem, btw.
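For reference, a reconfigure along these lines might look like the sketch below. Only --with-comm=shared comes from the fix described above. The --with-device=ch_p4 option is an assumption based on the p4 device being the one in use, and MPICH_SRC and PREFIX are hypothetical placeholder paths.

```shell
# Sketch: rebuild MPICH 1.x (ch_p4 device) with shared-memory comm forced on.
# MPICH_SRC and PREFIX are hypothetical paths; adjust them to your site.
MPICH_SRC="${MPICH_SRC:-$HOME/src/mpich}"
PREFIX="${PREFIX:-$HOME/mpich-p4-shared}"

# --with-comm=shared is the option named above; ch_p4 is an assumption.
CONFIGURE_ARGS="--with-device=ch_p4 --with-comm=shared --prefix=$PREFIX"

if [ -x "$MPICH_SRC/configure" ]; then
    # Word splitting of CONFIGURE_ARGS is intentional here.
    (cd "$MPICH_SRC" && ./configure $CONFIGURE_ARGS && make && make install)
else
    # No source tree found: just show the command that would be run.
    echo "would run: ./configure $CONFIGURE_ARGS"
fi
```

A statically linked application would have to be relinked against the rebuilt library before the change takes effect.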
-Brent
On Mon, 2003-10-20 at 05:44, Andrew Emerson wrote:
> Hi
>
> We are trying to run an MPICH-P4 executable on our Myrinet cluster (2
> procs/node). Without the -comm option we just get non-communicating serial
> jobs, but when using -comm mpich-p4 we get the following error:
>
> node 0: name = node05, cpu = 1
> node 1: name = node05, cpu = 0
> wait_one_task_start: evt = 2, task 0 host node05
> All 1 task started.
> bm_slave_1_16011: (0.002536) process not in process table; my_unix_id =
> 16011 my_host=node05
> bm_slave_1_16011: (0.002733) Probable cause: local slave on uniprocessor
> without shared memory
> bm_slave_1_16011: (0.002754) Probable fix: ensure only one process on node05
> bm_slave_1_16011: (0.002766) (on master process this means 'local 0' in the
> procgroup file)
> bm_slave_1_16011: (0.002776) You can also remake p4 with SYSV_IPC set in
> the OPTIONS file
> bm_slave_1_16011: (0.002785) Alternate cause: Using localhost as a machine
> name in the progroup
> bm_slave_1_16011: (0.002795) file. The names used should match the
> external network names.
> bm_slave_1_16011: p4_error: p4_get_my_id_from_proc: 0
> p0_15936: (0.002576) send_message: to=1; invalid conn type=5
> p0_15936: (0.002576) send_message: to=1; invalid conn type=5
> wait_tasks: numspawned = 1, got evt 3 for tid 2 host node05 status 1
> mpiexec: Warning: main: task 0 exited with status 1 (raw 0x1).
>
> The version of mpiexec is 0.72 (mpiexec -version = Version 0.72, configure
> options: --with-pbs=/usr/local/pbse/OpenPBS_2_3_12-mpiexec072
> --with-pbssrc=/cineca/prod/OpenPBSe/OpenPBS_2_3_12-mpiexec072
> --with-smp-size=2 --with-mpich-gm --with-myri-cards=2).
>
> Is there any way round this? Unfortunately, we don't have the source code of
> the application, only the executable, which has been statically linked.
>
> best wishes
> Andy Emerson
>
> ------------------------
> Dr Andrew Emerson
> CINECA (High Performance Systems)
> via Magnanelli, 6/3
> 40033 Casalecchio di Reno (BO)-ITALY
> tel: +39-051-6171653, fax: +39-051-6132198
> e-mail: a.emerson at cineca.it
>
> _______________________________________________
> mpiexec mailing list
> mpiexec at osc.edu
> http://email.osc.edu/mailman/listinfo/mpiexec
>