mpich-shmem problems

Pete Wyckoff pw at osc.edu
Thu Aug 21 11:02:05 EDT 2003


ruda at ics.muni.cz said on Thu, 21 Aug 2003 13:57 +0200:
> skirit25$ qstat -f $PBS_JOBID | fgrep Resource_List
>     Resource_List.cput = 1000:00:00
>     Resource_List.ncpus = 2
>     Resource_List.neednodes = 1:ppn=2:myrinet:brno#myrinet
>     Resource_List.nodect = 1
>     Resource_List.nodes = 1:ppn=2:myrinet:brno#myrinet
>     Resource_List.walltime = 720:00:00
> skirit25$ ~/shared/mpiexec.bad -verbose -comm=shmem tmp/cpi.shmem 
> resolve_exe: using absolute exe "tmp/cpi.shmem"
> node  0: name = skirit25.ics.muni.cz, mpname = skirit25.ics.muni.cz, cpu = 0
> node  1: name = skirit25.ics.muni.cz, mpname = skirit25.ics.muni.cz, cpu = 1

Okay, I see.  This patch calls cull_nodes() in the ncpus case as well as
in the nodect case, since PBS Pro sets both.  I suspect that
"cat $PBS_NODEFILE" will show two lines in your batch job, whereas OpenPBS
with ncpus (and no nodect) shows just one.  Let me know if it works okay.

		-- Pete

diff -u -r1.32 get_hosts.c
--- get_hosts.c	20 Aug 2003 22:09:18 -0000	1.32
+++ get_hosts.c	21 Aug 2003 14:50:22 -0000
@@ -239,13 +239,15 @@
 	      __func__);
 	if (have_ncpus)
 	    tasks[0].num_copies = have_ncpus;  /* trust this one first */
+	    /* note pbspro will set both ncpus and nodect, thus cull below
+	     * is necessary to prune out extra nodes */
 	else {
 	    if (have_nodect != 1)
 		error("%s: pbs_statjob says nodect = %d,"
 		  " but shmem only handles nodect = 1", __func__, have_nodect);
 	    tasks[0].num_copies = numtask;  /* ppn value for the single node */
-	    cull_nodes(matching_node);  /* discard other cpu tasks */
 	}
+	cull_nodes(matching_node);  /* discard other cpu tasks */
 	pbs_statfree(bstat);
     }
 
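
The control flow the patch aims for can be illustrated with a small
standalone sketch.  This is not code from get_hosts.c: struct node,
cull_nodes_sketch() and shmem_num_copies() are hypothetical names, and the
culling rule (keep only the first entry for the matching host) is an
assumption based on the "discard other cpu tasks" comment, since the real
cull_nodes() is not shown in this post.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry in the node list. */
struct node {
    const char *name;  /* host name as reported by PBS */
    int cpu;           /* cpu index on that host */
};

/*
 * Assumed culling rule: keep only the first entry whose name equals
 * `keep` and drop everything else.  Returns the new entry count.
 */
static size_t cull_nodes_sketch(struct node *nodes, size_t n, const char *keep)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (out == 0 && strcmp(nodes[i].name, keep) == 0)
            nodes[out++] = nodes[i];
    }
    return out;
}

/*
 * Number of copies to start on the single shmem node, mirroring the
 * if/else in the patch.  Returns -1 where the original calls error().
 */
static int shmem_num_copies(int have_ncpus, int have_nodect, int numtask)
{
    if (have_ncpus)
        return have_ncpus;   /* trust ncpus first */
    if (have_nodect != 1)
        return -1;           /* shmem only handles nodect = 1 */
    return numtask;          /* ppn value for the single node */
}
```

With the two skirit25 entries from the verbose output above and ncpus = 2,
the sketch yields one remaining node entry and two copies.  The point of
the patch is that the culling happens on both branches, not only when
ncpus is absent.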


