mpich-shmem problems
Pete Wyckoff
pw at osc.edu
Thu Aug 21 11:02:05 EDT 2003
ruda at ics.muni.cz said on Thu, 21 Aug 2003 13:57 +0200:
> skirit25$ qstat -f $PBS_JOBID | fgrep Resource_List
> Resource_List.cput = 1000:00:00
> Resource_List.ncpus = 2
> Resource_List.neednodes = 1:ppn=2:myrinet:brno#myrinet
> Resource_List.nodect = 1
> Resource_List.nodes = 1:ppn=2:myrinet:brno#myrinet
> Resource_List.walltime = 720:00:00
> skirit25$ ~/shared/mpiexec.bad -verbose -comm=shmem tmp/cpi.shmem
> resolve_exe: using absolute exe "tmp/cpi.shmem"
> node 0: name = skirit25.ics.muni.cz, mpname = skirit25.ics.muni.cz, cpu = 0
> node 1: name = skirit25.ics.muni.cz, mpname = skirit25.ics.muni.cz, cpu = 1
Okay, I see. This patch calls cull_nodes() for the ncpus case as well
as for the nodect case. I suspect that "cat $PBS_NODEFILE" will show
two lines in your batch job, although OpenPBS with ncpus (and no nodect)
writes just one line. Let me know if it works okay.
-- Pete
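To check the guess about the node file from C rather than with cat, a small helper can count its entries. This is a hypothetical sketch and not part of mpiexec. The name `count_nodefile_entries` is invented here, and the code assumes the usual PBS node-file layout of one hostname per line, repeated once per allocated cpu slot. For the `nodes=1:ppn=2` job above it would presumably report 2.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count non-empty lines in a PBS node file.
 * Assumes one hostname per line, one line per allocated cpu slot.
 * If host is non-NULL, only lines naming exactly that host are counted.
 * Returns -1 if the file cannot be opened. */
static int count_nodefile_entries(const char *path, const char *host)
{
    char line[1024];
    int count = 0;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    while (fgets(line, sizeof(line), fp)) {
        line[strcspn(line, "\r\n")] = '\0';  /* strip trailing newline */
        if (line[0] == '\0')
            continue;                        /* ignore blank lines */
        if (host == NULL || strcmp(line, host) == 0)
            count++;
    }
    fclose(fp);
    return count;
}
```

Inside a batch job you would call it with `getenv("PBS_NODEFILE")` as the path, after checking that the variable is set.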
diff -u -r1.32 get_hosts.c
--- get_hosts.c 20 Aug 2003 22:09:18 -0000 1.32
+++ get_hosts.c 21 Aug 2003 14:50:22 -0000
@@ -239,13 +239,15 @@
__func__);
if (have_ncpus)
tasks[0].num_copies = have_ncpus; /* trust this one first */
+ /* note pbspro will set both ncpus and nodect, thus cull below
+ * is necessary to prune out extra nodes */
else {
if (have_nodect != 1)
error("%s: pbs_statjob says nodect = %d,"
" but shmem only handles nodect = 1", __func__, have_nodect);
tasks[0].num_copies = numtask; /* ppn value for the single node */
- cull_nodes(matching_node); /* discard other cpu tasks */
}
+ cull_nodes(matching_node); /* discard other cpu tasks */
pbs_statfree(bstat);
}
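The control flow after the patch can be sketched in isolation. The `struct node` type, the `plan_shmem` name, and returning -1 instead of calling `error()` are assumptions made for this illustration; the real get_hosts.c works on its own task and node structures. What the sketch shows is that the cull now runs in both branches, so a PBSPro job that sets ncpus and nodect together still ends up with a single node entry.

```c
/* Hypothetical stand-in for mpiexec's per-cpu node list entries. */
struct node {
    const char *name;
    int cpu;
};

/* Sketch of the patched logic: pick the shmem process count, then
 * shrink the per-cpu node list to the one matching entry.
 * Returns the number of copies, or -1 if the job spans more than one
 * node (the real code calls error() at that point instead). */
static int plan_shmem(struct node *nodes, int *numnodes, int matching_node,
                      int have_ncpus, int have_nodect, int numtask)
{
    int num_copies;

    if (have_ncpus)
        num_copies = have_ncpus;  /* trust ncpus first */
    else {
        if (have_nodect != 1)
            return -1;            /* shmem only handles one node */
        num_copies = numtask;     /* ppn value for the single node */
    }
    /* Cull in both branches: PBSPro sets ncpus and nodect together,
     * and the list may still hold one entry per cpu. */
    nodes[0] = nodes[matching_node];
    *numnodes = 1;
    return num_copies;
}
```

With the job from the report (ncpus = 2, nodect = 1, two entries for skirit25), this returns 2 copies and leaves one node entry, instead of two entries each running two copies.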