mpich-shmem problems

Miroslav Ruda ruda at ics.muni.cz
Thu Aug 21 07:57:48 EDT 2003


On Wed, 2003-08-20 at 17:55, Pete Wyckoff wrote:
>     2.  Non-time-shared hosts seem to use "nodect".  Nodes file might
> ...
> You seem to fall in case (2), as do most cluster-type installations.
> Does your nodes file look the same, and do you show similar output for
> Resource_List variables?  There could be some PBSPro differences that
> I do not know about.

The mpiexec output and the Resource_List values are included below. It looks
like PBSPro sets both ncpus and nodect, and mpiexec uses ncpus first :-(

                   Mirek Ruda

skirit$ qsub -q parallel -l nodes=1:ppn=2:myrinet:brno,cput=1000:00:00 -I
...
skirit25$ qstat -f $PBS_JOBID | fgrep Resource_List
    Resource_List.cput = 1000:00:00
    Resource_List.ncpus = 2
    Resource_List.neednodes = 1:ppn=2:myrinet:brno#myrinet
    Resource_List.nodect = 1
    Resource_List.nodes = 1:ppn=2:myrinet:brno#myrinet
    Resource_List.walltime = 720:00:00
skirit25$ ~/shared/mpiexec.bad -verbose -comm=shmem tmp/cpi.shmem 
resolve_exe: using absolute exe "tmp/cpi.shmem"
node  0: name = skirit25.ics.muni.cz, mpname = skirit25.ics.muni.cz, cpu = 0
node  1: name = skirit25.ics.muni.cz, mpname = skirit25.ics.muni.cz, cpu = 1
wait_one_task_start: evt = 2, task 0 host skirit25.ics.muni.cz
Process 0 on skirit25.ics.muni.cz
Process 1 on skirit25.ics.muni.cz
pi is approximately 3.1416009869231241, Error is 0.0000083333333309
wall clock time = 0.000169
Process 1 on skirit25.ics.muni.cz
Process 0 on skirit25.ics.muni.cz
pi is approximately 3.1416009869231241, Error is 0.0000083333333309
wall clock time = 0.000172
wait_one_task_start: evt = 3, task 1 host skirit25.ics.muni.cz
All 2 tasks started.
wait_tasks: numspawned = 2, got evt 4 for tid 6 host skirit25.ics.muni.cz status 0
wait_tasks: numspawned = 1, got evt 5 for tid 7 host skirit25.ics.muni.cz status 0
skirit25$
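
To check whether another installation also sets both values, the Resource_List fields can be pulled out of the `qstat -f` output with a small filter. This is only a minimal sketch: `resource_value` is a hypothetical helper name, not part of mpiexec or PBS, and the sample text is copied from the qstat output above rather than read from a live job.

```shell
# Hypothetical helper (not part of mpiexec or PBS): read `qstat -f` output
# on stdin and print the value of one Resource_List field, e.g. ncpus.
resource_value() {
    awk -v key="Resource_List.$1" '$1 == key && $2 == "=" { print $3; exit }'
}

# Sample lines copied from the qstat output in this message.
sample='    Resource_List.ncpus = 2
    Resource_List.nodect = 1'

ncpus=$(printf '%s\n' "$sample" | resource_value ncpus)
nodect=$(printf '%s\n' "$sample" | resource_value nodect)

# Report the case described above: both fields are present at once.
if [ -n "$ncpus" ] && [ -n "$nodect" ]; then
    echo "both set: ncpus=$ncpus nodect=$nodect"
fi
```

Inside a running job, the same helper could presumably be fed from `qstat -f $PBS_JOBID` instead of the sample text.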


