Trouble with all jobs launched on same node.

james.becnel at srs.gov
Thu Apr 8 10:58:09 EDT 2004


Problem: On a 4-node run, for example, OpenPBS allocates 4 separate 
nodes, but all 4 processes launch on the first node given in the PBS 
node list.

Any ideas why?  I have been going through the mpiexec code looking for 
the cause, and the problem appears to be at the point where mpiexec gets 
the node information from OpenPBS.  Should I be looking at the code on 
the OpenPBS side instead?  Can you suggest any workarounds?  I am 
comfortable coding in C.  Let me know if you have any thoughts. 
Thank you!
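
One quick check on the PBS side, sketched in C: count the distinct 
hostnames in the node file that PBS writes for the job (its path is in 
the PBS_NODEFILE environment variable).  This is only a minimal sketch.  
The helper name count_distinct_nodes and the size limits are 
hypothetical, not part of mpiexec or OpenPBS, and it assumes one 
hostname per line with names shorter than 64 characters.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NODES 256   /* assumed upper bound on distinct hosts */
#define MAX_NAME  64    /* assumed upper bound on hostname length */

/* Hypothetical helper, not part of mpiexec or OpenPBS.
 * Reads a PBS-style node file (one hostname per line) and returns the
 * number of distinct hostnames, or -1 if the file cannot be opened. */
int count_distinct_nodes(const char *path)
{
    char seen[MAX_NODES][MAX_NAME];
    char line[MAX_NAME];
    int nseen = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        int i, known = 0;

        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (line[0] == '\0')
            continue;                          /* skip blank lines */

        /* Linear search is fine for the handful of nodes in one job. */
        for (i = 0; i < nseen; i++) {
            if (strcmp(seen[i], line) == 0) {
                known = 1;
                break;
            }
        }
        if (!known && nseen < MAX_NODES) {
            strcpy(seen[nseen], line);         /* fits: line < MAX_NAME */
            nseen++;
        }
    }
    fclose(fp);
    return nseen;
}
```

Inside a job, call it with the value of getenv("PBS_NODEFILE") after 
checking that the value is not NULL.  If the node file lists 4 different 
hosts, the allocation itself is fine and the problem is further along.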



OpenPBS qstat output:
---------------------
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
12938.c00       msi3     ms       MS_375NN    12541   4   4    --    --  R   --
   c07/0+c02/0+c16/0+c11/0


mpiexec-0.75 task[] settings:
-----------------------------
node  0: name = c07, mpname = c07, cpu = 0
node  1: name = c07, mpname = c07, cpu = 0
node  2: name = c07, mpname = c07, cpu = 0
node  3: name = c07, mpname = c07, cpu = 0
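
For comparison, here is a minimal sketch in C of how the exec_host 
string reported by qstat would map onto four distinct entries.  The 
names task_entry, parse_exec_host and print_tasks are hypothetical and 
are not taken from the mpiexec source.  Setting mpname equal to name is 
also an assumption, made only to mirror the dump above.  This is not how 
mpiexec obtains its node list; it only illustrates the expected result.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 64   /* assumed upper bound on hostname length */

/* Hypothetical type, for illustration only. */
struct task_entry {
    char name[NAME_LEN];
    int  cpu;
};

/* Splits an exec_host string such as "c07/0+c02/0" into entries.
 * Returns the number of entries stored (at most max), or -1 if an
 * item is not of the form "host/cpu". */
int parse_exec_host(const char *exec_host, struct task_entry *out, int max)
{
    const char *p = exec_host;
    int n = 0;

    while (*p != '\0' && n < max) {
        size_t item_len = strcspn(p, "+");          /* one "host/cpu" item */
        const char *slash = memchr(p, '/', item_len);
        size_t name_len;

        if (slash == NULL || slash == p)
            return -1;                              /* no '/' or empty host */
        name_len = (size_t)(slash - p);
        if (name_len >= NAME_LEN)
            return -1;                              /* hostname too long */

        memcpy(out[n].name, p, name_len);
        out[n].name[name_len] = '\0';
        out[n].cpu = atoi(slash + 1);               /* stops at '+' or end */
        n++;

        p += item_len;
        if (*p == '+')
            p++;                                    /* step over separator */
    }
    return n;
}

/* Prints entries in the same layout as the task[] dump above. */
void print_tasks(const struct task_entry *t, int n)
{
    int i;

    for (i = 0; i < n; i++)
        printf("node %2d: name = %s, mpname = %s, cpu = %d\n",
               i, t[i].name, t[i].name, t[i].cpu);
}
```

Called on "c07/0+c02/0+c16/0+c11/0", parse_exec_host fills four entries 
named c07, c02, c16 and c11, each with cpu 0, which is what I would 
expect to see in task[] instead of four copies of c07.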


mpiexec-0.75 compile options:
-----------------------------
--with-comm=shared --with-default-comm=mpich-p4


Setup:
------
mpiexec-0.75
OpenPBS-2.3.16 (no patches)
MPICH-1.2.5.2
Linux-2.4.18 / RedHat 7.3



--------------------------------
Jim Becnel
Savannah River Technology Center
Bldg. 773-42A, Room 179
Aiken, SC  29808
voice: 803-725-7386 
fax: 803-725-8829
email: James.Becnel at srs.gov
--------------------------------