MPIEXEC Problems
James O'Dell
James_ODell at Brown.edu
Mon Jul 7 15:41:57 EDT 2003
Perhaps someone can help me.
I am using MPICH-1.2.5 compiled without shared memory.
I built MPIEXEC using:
./configure -with-pbs=/opt/OpenPBS/ --prefix=/opt/mpiexec --disable-p4-shmem
Everything builds correctly.
I have a small test file test.sh:
#!/bin/sh
#PBS -l nodes=4:ppn=2
#PBS -l walltime=5:00
#PBS -l cput=40:00
#PBS -j oe
#PBS -o test.out
sed -e "s/$/.gig.net/" $PBS_NODEFILE > nodes
PBS_NODEFILE=nodes
mpiexec --nostdin -nostdout --verbose --comm=p4 hello
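For readers unfamiliar with the sed line in test.sh: it appends the .gig.net suffix to every hostname in the node file. A minimal sketch of that step in isolation, assuming a made-up three-line node file (the NODEFILE and NODES temp files are stand-ins, not part of the real job):

```shell
#!/bin/sh
# Hypothetical stand-ins for $PBS_NODEFILE and the "nodes" output file.
NODEFILE=$(mktemp)
NODES=$(mktemp)
trap 'rm -f "$NODEFILE" "$NODES"' EXIT

# PBS lists one hostname per line, repeated once per requested processor.
printf 'compute-0-24\ncompute-0-24\ncompute-0-25\n' > "$NODEFILE"

# Same substitution as in test.sh: "$" matches the end of each line,
# so the suffix is appended to every hostname.
sed -e "s/$/.gig.net/" "$NODEFILE" > "$NODES"

cat "$NODES"
```

This only shows what ends up in the rewritten file. Whether mpiexec reads PBS_NODEFILE at all is a separate question that this sketch does not answer.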
When I run test.sh, test.out contains:
resolve_exe: found exe "hello" in path
node 0: name = compute-0-24, mpname = compute-0-24, cpu = 1
node 1: name = compute-0-24, mpname = compute-0-24, cpu = 0
node 2: name = compute-0-25, mpname = compute-0-25, cpu = 1
node 3: name = compute-0-25, mpname = compute-0-25, cpu = 0
node 4: name = compute-1-14, mpname = compute-1-14, cpu = 1
node 5: name = compute-1-14, mpname = compute-1-14, cpu = 0
node 6: name = compute-1-15, mpname = compute-1-15, cpu = 1
node 7: name = compute-1-15, mpname = compute-1-15, cpu = 0
mpiexec: Error: wait_one_task_start: tm_poll remote: System error.
wait_one_task_start: evt = 2, task 0 host compute-0-24
read_p4_master_port: waiting for port from master
read_p4_master_port: got port 50988
The mom_logs directory on the lead node contains the following:
07/07/2003 15:32:58;0008; pbs_mom;Job;2015.lou.cascv.brown.edu;Started, pid = 8297
07/07/2003 15:32:58;0100; pbs_mom;Req;;Type 19 request received from PBS_Server at frontend-1-16, sock=10
07/07/2003 15:32:58;0008; pbs_mom;Job;2015.lou.cascv.brown.edu;task started, /bin/sh
07/07/2003 15:32:58;0080; pbs_mom;Job;2015.lou.cascv.brown.edu;task 1 terminated
07/07/2003 15:32:58;0008; pbs_mom;Job;2015.lou.cascv.brown.edu;Terminated
07/07/2003 15:33:04;0001; pbs_mom;Svr;pbs_mom;task_check, cannot tm_reply to 2015.lou.cascv.brown.edu task 1
07/07/2003 15:33:04;0001; pbs_mom;Svr;pbs_mom;task_check, cannot tm_reply to 2015.lou.cascv.brown.edu task 1
07/07/2003 15:33:04;0001; pbs_mom;Svr;pbs_mom;task_check, cannot tm_reply to 2015.lou.cascv.brown.edu task 1
07/07/2003 15:33:04;0001; pbs_mom;Svr;pbs_mom;task_check, cannot tm_reply to 2015.lou.cascv.brown.edu task 1
07/07/2003 15:33:08;0080; pbs_mom;Job;2015.lou.cascv.brown.edu;task 2 terminated
07/07/2003 15:33:10;0001; pbs_mom;Svr;pbs_mom;task_check, cannot tm_reply to 2015.lou.cascv.brown.edu task 1
07/07/2003 15:33:10;0001; pbs_mom;Svr;pbs_mom;task_check, cannot tm_reply to 2015.lou.cascv.brown.edu task 1
07/07/2003 15:33:10;0008; pbs_mom;Job;2015.lou.cascv.brown.edu;kill_job
07/07/2003 15:33:10;0080; pbs_mom;Job;2015.lou.cascv.brown.edu;Obit sent
07/07/2003 15:33:10;0100; pbs_mom;Req;;Type 54 request received from PBS_Server at frontend-1-16, sock=11
07/07/2003 15:33:10;0100; pbs_mom;Req;;Type 6 request received from PBS_Server at frontend-1-16, sock=11
Anybody know what I'm doing wrong?
Thanks,
Jim