one job only

Fokko Beekhof fpbeekhof at gmail.com
Fri Apr 14 12:13:45 EDT 2006


Hello,

We seem to have a problem with mpiexec 0.80 + torque 1.2.0p1 +
mpich-1.2.6..14b-gcc-4.1.0.
It is possible to start a job, but a second job terminates immediately
when launched, and its stderr file contains:

mpiexec: Error: poll_or_block_event: tm_poll remote 15010: System error.

Strace says:

31782 bind(5, {sa_family=AF_INET, sin_port=htons(1023), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
31782 close(5)                          = 0
31782 write(2, "pbs_iff: cannot connect to myri00:15001 - fatal error, errno=13 (Permission denied)\n", 84) = 84
31782 exit_group(4)                     = ?
31781 <... read resumed> "", 4)         = 0
31781 --- SIGCHLD (Child exited) @ 0 (0) ---
31781 close(5)                          = 0
31781 waitpid(31782, [{WIFEXITED(s) && WEXITSTATUS(s) == 4}], 0) = 31782
31781 close(4)                          = 0
31781 write(2, "mpiexec: Error: ", 16)  = 16
31781 write(2, "get_hosts: pbs_connect", 22) = 22
31781 write(2, ": Unauthorized Request .\n", 25) = 25

However, this problem does not occur with the first job. Also,
/usr/sbin/pbs_iff -t myri0 15001
doesn't generate any output that indicates an error.
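
One way to check whether the EACCES from bind() in the strace can be reproduced outside of pbs_iff is to attempt the same kind of bind from the account that runs the job. The following is only a minimal sketch in Python, not anything taken from torque or mpiexec: try_bind is a hypothetical helper name, and the sketch assumes that ports below 1024 are reserved for privileged processes, which is the usual default on Linux.

```python
import errno
import socket


def try_bind(port, host="0.0.0.0"):
    """Try to bind a TCP socket to host:port.

    Returns None on success, or the symbolic errno name
    (for example "EACCES") when bind() fails.
    try_bind is a hypothetical helper, used here for illustration only.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return None
    except OSError as e:
        # Map the numeric errno to its name; fall back to the number itself.
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        s.close()


if __name__ == "__main__":
    # 1023 is the port seen in the strace above; 0 asks the kernel
    # for any free unprivileged port and serves as a control.
    for port in (1023, 0):
        print(port, try_bind(port) or "ok")
```

If the bind to port 1023 fails with EACCES for an ordinary user but succeeds as root, that would be consistent with the failing process not having the privileges it needs to bind a reserved port at that moment. It would not by itself explain why only the second job is affected.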

I am lost. Any suggestions?

Best regards,

Fokko Beekhof