one job only
Fokko Beekhof
fpbeekhof at gmail.com
Fri Apr 14 17:34:33 EDT 2006
Hi,
> What do you mean second job? Are these concurrent, i.e.
I mean that there is only one job running on the entire cluster.
No one can successfully run a second job. Jobs can be submitted,
scheduled and started, but only the first job started will actually
run. All others fail with the message:
mpiexec: Error: poll_or_block_event: tm_poll remote 15010: System error.
It appears that one of the nodes is malfunctioning: node myri21 does
not automount NFS filesystems, including my home directory. Somehow
this node is allocated to each started job. When I started two extra
jobs simultaneously, the one that was allocated myri21 died, but the
other ran just fine.
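
For anyone who wants to check their own nodes for the same symptom,
here is a minimal sketch in Python. It is not part of mpiexec or of
the batch system; the function name dir_is_usable is made up for this
example, and it assumes that trying to list a directory is enough to
trigger the automounter on your setup.

```python
import os
import socket


def dir_is_usable(path):
    """Return True if path can be listed as a directory.

    Listing the directory (rather than only checking that the name
    exists) is what should trigger an automount, under the assumption
    stated above.
    """
    try:
        os.listdir(path)
    except OSError:
        # Covers a missing path, a failed mount, and permission errors.
        return False
    return True


if __name__ == "__main__":
    # Report the state of the current user's home directory on this host.
    home = os.path.expanduser("~")
    status = "ok" if dir_is_usable(home) else "NOT available"
    print(f"{socket.gethostname()}: {home} {status}")
```

Run once on every node (through the batch system or by logging in to
each one), a node like myri21 should stand out as the one reporting
its home directory as not available.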
> I wonder if the first job terminated cleanly. Maybe you could run
> that one with "mpiexec -v -v " to see what happens. And the same
This didn't show much. I'll just try to find someone with a root
password and have the offending node shut down until the sysadmin
returns from vacation :-)
Thanks and best regards,
Fokko Beekhof
> P.S. Your gmail account sends html mail; might want to turn that
> off for mail to lists.
Sorry, it's my first post. It should be fixed in this one.