mpiexec SGE integration
Rayson Ho
raysonho at eseenet.com
Fri Feb 11 23:54:45 EST 2005
Hi,
I think this is what needs to be done:
1) SGE TM lib:
tm_spawn() - start a task on a remote host (connect to sge_execd,
set env.var., fork/exec)
tm_kill() - kill a task
tm_poll() - notify mpiexec about task start/exit
tm_obit() - subscribe to get exit value for a task
tm_nodeinfo() - data structure lookup for node info
tm_init() - init TM lib
tm_finalize() - cleanup
Since TM is event based, we need to add state in the TM lib.
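To make the state requirement concrete, here is a minimal sketch of the bookkeeping the library might keep: a table of outstanding requests that a poll call can drain. All names here (ev_register, ev_complete, ev_poll, MAX_EVENTS) are hypothetical and are not part of the PBS or SGE API. A real library would fill in completions from messages received from sge_execd.

```c
#include <stddef.h>

/* Hypothetical event table; none of these names exist in PBS TM or SGE. */
enum { MAX_EVENTS = 64 };

typedef enum { EV_FREE = 0, EV_PENDING, EV_DONE } ev_state;

typedef struct {
    ev_state state;
    int      event_id;  /* handle given to the caller when the request is made */
    int      task_id;   /* task the request refers to */
    int      result;    /* e.g. exit status, valid once state == EV_DONE */
} ev_slot;

static ev_slot ev_table[MAX_EVENTS];
static int next_event_id = 1;

/* Record a new outstanding request. Returns its event id, or -1 if the table is full. */
int ev_register(int task_id)
{
    for (size_t i = 0; i < MAX_EVENTS; i++) {
        if (ev_table[i].state == EV_FREE) {
            ev_table[i].state = EV_PENDING;
            ev_table[i].event_id = next_event_id++;
            ev_table[i].task_id = task_id;
            ev_table[i].result = 0;
            return ev_table[i].event_id;
        }
    }
    return -1;
}

/* Mark a pending request as finished. Returns 0, or -1 if the id is unknown. */
int ev_complete(int event_id, int result)
{
    for (size_t i = 0; i < MAX_EVENTS; i++) {
        if (ev_table[i].state == EV_PENDING && ev_table[i].event_id == event_id) {
            ev_table[i].state = EV_DONE;
            ev_table[i].result = result;
            return 0;
        }
    }
    return -1;
}

/* Deliver one finished event and free its slot. Returns 1 if delivered, 0 if none is ready. */
int ev_poll(int *event_id, int *result)
{
    for (size_t i = 0; i < MAX_EVENTS; i++) {
        if (ev_table[i].state == EV_DONE) {
            *event_id = ev_table[i].event_id;
            *result = ev_table[i].result;
            ev_table[i].state = EV_FREE;
            return 1;
        }
    }
    return 0;
}
```

In this sketch, spawn, kill, and obit requests would each call ev_register(), and the poll routine would report whatever ev_poll() hands back.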
2) stdio forward
When SGE executes the task on the slave nodes, it needs to look at the
following environment variables:
MPIEXEC_STDIN_PORT
MPIEXEC_STDOUT_PORT
MPIEXEC_STDERR_PORT
It then opens a connection back to the host that requested tm_spawn() and
connects the task's first 3 file descriptors (stdin, stdout, stderr) accordingly.
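A rough sketch of what the slave side could do between fork and exec is below. The three environment variable names are the ones listed above. Everything else is an assumption for illustration: the helper names are invented, the origin host is passed in as a parameter because how the slave learns it is not settled here, and leaving a descriptor untouched when its variable is unset is only one possible policy.

```c
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read a TCP port from an environment variable; -1 if unset or invalid. */
int port_from_env(const char *name)
{
    const char *s = getenv(name);
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return -1;
    v = strtol(s, &end, 10);
    if (*end != '\0' || v < 1 || v > 65535)
        return -1;
    return (int)v;
}

/* Open a TCP connection to host:port. Returns a socket fd, or -1 on failure. */
int connect_back(const char *host, int port)
{
    struct addrinfo hints, *res, *ai;
    char portstr[16];
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    snprintf(portstr, sizeof portstr, "%d", port);
    if (getaddrinfo(host, portstr, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;              /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Hypothetical helper, meant to run in the child after fork() and before exec().
 * origin_host is assumed to be supplied by the caller. */
int redirect_stdio(const char *origin_host)
{
    static const char *vars[3] = {
        "MPIEXEC_STDIN_PORT", "MPIEXEC_STDOUT_PORT", "MPIEXEC_STDERR_PORT"
    };
    int target;

    for (target = 0; target < 3; target++) {
        int port = port_from_env(vars[target]);
        int fd;

        if (port < 0)
            continue;           /* assumption: unset means "leave as is" */
        fd = connect_back(origin_host, port);
        if (fd < 0)
            return -1;
        if (fd != target) {
            if (dup2(fd, target) < 0) {
                close(fd);
                return -1;
            }
            close(fd);
        }
    }
    return 0;
}
```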
3) Non-TM routines used by mpiexec
pbs_connect()
pbs_disconnect()
pbs_statfree()
pbs_statjob()
I think only pbs_statjob() is a must-have; pbs_connect(), pbs_disconnect(), and
pbs_statfree() are just setup/cleanup calls.
For our SGE implementation of pbs_statjob(), we will return the same PBS
struct, but "ncpus"/"nodect" will be parsed from the information in the
hostfile.
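As an illustration of the hostfile part, here is a small sketch that derives the two counts. It assumes the hostfile has one line per host in the form "hostname slots queue processor-range", as in the file SGE names in $PE_HOSTFILE, and that each host appears on only one line. The struct and function names are hypothetical, and packing the totals into the PBS attribute list is left out.

```c
#include <stdio.h>

/* Hypothetical container for the two values pbs_statjob() would report. */
typedef struct {
    int nodect;  /* number of host lines */
    int ncpus;   /* sum of slots over all hosts */
} host_totals;

/* Read "hostname slots ..." lines from fp and accumulate totals.
 * Returns 0 on success, -1 if a line cannot be parsed. */
int count_hostfile(FILE *fp, host_totals *out)
{
    char line[1024];
    char host[256];
    int slots;

    out->nodect = 0;
    out->ncpus = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        int n = sscanf(line, "%255s %d", host, &slots);

        if (n == EOF)
            continue;           /* blank line */
        if (n != 2 || slots < 1)
            return -1;          /* malformed line */
        out->nodect++;
        out->ncpus += slots;
    }
    return 0;
}
```

If a host can appear on several lines (for example, once per queue), the node count would need de-duplication by hostname, which this sketch does not do.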
pbs_errno()
pbse_to_txt()
For these 2, I just need to wrap around similar SGE functions.
Pete, can you check if I missed anything? Also, is there anything in the
TM API that you think should be improved?
Thanks,
Rayson