Checkpointing with mpiexec

Artem Polyakov artpol84 at gmail.com
Thu Jun 19 14:02:16 EDT 2008


Is there any developer documentation on the mpiexec design besides the source
code? If so, can I get it?

2008/6/19 Pete Wyckoff <pw at osc.edu>:

> artpol84 at gmail.com wrote on Mon, 16 Jun 2008 14:28 +0700:
> > I am trying to use mpiexec with a checkpointing program that takes into
> > account all sockets and file descriptors in the program. The first problem
> > I ran into is with checkpointing the entire mpiexec: when I restart from
> > the checkpointed image, the restore program looks for temporary files
> > created by PBS and fails when it does not find them. Is it possible to
> > divide mpiexec into 2 parts:
> > 1. Gathering information about execution resources from PBS
> > 2. Starting the program using predetermined temporary files (not dependent
> > on the query ID and so on).
>
> But mpiexec doesn't keep open any files.  It does all its querying
> via sockets to the PBS server and to the local PBS mom.  So I think
> that we've got things more or less as you need them already.
>
> However, the bigger problem is how to recreate these connections to
> the existing (non-restarted) PBS.  Mpiexec would likely need to be
> involved in the restart process, as it must find out the new TM task
> ids for the restarted tasks, and register obits for them.
>
> Curious how much of this you've thought through.  If you have ideas
> about what to do in mpiexec, please continue to say so.
>
>                -- Pete
>
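To make Pete's point about restart concrete, here is a minimal sketch in C (the language mpiexec is written in) of the bookkeeping mpiexec might need after a restart: each restarted process gets a new TM task id, and an obit has to be registered again for that id so mpiexec learns when the process exits. The struct task_entry, hypothetical_respawn() and the starting id 1000 are all hypothetical illustrations, not mpiexec's actual data structures. In a real implementation the respawn and obit steps would go through the PBS TM interface (roughly tm_spawn() and tm_obit()), which is stubbed out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process record; not mpiexec's real structure. */
struct task_entry {
    int rank;          /* MPI rank of the process */
    int old_tid;       /* TM task id before the checkpoint */
    int new_tid;       /* TM task id after restart, -1 until known */
    int obit_pending;  /* 1 once an obit must be (re)registered */
};

/* Hypothetical stand-in for restarting one task through TM.
 * A real version would spawn the restart command on the task's node
 * and return the TM task id that PBS assigns; here we just hand out
 * fresh ids so the bookkeeping can be exercised. */
static int next_tid = 1000;

static int hypothetical_respawn(int rank)
{
    (void)rank;  /* a real version would use rank to pick the node */
    return next_tid++;
}

/* Give every task its new TM id and mark that its obit must be
 * registered again. Returns the number of tasks remapped. */
static size_t remap_after_restart(struct task_entry *tasks, size_t n)
{
    size_t i;

    assert(tasks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        tasks[i].new_tid = hypothetical_respawn(tasks[i].rank);
        tasks[i].obit_pending = 1;
    }
    return n;
}

/* Find the task an obit refers to. After a restart, obits report
 * only the new TM id, so the lookup is by new_tid. */
static struct task_entry *find_by_tid(struct task_entry *tasks, size_t n,
                                      int tid)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tasks[i].new_tid == tid)
            return &tasks[i];
    return NULL;
}
```

Looking tasks up by the new id is the key design point: the old ids from before the checkpoint mean nothing to the running PBS mom.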



-- 
Best regards, Artem Polyakov