segfault
Glen Beane
beaneg at umcs.maine.edu
Fri Sep 5 10:20:06 EDT 2003
I have a rather strange problem: suddenly one node in my cluster
started crashing jobs, and the error was always that the MPI tasks on
that particular node had died with signal 11 (segfault). This problem
didn't happen with mpirun.ch_gm, only with mpiexec. The strange thing
is that all my nodes are diskless, so they all have the same exact
setup, and no other node has this problem. I've rebooted the node to
reset the ramdisk image, done memory tests, and run jobs with mpirun.ch_gm,
and no problems show up. It seems really strange to me that this one
node would be crashing jobs with mpiexec when all the other identical
nodes have no problem. This started about a week ago. I'm going to
upgrade mpiexec later today.
Does anyone have any ideas?