Can someone help with this problem?

David Luet luet at Princeton.EDU
Fri Jul 16 17:38:09 EDT 2004


I had a similar problem, and it turned out to be an MPI problem: MPI 
limits the size of a message that can be sent. I got around that by:
1) Increasing the maximum message size allowed by MPI. You make this 
change in mpi_install_dir/mpid/ch_p4/p4/lib/p4_sr.h, where you have to 
change the value of P4_MAX_MSGLEN. It is 1<<28 (256 MB) originally. I 
changed it to 1<<30 (1 GB) to be safe.
2) Recompiling MPI so that the new limit takes effect.
3) Increasing P4_GLOBMEMSIZE.
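
To put numbers on the steps above, here is a minimal Python sketch of the
sizes involved and of setting P4_GLOBMEMSIZE for a child process. The helper
names (human, p4_environment) and the 64 MB figure are illustrative
assumptions. Only P4_MAX_MSGLEN, P4_GLOBMEMSIZE, the 1<<28 and 1<<30 values,
and the 4194304-byte current size come from this thread.

```python
import os


def human(nbytes):
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    units = ["bytes", "KB", "MB", "GB"]
    value = float(nbytes)
    index = 0
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:g} {units[index]}"


def p4_environment(globmemsize, base=None):
    """Return a copy of the environment with P4_GLOBMEMSIZE set (in bytes)."""
    env = dict(os.environ if base is None else base)
    env["P4_GLOBMEMSIZE"] = str(globmemsize)
    return env


if __name__ == "__main__":
    # Values quoted in the thread.
    print("original P4_MAX_MSGLEN (1<<28):", human(1 << 28))
    print("raised P4_MAX_MSGLEN (1<<30):  ", human(1 << 30))
    print("current P4_GLOBMEMSIZE:        ", human(4194304))

    # Hypothetical new value: 64 MB is an arbitrary example, not a
    # recommendation from the thread.
    env = p4_environment(64 * 1024 * 1024)
    print("P4_GLOBMEMSIZE =", env["P4_GLOBMEMSIZE"])
```

The returned dictionary could be passed as the env argument of
subprocess.run when starting the job. Whether the variable then reaches
every MPI process depends on the launcher, so that part is worth checking
on the cluster. Note also that the quoted report below hit a different
error after setting P4_GLOBMEMSIZE to about 2 GB, so larger is not
automatically better.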
David

Brent M. Clements wrote:

>We are trying to run NAMD using mpiexec and torque.
>
>My systems analyst compiled the software and tried running it over our
>cluster using mpiexec, but below is what he reported.
>
>Pete et al., please let me know if this is an mpiexec issue or an MPI
>issue. Then I'll go bother someone else...unless of course you guys know
>the solution to the problem below.
>
>Thanks!
>
>Brent Clements
>Linux Technology Specialist
>Information Technology
>Rice University
>
>Linux at Rice news and information
>available only at http://linuxsupport.rice.edu
>
>
>---------- Forwarded message ----------
>Date: Thu, 08 Jul 2004 13:15:18 -0500
>From: Randy Crawford <rand at rice.edu>
>To: Brent M. Clements <bclem at rice.edu>
>Subject: Re: can you send me that error again?
>
>When running two processes over ethernet MPI, the original error was:
>
>"
>p2_15517: (38.889341) xx_shmalloc: returning NULL; requested 65584
>p2_15517: (38.889341) p4_shmalloc returning NULL; request = 65584 bytes
>You can increase the amount of memory by setting the environment variable
>P4_GLOBMEMSIZE (in bytes); the current size is 4194304
>p2_15517:  p4_error: alloc_p4_msg failed: 0
>CHARMDEBUG> Processor 3 has PID 15518
>CHARMDEBUG> Processor 1 has PID 13334
>bm_list_13335: (39.139197) net_send: could not write to fd=5, errno =32
>"
>
>You and Franco then reset shmmax on all the nodes to be much higher, and I think
>the failure then occurred at 128 KB.
>
>Then I set P4_GLOBMEMSIZE to something like 2 GB (instead of 4 MB), and I got a
>different error:
>
>p0_6444:  p4_error: exceeding max num of P4_MAX_SYSV_SHMIDS: 256
>
>     Randy
>
>_______________________________________________
>mpiexec mailing list
>mpiexec at osc.edu
>http://email.osc.edu/mailman/listinfo/mpiexec
>  
>





