[Lustre-discuss] tcp network load balancing understanding lustre 1.8

Brian J. Murrell Brian.Murrell at Sun.COM
Sun May 10 08:00:23 PDT 2009


On Sun, 2009-05-10 at 15:07 +0100, Christopher J. Walker wrote:
> 
> I've had a brief skim of the source code, and I think that the problem 
> is that iozone uses UDP packets to communicate. On a heavily loaded 
> network, one of these is bound to get lost. Presumably iozone doesn't 
> have the right retry strategy.

Why not use a benchmark built on an established MPI library, such as MPICH
or LAM? Both can carry their message passing over TCP and launch remote
processes with rsh or ssh. IOR is one such benchmark.

Of course, if your network really is so loaded that it is dropping UDP
packets, then that will probably also increase the latency of the MPI
messages. I'm not sure whether that will have a meaningful impact on IOR.
I tend to think its messaging is quite low volume, so perhaps not.

In any case, it can add another data point to your debugging efforts to
help prove or disprove your hypothesis.

b.
