[Lustre-discuss] o2ib possible network problems -- solved
Brian Behlendorf
behlendorf1 at llnl.gov
Mon Sep 22 16:34:45 PDT 2008
> FWIW, 1000 is waaaaay high. Our biggest production systems (thousands
> if not 10s of thousands) nodes don't use values higher than 300 seconds.
Since I'm here at LLNL and we happen to have a few of the large systems, maybe
I should chime in. While it is true that our large systems (many thousands of
nodes) use a timeout value of 300s, it is not true that this value prevents
all of our timeouts. The 300s value has just shown itself through actual usage
to prevent 99% of our timeouts while still allowing reasonable recovery times.
To prevent the remaining ones, I feel the only viable solution is to validate
the new adaptive timeout feature for our production use.
--
Thanks,
Brian