[Lustre-discuss] Clients Unmounting Lustre

Brian J. Murrell Brian.Murrell at Sun.COM
Tue Sep 1 12:13:09 PDT 2009


On Tue, 2009-09-01 at 11:34 -0700, Don Thorp wrote:
> 
> New hardware that will support the workload is on the way, but are  
> there some changes I can make now to 1.6.6 that would increase  
> reliability, even at the expense of performance?

With what you have given us to work with, my first suggestion would be
to increase your obd_timeout.  You should not need to go higher than
about 300 seconds, but should try to choose a value only high enough to
stop the callback timeouts.  Higher obd_timeout values mean longer
recoveries.
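
As a rough sketch of that selection, assuming you have some way to tell
whether a trial value stopped the callback timeouts (for example, by
watching the logs over a representative period): the function below just
picks the smallest candidate that works and never exceeds a 300-second
cap. The names pick_obd_timeout and stops_callback_timeouts are
hypothetical and not part of Lustre; applying each trial value to the
filesystem is left to you.

```python
from typing import Callable, Iterable, Optional


def pick_obd_timeout(candidates: Iterable[int],
                     stops_callback_timeouts: Callable[[int], bool],
                     cap: int = 300) -> Optional[int]:
    """Return the smallest candidate (in seconds) that stops the timeouts.

    stops_callback_timeouts is a hypothetical callable supplied by the
    admin: it should apply the value, observe the system for a while, and
    return True if no callback timeouts were seen.
    """
    for value in sorted(set(candidates)):
        if value > cap:
            # Do not go past the cap; higher values mean longer recoveries.
            break
        if stops_callback_timeouts(value):
            return value
    # Nothing at or below the cap helped; the cause is probably elsewhere.
    return None
```

Trying the candidates in ascending order is what keeps the result "only
high enough": the first value that works is returned and nothing larger
is attempted.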

Additionally, you might look into tuning the number of OST threads on
your OSSes if you are driving your disks too hard.  OST thread count,
like obd_timeout, should be just high enough, and no higher, to reach
maximum throughput.  If you have not baselined your hardware with the
iokit, you can simply start dropping the OST thread counts until you
find that you are impacting throughput.  It's a bit more trial and error
than using the iokit, but if you are in production already, it's
probably the best you can do.
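
The trial-and-error approach could be sketched roughly as follows. This
is only an illustration: measure_throughput is a hypothetical callable
that you would have to supply (set the OST thread count, run your
workload, return the observed throughput), and the step size and 5%
tolerance are arbitrary assumptions, not recommended values.

```python
from typing import Callable


def lowest_thread_count(start: int,
                        measure_throughput: Callable[[int], float],
                        step: int = 32,
                        tolerance: float = 0.05,
                        floor: int = 1) -> int:
    """Step the OST thread count down until throughput starts to suffer.

    Returns the lowest count tried whose throughput stayed within
    `tolerance` of the throughput measured at `start`.
    """
    baseline = measure_throughput(start)
    best = start
    count = start - step
    while count >= floor:
        if measure_throughput(count) < baseline * (1.0 - tolerance):
            # Throughput is now impacted; keep the previous count.
            break
        best = count
        count -= step
    return best
```

In practice each measurement means reconfiguring the OSS and rerunning a
representative load, so a coarse step keeps the number of trials small.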

b.
