[Lustre-discuss] Clients Unmounting Lustre
Don Thorp
dthorp at sdsc.edu
Tue Sep 1 11:34:37 PDT 2009
Our temporary filesystem was promoted by events into semi-production,
as frequently happens, and is being overworked. We have too few
servers, too many targets and too many jobs. Typically, the clients
think the filesystem is unmounted. The servers record messages about
many clients being evicted due to lock blocking callback and lock
glimpse callback timeouts. Less frequently, we see bulk PUT timeouts.
The servers generally have less than 100MB free.
New hardware that will support the workload is on the way, but are
there any changes I can make now to our Lustre 1.6.6 configuration
that would increase reliability, even at the expense of performance?
-Don