[Lustre-discuss] Clients Unmounting Lustre

Don Thorp dthorp at sdsc.edu
Tue Sep 1 11:34:37 PDT 2009


Our temporary filesystem was promoted by events into semi-production,  
as frequently happens, and is being overworked.  We have too few  
servers, too many targets and too many jobs.  Typically, the clients  
think the filesystem is unmounted.  The servers record messages about  
many clients being evicted due to lock blocking callback and lock  
glimpse callback timeouts.  Bulk PUT timeouts occur less frequently.  
The servers generally have less than 100MB free.

New hardware that will support the workload is on the way, but are  
there some changes I can make now to 1.6.6 that would increase  
reliability, even at the expense of performance?

-Don

  


