[Lustre-discuss] Clients Unmounting Lustre
    Don Thorp 
    dthorp at sdsc.edu
       
    Tue Sep  1 11:34:37 PDT 2009
    
    
  
Our temporary filesystem was promoted by events into semi-production,  
as frequently happens, and is being overworked.  We have too few  
servers, too many targets and too many jobs.  Typically, the clients  
think the filesystem is unmounted.  The servers record messages about  
many clients being evicted due to lock blocking callback and lock  
glimpse callback timeouts.  Bulk PUT timeouts appear less frequently.  
The servers generally have less than 100 MB free.
New hardware that will support the workload is on the way, but are  
there any changes I can make now to our Lustre 1.6.6 setup that would  
increase reliability, even at the expense of performance?
-Don
  
    
    