[Lustre-discuss] lustre performance question (contd)

Mark Seger Mark.Seger at hp.com
Tue Dec 25 08:21:28 PST 2007


Once again I'll put in a recommendation for collectl.  You can certainly
monitor disk activity on the OSS, but let's not forget what is being
delivered to the client.  You could be clogging up the network, and
unless you look at the client side you'll never know.  With collectl I
like to monitor both network AND Lustre traffic on the client, because
there are cases where the network rate greatly exceeds the Lustre rate.
In fact, low disk traffic on the OSS can be a false indicator if the
network is backed up or the client simply isn't reading that fast.  What
if the client is sending small RPC requests?  Again, this is something
you can monitor with collectl but not with conventional tools.  Enough
rambling...
-mark
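
For anyone who wants to see the client-side comparison described above without collectl, here is a rough Python sketch. It is not how collectl does it. It assumes the client exposes cumulative counters in /proc/net/dev and in a Lustre llite stats file whose read_bytes line looks like "read_bytes <count> samples [bytes] <min> <max> <sum>"; the stats path and field layout vary between Lustre versions, so treat both as assumptions. The function names are made up for illustration.

```python
def parse_net_rx_bytes(proc_net_dev_text, iface):
    """Return cumulative received bytes for iface from /proc/net/dev text."""
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        # Header lines contain no ':' and are skipped.
        if sep and name.strip() == iface:
            return int(rest.split()[0])  # first column after ':' is rx bytes
    raise KeyError(iface)


def parse_lustre_read_bytes(stats_text):
    """Return the cumulative read_bytes sum from llite stats text.

    Assumed layout: read_bytes <count> samples [bytes] <min> <max> <sum>
    """
    for line in stats_text.splitlines():
        fields = line.split()
        if len(fields) >= 7 and fields[0] == "read_bytes":
            return int(fields[6])
    return 0  # no reads recorded yet


def rate_kb_per_s(before, after, interval_s):
    """Convert two cumulative byte counters into KB/s over interval_s."""
    return (after - before) / 1024.0 / interval_s


def net_to_lustre_ratio(net_kb_s, lustre_kb_s):
    """Network rate divided by Lustre rate, or None if no Lustre traffic."""
    if lustre_kb_s == 0:
        return None
    return net_kb_s / lustre_kb_s
```

To use it, read both files, sleep for a fixed interval, read them again, and feed the four counters through rate_kb_per_s. A ratio well above 1 over several intervals is the situation described above, where the network is moving far more data than Lustre is delivering to the application.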

Aaron Knister wrote:
> When the volume is sluggish do you see lots of tiny reads (500k/sec) on  
> the full volume? When it slows down could you run an "iostat -k 2" on  
> the OSS in question? I think we may be having the same problem. I  
> could find no answer/solution and ended up blowing the whole setup  
> away and starting from scratch. I'd like to track this down and figure  
> out if it's actually a bug or whether I FUBAR'd something in my setup.
>
> -Aaron
>
> On Dec 25, 2007, at 9:42 AM, Balagopal Pillai wrote:
>
>   
>> Hi,
>>
>>     Please ignore the previous email. It seemed to solve itself after
>> 10 - 15 minutes of mounting the filled volume. Now it is as fast as the
>> empty volumes.
>>
>> Thanks
>> Balagopal
>>
>> ---------- Forwarded message ----------
>> Date: Tue, 25 Dec 2007 10:36:28 -0400 (AST)
>> From: Balagopal Pillai <pillai at mathstat.dal.ca>
>> To:  <lustre-discuss at clusterfs.com>
>> Subject: lustre performance question
>>
>> Hi,
>>
>>         We have one Lustre volume that is getting full and some other
>> volumes that are totally empty. The one that is full is a little
>> sluggish at times, with the following messages appearing in syslog on
>> the OSS:
>>
>> Lustre: 5809:0:(filter_io_26.c:698:filter_commitrw_write()) data1-OST0001: slow i_mutex 82s
>> Lustre: 5809:0:(filter_io_26.c:711:filter_commitrw_write()) data1-OST0001: slow brw_start 82s
>> Lustre: 5809:0:(filter_io_26.c:763:filter_commitrw_write()) data1-OST0001: slow direct_io 82s
>> Lustre: 5809:0:(filter_io_26.c:776:filter_commitrw_write()) data1-OST0001: slow commitrw commit 82s
>>
>>           But the same two OSSes are also exporting the empty volume,
>> which is very fast in every test (creating a tar file, bonnie, etc.).
>> I also tested the same thing on the NFS-exported backup volume of the
>> filled-up Lustre volume (exported from the same OSS server) and it
>> doesn't show any significant slowdown. Is it normal for Lustre volumes
>> to slow down when they get full?
>>
>> Thanks
>> Balagopal
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at clusterfs.com
>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>     
>
> Aaron Knister
> Associate Systems Analyst
> Center for Ocean-Land-Atmosphere Studies
>
> (301) 595-7000
> aaron at iges.org
>
>
>
>



