[Lustre-discuss] NFS Performance

Tue Apr 15 13:45:38 PDT 2008

> Thanks Mark!  I just started using collectl last week.  I'll investigate the options you suggested in a minute and see.
>   
by all means do so, and if you have any problems with the switches just 
let me know (you don't want to know how much extra code there is in 
collectl just to deal with all the lustre stats).  Also be sure to let 
me know if you run into any operational problems.  I only recently got 
around to adding support for 1.6.4, and while I think it all works, the 
proof is in trying it in a lot of different configurations/environments.
-mark
> Dan
>
>
> -----Original Message-----
> From: Mark Seger <Mark.Seger at hp.com>
> Date: Tuesday, Apr 15, 2008 12:39 pm
> Subject: Re: [Lustre-discuss] NFS Performance
> To: Dan <dan at nerp.net>
> CC: Lustre-discuss at lists.lustre.org
>
> while I can't tell you how to tune nfs, I can tell you how to monitor it.  With collectl (http://collectl.sourceforge.net/) you should be able to watch nfs, lustre and your network all at once, and maybe even toss in cpu for good measure.
>
> This is an example of the output (along with the appropriate switches).  I'm not doing anything over nfs, so those fields are all zero.
>
> [root@cag-dl145-172 ~]# collectl -scnfl
> waiting for 1 second sample...
> #<--------CPU--------><-----------Network----------><--NFS Svr Summary--><-------Lustre Client->
> #cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   read  write  calls  Reads KBRead Writes Ke
>    0   0 11335     33   2301  33665    2301   33665      0      0      0      0      0      0  0
>    0   0 11377     59   2303  33693    2303   33690      0      0      0      0      0      0  0
>    0   0 11362     29   2305  33719    2305   33721      0      0      0      0      0      0  0
>
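For anyone who would rather post-process brief output like the sample above than eyeball it, here is a minimal Python sketch. It is not part of collectl; `parse_brief` and `SAMPLE` are names made up for this illustration. It assumes the whitespace-separated layout shown above, where the last header name ("Ke") is cut off by the terminal width but still lines up with one data column.

```python
# Sample copied from the output above, without the "> " quote prefix.
SAMPLE = """\
waiting for 1 second sample...
#<--------CPU--------><-----------Network----------><--NFS Svr Summary--><-------Lustre Client->
#cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   read  write  calls  Reads KBRead Writes Ke
   0   0 11335     33   2301  33665    2301   33665      0      0      0      0      0      0  0
   0   0 11377     59   2303  33693    2303   33690      0      0      0      0      0      0  0
"""


def parse_brief(text):
    """Return one dict per sample row, keyed by the header's column names."""
    columns = None
    rows = []
    for line in text.splitlines():
        if line.startswith("#<"):
            continue  # banner line that groups the columns
        if line.startswith("#"):
            columns = line[1:].split()  # column names for the rows that follow
            continue
        fields = line.split()
        if columns is None or len(fields) != len(columns):
            continue  # status messages, blank lines, anything that is not a sample
        try:
            values = [int(f) for f in fields]
        except ValueError:
            continue  # right width but not numeric, so not a sample row
        rows.append(dict(zip(columns, values)))
    return rows


rows = parse_brief(SAMPLE)
for row in rows:
    print(row["netKBi"], row["netKBo"], row["calls"])
```

Each sample becomes a dict, so `rows[0]["netKBi"]` is the first sample's inbound network figure and `rows[0]["calls"]` is the NFS server call count.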
> there are lots of different options you can try, but again I'm not sure what to look for.  changing the 'f' to 'F' lets you dig a little deeper and look at the metadata ops, commits, and retrans.
> [root@cag-dl145-172 ~]# collectl -scnFl
> #<--------CPU--------><-----------Network----------><----NFS MetaOps----><-------Lustre Client->
> #cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   meta commit retran  Reads KBRead Writes Ke
>    0   0   121     43      0      4       0       2      0      0      0      0      0      0  0
>    0   0   146    143      0      2       0       3      0      0      0      0      0      0  0
>
> if you really want to see everything nfsstat might show, there are two more formats based on the case of the 'f':
> [root@cag-dl145-172 ~]# collectl -sf --verbose
> # NFS SERVER (/sec)
> #<----------Network-------><----------RPC---------><---NFS V3--->
> #PKTS   UDP   TCP  TCPCONN  CALLS  BADAUTH  BADCLNT   READ  WRITE
>     0     0     0        0      0        0        0      0      0
>
> and my favorite when I haven't a clue what nfs is doing:
> [root@cag-dl145-172 ~]# collectl -sF --verbose
> # NFS V3 SERVER (/sec)
> #NULL GETA SETA LOOK ACCS RLNK READ WRIT CRE8 MKDR SYML MKND RMOV RMDR RENM LINK RDIR RDR+ FSTA FINF PATH COMM
>     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
>
> on the other hand, if you want to see the sizes of the rpc buckets being received from lustre, there's always:
> [root@cag-dl145-172 collectl]# ./collectl.pl -s l -OB
> # LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages)
> #Rds  RdK   1P   2P   4P   8P  16P  32P  64P 128P 256P Wrts WrtK   1P   2P   4P   8P  16P  32P  64P 128P 256P
>    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
>
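The RPC-BUFFERS header repeats the bucket names (1P through 256P) for reads and for writes, so a plain name-to-value mapping would overwrite one side with the other. Below is a small Python sketch that splits the row in two. It assumes the half starting at "Rds" describes reads and the half starting at "Wrts" describes writes, that the first two columns of each half are an operation count and a KB figure, and that "NP" means an RPC of N pages. The function names are hypothetical and not part of collectl.

```python
# Header and data row copied from the output above, without the "> " prefix.
HEADER = ("#Rds  RdK   1P   2P   4P   8P  16P  32P  64P 128P 256P "
          "Wrts WrtK   1P   2P   4P   8P  16P  32P  64P 128P 256P")
ROW = ("   0    0    0    0    0    0    0    0    0    0    0 "
       "   0    0    0    0    0    0    0    0    0    0    0")


def one_side(names, values):
    """Build the summary for one direction (reads or writes)."""
    # Assumed layout: [count, KB, then one column per page bucket].
    buckets = {int(n.rstrip("P")): v for n, v in zip(names[2:], values[2:])}
    return {"ops": values[0], "kb": values[1], "buckets": buckets}


def split_rpc_row(header, row):
    """Split one RPC-BUFFERS row into (reads, writes) summaries."""
    names = header.lstrip("#").split()
    values = [int(v) for v in row.split()]
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    half = names.index("Wrts")  # the write columns start here
    return (one_side(names[:half], values[:half]),
            one_side(names[half:], values[half:]))


def mean_pages(buckets):
    """Average pages per RPC, weighting each bucket size by its count."""
    total = sum(buckets.values())
    if total == 0:
        return 0.0  # idle system, as in the sample row
    return sum(pages * count for pages, count in buckets.items()) / total


reads, writes = split_rpc_row(HEADER, ROW)
print(mean_pages(reads["buckets"]), mean_pages(writes["buckets"]))
```

On an idle client like the one in the sample both averages are zero. On a busy one, an average far below the largest bucket would suggest that most RPCs are small.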
> I haven't had too much feedback on collectl and am always looking for some. btw - there are a lot more options than I just showed you and if you like timestamps, just append -oT to the commands.
>
> that should give you a pretty good start...  8-)
>
> -mark
>
> Dan wrote:
>> Hi,
>>
>> With help from Oleg we got the right patches applied and NFS working
>> well.  Maximum performance was about 60 MB/sec.  Last week that
>> dropped to about 12.5 MB/sec and I cannot find a reason.  Lustre
>> clients all obtain 100+ MB/sec on GigE.  Each OST is good for 270
>> MB/sec.  When mounting the client on one of the OSSs I get 230
>> MB/sec.  Seems the speed is there.  How can NFS and Lustre be tuned
>> better?
>>
>> Current config for 1.6.4.3 is below:
>>
>> 1.  MGS/OSS w/ 4 OSTs - mgs_max_num_threads=32, ost_max_num_threads=64
>> 2.  OSS w/ 6 OSTs - ost_max_num_threads=64
>> 3.  20 Lustre clients - all perform well (GREAT Lustre developers!!!!
>>     this system is amazing!)
>> 4.  NFS server runs from a Lustre client machine for 12 to 15 MB/sec max.
>> 5.  NFS server from the MGS (client on MGS/OSS = bad, I know!) can get
>>     20 to 30 MB/sec
>>     - this got 60+ MB/sec in the past.
>>
>> bugs and patches applied:
>>
>> 14360 - 14006 is the only patch
>> 14379 - patch 14007 only since 14008 is reversed by 14591
>> 13371 - bug for the above mentioned 14591 patch
>>
>> With these patches the system is stable unless I bump the OST or MGS
>> threads too high.  Performance doesn't seem to change much with any
>> tuning.  I've adjusted the client via /proc and the OSTs and MGS via
>> /etc/modprobe.conf.
>>
>> Suggestions?
>>
>> Thank you,
>>
>> Dan
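A quick unit conversion puts the throughput numbers in Dan's message on the same scale as the network. This is only arithmetic on the quoted figures, taking 1 MB as 10^6 bytes (an assumption, since the message does not say). `mb_to_mbit` is a name made up for this sketch.

```python
def mb_to_mbit(mb_per_sec):
    """Convert megabytes/sec to megabits/sec (8 bits per byte)."""
    return mb_per_sec * 8


# MB/sec values quoted in the message above; the GigE figure is a lower bound.
figures = {
    "NFS peak": 60,
    "NFS after the drop": 12.5,
    "Lustre client on GigE (at least)": 100,
}

for label, mb in figures.items():
    print(f"{label:>32}: {mb:6.1f} MB/sec = {mb_to_mbit(mb):6.1f} Mbit/sec")
```

12.5 MB/sec works out to 100 Mbit/sec, which is the raw rate of a Fast Ethernet link. That may be a coincidence, but it could be worth checking what speed the NFS server's interface actually negotiated.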
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss