[Lustre-discuss] NFS Performance

Mark Seger Mark.Seger at hp.com
Tue Apr 15 12:39:20 PDT 2008


While I can't tell you how to tune NFS, I can tell you how to monitor
it.  With collectl - http://collectl.sourceforge.net/ - you should be
able to watch NFS, Lustre and your network all at once, and maybe even
toss in CPU for good measure.

This is an example of the output (along with the appropriate switches).  
I'm not doing anything over nfs, so those fields are all zero.

[root at cag-dl145-172 ~]# collectl -scnfl
waiting for 1 second sample...
#<--------CPU--------><-----------Network----------><--NFS Svr Summary--><-------Lustre Client->
#cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   read  write  calls  Reads KBRead Writes Ke
   0   0 11335     33   2301  33665    2301   33665      0      0      0      0      0      0  0
   0   0 11377     59   2303  33693    2303   33690      0      0      0      0      0      0  0
   0   0 11362     29   2305  33719    2305   33721      0      0      0      0      0      0  0
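
If you'd rather post-process that output than eyeball it, here is a
minimal Python sketch that turns one data line into named fields.  The
column names are copied from the header above (the last one shows up
truncated as "Ke", so it is kept that way).  The function name is
hypothetical and not part of collectl, and it assumes the lines are not
wrapped and carry no timestamp column.

```python
# Column names copied from the "collectl -scnfl" header shown above.
# The final column is truncated to "Ke" in that output and is kept as-is.
BRIEF_COLUMNS = [
    "cpu", "sys", "inter", "ctxsw",
    "netKBi", "pkt-in", "netKBo", "pkt-out",
    "read", "write", "calls",
    "Reads", "KBRead", "Writes", "Ke",
]


def parse_brief_line(line, columns=BRIEF_COLUMNS):
    """Return {column: int} for one data line, or None for header lines,
    blank lines, and anything that does not match the column count."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    fields = text.split()
    if len(fields) != len(columns):
        return None
    try:
        return dict(zip(columns, (int(f) for f in fields)))
    except ValueError:
        return None


# First data line from the sample output above.
sample = ("   0   0 11335     33   2301  33665    2301   33665"
          "      0      0      0      0      0      0  0")
row = parse_brief_line(sample)
print(row["netKBi"], row["read"], row["write"])
```

Feeding it every line of a run and skipping the None results gives you
something you can plot or compare between a good day and a bad day.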

There are lots of different options you can try, but again I'm not sure
what to look for.  Changing the 'f' to 'F' lets you dig a little deeper
and look at the metadata ops, commits and retransmissions:
[root at cag-dl145-172 ~]# collectl -scnFl
#<--------CPU--------><-----------Network----------><----NFS MetaOps----><-------Lustre Client->
#cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   meta commit retran  Reads KBRead Writes Ke
   0   0   121     43      0      4       0       2      0      0      0      0      0      0  0
   0   0   146    143      0      2       0       3      0      0      0      0      0      0  0

If you really want to see everything nfsstat might show, there are two
more formats, based on the case of the 'f':
[root at cag-dl145-172 ~]# collectl -sf --verbose
# NFS SERVER (/sec)
#<----------Network-------><----------RPC---------><---NFS V3--->
#PKTS   UDP   TCP  TCPCONN  CALLS  BADAUTH  BADCLNT   READ  WRITE
    0     0     0        0      0        0        0      0      0

And my favorite when I haven't a clue what NFS is doing:
[root at cag-dl145-172 ~]# collectl -sF --verbose
# NFS V3 SERVER (/sec)
#NULL GETA SETA LOOK ACCS RLNK READ WRIT CRE8 MKDR SYML MKND RMOV RMDR RENM LINK RDIR RDR+ FSTA FINF PATH COMM
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
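
With 22 counters on a line it is easy to lose track of which number
belongs to which op.  A rough Python sketch, assuming the column order
in the header above, pairs them up and lists the busiest ops.  The
function name is hypothetical and the sample numbers are made up for
illustration, not measurements from any system.

```python
# Op names in the order shown in the "collectl -sF --verbose" header above.
NFS_V3_OPS = ("NULL GETA SETA LOOK ACCS RLNK READ WRIT CRE8 MKDR SYML "
              "MKND RMOV RMDR RENM LINK RDIR RDR+ FSTA FINF PATH COMM").split()


def busiest_ops(line, top=3, ops=NFS_V3_OPS):
    """Pair each counter on a data line with its op name and return the
    `top` non-zero ops as (name, count) tuples, busiest first."""
    counts = [int(f) for f in line.split()]
    if len(counts) != len(ops):
        raise ValueError("expected %d counters, got %d" % (len(ops), len(counts)))
    active = [(op, n) for op, n in zip(ops, counts) if n > 0]
    return sorted(active, key=lambda pair: pair[1], reverse=True)[:top]


# Made-up per-second counts, one per op, purely for illustration.
made_up = "0 120 3 45 10 0 800 15 0 0 0 0 0 0 0 0 2 0 0 0 0 15"
print(busiest_ops(made_up))
```

If the top entries are things like GETA and LOOK rather than READ and
WRIT, the server is spending its time on metadata, which is a different
problem from raw throughput.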

On the other hand, if you want to see the sizes of the RPCs, by bucket,
being received from Lustre, there's always:
[root at cag-dl145-172 collectl]# ./collectl.pl -s l -OB
# LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages)
#Rds  RdK   1P   2P   4P   8P  16P  32P  64P 128P 256P Wrts WrtK   1P   2P   4P   8P  16P  32P  64P 128P 256P
   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
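
To turn those bucket counts into something easier to reason about, here
is a small Python sketch that computes the mean RPC size and the share
of RPCs landing in the largest bucket.  It assumes 4 KiB pages and
treats each bucket label (1P to 256P) as the exact RPC size in pages.
Both are assumptions, the function name is hypothetical, and the counts
are made up.

```python
PAGE_BUCKETS = [1, 2, 4, 8, 16, 32, 64, 128, 256]  # the 1P..256P columns
PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages


def summarize_rpc_buckets(counts, page_kib=PAGE_SIZE_KIB):
    """Given RPC counts in 1P..256P order, return a tuple of
    (total RPCs, mean RPC size in KiB, fraction in the largest bucket)."""
    if len(counts) != len(PAGE_BUCKETS):
        raise ValueError("expected %d buckets" % len(PAGE_BUCKETS))
    total = sum(counts)
    if total == 0:
        return 0, 0.0, 0.0
    pages = sum(p * n for p, n in zip(PAGE_BUCKETS, counts))
    return total, pages * page_kib / total, counts[-1] / total


# Made-up read-side counts: ten 1-page RPCs and thirty 256-page RPCs.
made_up_reads = [10, 0, 0, 0, 0, 0, 0, 0, 30]
print(summarize_rpc_buckets(made_up_reads))
```

Under the 4 KiB assumption a 256-page RPC is 1 MiB, so a low share in
the last bucket would mean a lot of the traffic is going out in small
RPCs.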

I haven't had too much feedback on collectl and am always looking for some.
btw - there are a lot more options than I just showed you and if you 
like timestamps, just append -oT to the commands.

that should give you a pretty good start...  8-)

-mark

Dan wrote:
> Hi,
>
> With help from Oleg we got the right patches applied and NFS working 
> well.  Maximum performance was about 60 MB/sec.  Last week that 
> dropped to about 12.5 MB/sec and I cannot find a reason.  Lustre 
> clients all obtain 100+ MB/sec on GigE.  Each OST is good for 270 
> MB/sec.  When mounting the client on one of the OSSs I get 230 
> MB/sec.  Seems the speed is there.  How can NFS and Lustre be tuned 
> better?
>
> Current config for 1.6.4.3 is below:
>
> 1.  MGS/OSS w/ 4 OSTs - mgs_max_num_threads=32, ost_max_num_threads=64
> 2.  OSS w/ 6 OSTs - ost_max_num_threads=64
> 3.  20 Lustre clients - all perform well (GREAT Lustre developers!!!! 
> this system is amazing!)
> 4.  NFS server runs from a Lustre client machine for 12 to 15 MB/sec max.
> 5.  NFS server from the MGS (client on MGS/OSS = bad, I know!) can get 
> 20 to 30 MB/sec
>     - this got 60+ MB/sec in the past.
>
> bugs and patches applied:
>
> 14360 - 14006 is the only patch
> 14379 - patch 14007 only since 14008 is reversed by 14591
> 13371 - bug for the above mentioned 14591 patch
>
> With these patches the system is stable unless I bump the OST or MGS 
> threads too high.  Performance doesn't seem to change much with any 
> tuning.  I've adjusted the client via /proc and the OSTs and MGS via 
> /etc/modprobe.conf.
>
> Suggestions?
>
> Thank you,
>
> Dan
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>   



