[Lustre-discuss] NFS Performance

Dan Redig dan at nerp.net
Tue Apr 15 13:06:00 PDT 2008


Thanks Mark!  I just started using collectl last week.  I'll investigate the options you suggested in a minute and see.

Dan


-----Original Message-----
From: Mark Seger <Mark.Seger at hp.com>
Date: Tuesday, Apr 15, 2008 12:39 pm
Subject: Re: [Lustre-discuss] NFS Performance
To: Dan <dan at nerp.net>
CC: Lustre-discuss at lists.lustre.org

while I can't tell you how to tune nfs, I can tell you how to monitor it.  With collectl - http://collectl.sourceforge.net/ - you should be able to watch nfs, lustre and your network all at once, maybe even toss in cpu for good measure

This is an example of the output (along with the appropriate switches).  I'm not doing anything over nfs, so those fields are all zero.

[root at cag-dl145-172 ~]# collectl -scnfl
waiting for 1 second sample...
#<--------CPU--------><-----------Network----------><--NFS Svr Summary--><-------Lustre Client->
#cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   read  write  calls  Reads KBRead Writes Ke
   0   0 11335     33   2301  33665    2301   33665      0      0      0      0      0      0  0
   0   0 11377     59   2303  33693    2303   33690      0      0      0      0      0      0  0
   0   0 11362     29   2305  33719    2305   33721      0      0      0      0      0      0  0

there are lots of different options you can try, but again I'm not sure what to look for.  changing the 'f' to 'F' lets you dig a little deeper and look at the metadata ops, commits, and retrans.
[root at cag-dl145-172 ~]# collectl -scnFl
#<--------CPU--------><-----------Network----------><----NFS MetaOps----><-------Lustre Client->
#cpu sys inter  ctxsw netKBi pkt-in  netKBo pkt-out   meta commit retran  Reads KBRead Writes Ke
   0   0   121     43      0      4       0       2      0      0      0      0      0      0  0
   0   0   146    143      0      2       0       3      0      0      0      0      0      0  0

if you really want to see everything nfsstat might show, there are two more formats based on the case of the 'f':
[root at cag-dl145-172 ~]# collectl -sf --verbose
# NFS SERVER (/sec)
#<----------Network-------><----------RPC---------><---NFS V3--->
#PKTS   UDP   TCP  TCPCONN  CALLS  BADAUTH  BADCLNT   READ  WRITE
    0     0     0        0      0        0        0      0      0

and my favorite when I haven't a clue what nfs is doing:
[root at cag-dl145-172 ~]# collectl -sF --verbose
# NFS V3 SERVER (/sec)
#NULL GETA SETA LOOK ACCS RLNK READ WRIT CRE8 MKDR SYML MKND RMOV RMDR RENM LINK RDIR RDR+ FSTA FINF PATH COMM
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

on the other hand, if you want to see the sizes of the rpc buffers being received from lustre there's always:
[root at cag-dl145-172 collectl]# ./collectl.pl -s l -OB
# LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages)
#Rds  RdK   1P   2P   4P   8P  16P  32P  64P 128P 256P Wrts WrtK   1P   2P   4P   8P  16P  32P  64P 128P 256P
   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

I haven't had too much feedback on collectl and am always looking for some. btw - there are a lot more options than I just showed you and if you like timestamps, just append -oT to the commands.
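
As a minimal sketch of the -oT suggestion, the snippet below only prints timestamped variants of two of the command lines shown earlier; it does not run collectl itself. The helper name with_timestamps is hypothetical and is not part of collectl.

```shell
#!/bin/sh
# Sketch only: print collectl command lines with -oT appended, so each
# sample would be prefixed with a timestamp.
# with_timestamps is a hypothetical helper name, not a collectl feature.
with_timestamps() {
    subsys="$1"    # subsystem letters as used above, e.g. cnfl or cnFl
    printf 'collectl -s%s -oT\n' "$subsys"
}

# The two brief-format selections from the examples above.
for s in cnfl cnFl; do
    with_timestamps "$s"
done
```

This prints "collectl -scnfl -oT" and "collectl -scnFl -oT"; either line can then be pasted into a shell on a machine where collectl is installed.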

that should give you a pretty good start...  8-)

-mark

Dan wrote:
> Hi,
>
> With help from Oleg we got the right patches applied and NFS working
> well.  Maximum performance was about 60 MB/sec.  Last week that
> dropped to about 12.5 MB/sec and I cannot find a reason.  Lustre
> clients all obtain 100+ MB/sec on GigE.  Each OST is good for 270
> MB/sec.  When mounting the client on one of the OSSs I get 230
> MB/sec.  Seems the speed is there.  How can NFS and Lustre be tuned
> better?
>
> Current config for 1.6.4.3 is below:
>
> 1.  MGS/OSS w/ 4 OSTs - mgs_max_num_threads=32, ost_max_num_threads=64
> 2.  OSS w/ 6 OSTs - ost_max_num_threads=64
> 3.  20 Lustre clients - all perform well (GREAT Lustre developers!!!!
>     this system is amazing!)
> 4.  NFS server runs from a Lustre client machine for 12 to 15 MB/sec max.
> 5.  NFS server from the MGS (client on MGS/OSS = bad, I know!) can get
>     20 to 30 MB/sec
>     - this got 60+ MB/sec in the past.
>
> bugs and patches applied:
>
> 14360 - 14006 is the only patch
> 14379 - patch 14007 only since 14008 is reversed by 14591
> 13371 - bug for the above mentioned 14591 patch
>
> With these patches the system is stable unless I bump the OST or MGS
> threads too high.  Performance doesn't seem to change much with any
> tuning.  I've adjusted the client via /proc and the OSTs and MGS via
> /etc/modprobe.conf.
>
> Suggestions?
>
> Thank you,
>
> Dan