[Lustre-discuss] NFS Performance
Mark Seger
Mark.Seger at hp.com
Tue Apr 15 13:45:38 PDT 2008
> Thanks Mark! I just started using collectl last week. I'll investigate the options you suggested in a minute and see.
>
By all means do so, and if you have any problems with the switches, just
let me know (you don't want to know how much extra code there is in
collectl just to deal with all the lustre stats). Also be sure to let me
know if you encounter any operational problems. I only recently got
around to adding support for 1.6.4, and while I think it all works, the
proof is in trying it in a lot of different configurations/environments.
-mark
> Dan
>
>
> -----Original Message-----
> From: Mark Seger <Mark.Seger at hp.com>
> Date: Tuesday, Apr 15, 2008 12:39 pm
> Subject: Re: [Lustre-discuss] NFS Performance
> To: Dan <dan at nerp.net>
> CC: Lustre-discuss at lists.lustre.org
>
> while I can't tell you how to tune nfs, I can tell you how to monitor it. With collectl (http://collectl.sourceforge.net/) you should be able to watch nfs, lustre and your network all at once, and maybe even toss in cpu for good measure.
>
> This is an example of the output (along with the appropriate switches). I'm not doing anything over nfs, so those fields are all zero.
>
> [root@cag-dl145-172 ~]# collectl -scnfl
> waiting for 1 second sample...
> #<--------CPU--------><-----------Network----------><--NFS Svr Summary--><-------Lustre Client->
> #cpu sys inter   ctxsw netKBi pkt-in  netKBo pkt-out   read  write  calls Reads KBRead Writes Ke
>    0   0 11335      33   2301  33665    2301   33665      0      0      0     0      0      0  0
>    0   0 11377      59   2303  33693    2303   33690      0      0      0     0      0      0  0
>    0   0 11362      29   2305  33719    2305   33721      0      0      0     0      0      0  0
>
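The brief-format samples above are whitespace-separated columns under a `#`-prefixed header, so they are easy to post-process once captured. What follows is only a minimal Python sketch, not anything shipped with collectl: the function name `parse_collectl_brief` is hypothetical, and it assumes every value is a plain integer, as in the sample (other collectl options or versions may format values differently). The truncated last column name `Ke` is kept exactly as it appears in the output.

```python
SAMPLE = """\
waiting for 1 second sample...
#<--------CPU--------><-----------Network----------><--NFS Svr Summary--><-------Lustre Client->
#cpu sys inter ctxsw netKBi pkt-in netKBo pkt-out read write calls Reads KBRead Writes Ke
0 0 11335 33 2301 33665 2301 33665 0 0 0 0 0 0 0
0 0 11377 59 2303 33693 2303 33690 0 0 0 0 0 0 0
0 0 11362 29 2305 33719 2305 33721 0 0 0 0 0 0 0
"""


def parse_collectl_brief(text):
    """Return one dict per sample line, keyed by the names in the '#' header.

    Hypothetical helper: assumes integer-only values and a single header line.
    """
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#<"):
            continue  # section banner such as #<----CPU---->, carries no column names
        if line.startswith("#"):
            columns = line.lstrip("#").split()
            continue
        fields = line.split()
        if columns is None or len(fields) != len(columns):
            continue  # status lines like "waiting for 1 second sample..."
        try:
            rows.append(dict(zip(columns, (int(f) for f in fields))))
        except ValueError:
            continue  # non-numeric line, skip rather than guess
    return rows


rows = parse_collectl_brief(SAMPLE)
# Average inbound network rate over the captured samples, in KB/sec.
avg_net_kb_in = sum(r["netKBi"] for r in rows) / len(rows)
```

Keying each row by column name makes it simple to compare the network columns against the NFS `read`/`write` columns once there is real NFS traffic.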
> there are lots of different options you can try, but again I'm not sure what to look for. changing the 'f' to 'F' lets you dig a little deeper and look at the metadata ops, commits, and retrans.
> [root@cag-dl145-172 ~]# collectl -scnFl
> #<--------CPU--------><-----------Network----------><----NFS MetaOps----><-------Lustre Client->
> #cpu sys inter   ctxsw netKBi pkt-in  netKBo pkt-out   meta commit retran Reads KBRead Writes Ke
>    0   0   121      43      0      4       0       2      0      0      0     0      0      0  0
>    0   0   146     143      0      2       0       3      0      0      0     0      0      0  0
>
> if you really want to see everything nfsstat might show, there are two more formats, selected by the case of the 'f':
> [root@cag-dl145-172 ~]# collectl -sf --verbose
> # NFS SERVER (/sec)
> #<----------Network-------><----------RPC---------><---NFS V3--->
> #PKTS   UDP   TCP   TCPCONN   CALLS BADAUTH BADCLNT   READ  WRITE
>     0     0     0         0       0       0       0      0      0
>
> and my favorite when I haven't a clue what nfs is doing:
> [root@cag-dl145-172 ~]# collectl -sF --verbose
> # NFS V3 SERVER (/sec)
> #NULL GETA SETA LOOK ACCS RLNK READ WRIT CRE8 MKDR SYML MKND RMOV RMDR RENM LINK RDIR RDR+ FSTA FINF PATH COMM
>     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
>
> on the other hand, if you want to see the bucket sizes of the rpcs being received from lustre, there's always:
> [root@cag-dl145-172 collectl]# ./collectl.pl -s l -OB
> # LUSTRE CLIENT SUMMARY: RPC-BUFFERS (pages)
> #Rds RdK   1P   2P   4P   8P  16P  32P  64P 128P 256P Wrts WrtK   1P   2P   4P   8P  16P  32P  64P 128P 256P
>    0   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
>
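The RPC-buffer header above repeats the bucket names `1P` through `256P`, once for reads and once for writes, so a parser keyed on column name alone would overwrite the read buckets with the write ones. A small Python sketch of one way to split the two halves follows. The helper name `split_rpc_histogram` is hypothetical, and the code assumes that the `Wrts` column marks the start of the write half, as in the header shown, and that all values are integers.

```python
HEADER = "#Rds RdK 1P 2P 4P 8P 16P 32P 64P 128P 256P Wrts WrtK 1P 2P 4P 8P 16P 32P 64P 128P 256P"
ROW = "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"


def split_rpc_histogram(header, values):
    """Split one RPC-BUFFERS line into separate read and write histograms.

    Hypothetical helper: assumes the 'Wrts' column starts the write half.
    """
    names = header.lstrip("#").split()
    nums = [int(v) for v in values.split()]
    if len(names) != len(nums):
        raise ValueError("header and value counts differ")
    split_at = names.index("Wrts")

    def side(ns, vs):
        # First two columns are the op count and KB; the rest are page-size buckets.
        return {"ops": vs[0], "kb": vs[1], "buckets": dict(zip(ns[2:], vs[2:]))}

    return {
        "read": side(names[:split_at], nums[:split_at]),
        "write": side(names[split_at:], nums[split_at:]),
    }


result = split_rpc_histogram(HEADER, ROW)
```

With the idle sample above every bucket is zero; under load, the bucket dictionaries show how many RPCs of each page count were seen in the interval.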
> I haven't had too much feedback on collectl and am always looking for some. btw - there are a lot more options than I just showed you and if you like timestamps, just append -oT to the commands.
>
> that should give you a pretty good start... 8-)
>
> -mark
>
> Dan wrote:
>> Hi,
>>
>> With help from Oleg we got the right patches applied and NFS working
>> well. Maximum performance was about 60 MB/sec. Last week that
>> dropped to about 12.5 MB/sec and I cannot find a reason. Lustre
>> clients all obtain 100+ MB/sec on GigE. Each OST is good for 270
>> MB/sec. When mounting the client on one of the OSSs I get 230
>> MB/sec. Seems the speed is there. How can NFS and Lustre be tuned
>> better?
>>
>> Current config for 1.6.4.3 is below:
>>
>> 1. MGS/OSS w/ 4 OSTs - mgs_max_num_threads=32, ost_max_num_threads=64
>> 2. OSS w/ 6 OSTs - ost_max_num_threads=64
>> 3. 20 Lustre clients - all perform well (GREAT Lustre developers!!!!
>>    this system is amazing!)
>> 4. NFS server runs from a Lustre client machine for 12 to 15 MB/sec max.
>> 5. NFS server from the MGS (client on MGS/OSS = bad, I know!) can get
>>    20 to 30 MB/sec
>>    - this got 60+ MB/sec in the past.
>>
>> bugs and patches applied:
>>
>> 14360 - 14006 is the only patch
>> 14379 - patch 14007 only since 14008 is reversed by 14591
>> 13371 - bug for the above mentioned 14591 patch
>>
>> With these patches the system is stable unless I bump the OST or MGS
>> threads too high. Performance doesn't seem to change much with any
>> tuning. I've adjusted the client via /proc and the OSTs and MGS via
>> /etc/modprobe.conf.
>>
>> Suggestions?
>>
>> Thank you,
>>
>> Dan
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss