[Lustre-discuss] High Load and high system CPU for mds

Isaac Huang He.Huang at Sun.COM
Mon Mar 1 22:09:07 PST 2010


On Mon, Mar 01, 2010 at 02:35:18PM -0500, Oleg Drokin wrote:
> Hello!
> 
> On Feb 28, 2010, at 9:31 PM, huangql wrote:
> > We have a problem: the MDS shows a high load value and system CPU reaches 60% when we run a chown command on a client. Strangely, once the load and system CPU got high, they did not drop back to normal levels. We can't even do anything on the clients and OSSes. The output of top is as follows:
> 
> How many files did that chown command affect (was it a chown -R on some huge directory tree)?
> Essentially, chown (setattr) works in two steps: first it changes the MDS attributes, then it queues an async RPC for
> every file object to update the attributes on the OST. If many files are updated this way,
> a lot of such messages get queued, and all of them are sent at once with no rate limiting.
> This is consistent with what you are seeing here: ptlrpcd is busy sending/receiving RPCs (ptlrpcd is the Lustre
> thread that handles async RPC sending/completion) and individual socklnd threads are also busy processing

For small messages, the socklnd can't do zero-copy sends, so there's
an additional cost for copying the small messages into socket send
buffers, which adds to CPU usage.

A show_cpu/show_processes dump from SysRq should tell what the processes
are busy with.
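
If it helps, here is a minimal sketch of triggering those dumps from a
script instead of the keyboard. The helper name and the key mapping are
my own assumptions: I am reading "show_cpu" as the 'l' key (backtrace of
all active CPUs) and "show_processes" as the 't' key (task list), so
check the sysrq documentation for your kernel before relying on it. It
needs root, and the output lands in the kernel log (dmesg / console).

```python
# Hypothetical helper, not part of Lustre: writes a SysRq command key
# to the kernel's trigger file.

# Assumed mapping from the dump names above to SysRq keys.
SYSRQ_KEYS = {
    "show_cpu": "l",        # backtrace of all active CPUs
    "show_processes": "t",  # dump the current task list
}


def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq command character to trigger_path."""
    if len(key) != 1:
        raise ValueError("SysRq key must be a single character")
    with open(trigger_path, "w") as f:
        f.write(key)


def dump(name, trigger_path="/proc/sysrq-trigger"):
    """Trigger one of the named dumps from SYSRQ_KEYS."""
    trigger_sysrq(SYSRQ_KEYS[name], trigger_path)
```

On the MDS, as root, calling dump("show_cpu") and then
dump("show_processes") while the load is high should be enough.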

> network transfers (also, I think the code in LNet is not tuned to process huge numbers of outstanding RPCs,
> which leads to additional CPU overhead in that case).

Yes:
https://bugzilla.lustre.org/show_bug.cgi?id=21619
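
To make the "all sent at once with no rate limiting" point concrete,
here is a toy model of the queueing Oleg describes. It is not Lustre
code: the function, the tick-based completion model, and the
max_in_flight cap are all hypothetical, and I am not claiming the bug
above proposes such a cap. It only shows how the number of outstanding
RPCs scales with the number of files when nothing bounds it.

```python
from collections import deque


def peak_outstanding(num_files, max_in_flight=None, completions_per_tick=1):
    """Toy model: one async setattr RPC is queued per file object.

    Returns the peak number of RPCs outstanding at any time.
    max_in_flight=None models "no rate limiting" (send everything).
    """
    queued = deque(range(num_files))
    in_flight = 0
    peak = 0
    while queued or in_flight:
        # Send as many queued RPCs as the (hypothetical) cap allows.
        while queued and (max_in_flight is None or in_flight < max_in_flight):
            queued.popleft()
            in_flight += 1
        peak = max(peak, in_flight)
        # Some replies arrive during this tick.
        in_flight -= min(in_flight, completions_per_tick)
    return peak
```

With no cap, peak_outstanding(100000) equals the number of files, i.e.
a chown -R over a huge tree puts that many small messages in front of
ptlrpcd and the socklnd threads; with max_in_flight=8 the peak stays at 8.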

Isaac


