[Lustre-discuss] Poor metadata operation performance

Ken Hornstein kenh at cmf.nrl.navy.mil
Fri May 20 06:49:58 PDT 2011


So I guess there are some things I _still_ don't understand about Lustre
metadata handling.  Specifically, what metadata gets stored on OSTs and
why.

What brings this all up is that a) we have users who have lots of files
and b) we are currently going through a reorganization that requires
changing the groups on lots of these files (this is all running Lustre
1.8.4; we're due for an upgrade in the medium future).

I figured okay, this wouldn't be so bad, since those are all metadata
server operations.  But I started running some tests, and I found out
that chown() system calls perform poorly.

Because I had already been doing some metadata performance analysis, I
took a source code tree which consists of approximately 50,000 files and
put two copies in one of our Lustre filesystems: one with the default
striping (across all OSTs) and one where all files have no striping at
all.  The performance difference between these two trees for stat() calls
is large, as you can imagine, but the disparity between the chown() calls
is even larger.  You can run chgrp on all of the files in the non-striped
copy in about 3-5 seconds, but the striped copy takes more than 50 seconds.
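
A minimal sketch for reproducing this kind of comparison is below.  It is
not the exact procedure used for the numbers above: the function name
time_chgrp and the command-line arguments are made up for illustration,
and the elapsed time includes the directory walk as well as the chown()
calls themselves.

```python
import os
import sys
import time


def time_chgrp(root, gid):
    """Set the group of every entry under root; return (count, seconds).

    Hypothetical helper for illustration. The timing covers the directory
    traversal as well as the chown() system calls.
    """
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # A uid of -1 leaves the owner unchanged; lchown() changes a
            # symlink itself rather than the file it points to.
            os.lchown(os.path.join(dirpath, name), -1, gid)
            count += 1
    return count, time.perf_counter() - start


# Usage (assumed): python time_chgrp.py GID TREE [TREE ...]
if __name__ == "__main__" and len(sys.argv) > 2:
    gid = int(sys.argv[1])
    for tree in sys.argv[2:]:
        n, secs = time_chgrp(tree, gid)
        per_call_ms = secs / n * 1000 if n else 0.0
        print(f"{tree}: {n} entries in {secs:.2f}s ({per_call_ms:.3f} ms each)")
```

Running it once against the striped copy and once against the non-striped
copy, with the same target group, gives a per-call figure that can be
compared directly between the two trees.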

I did some more digging as to why this is.  At first I thought this might
be an issue on the client, but there is code in there that skips talking
to the OSTs for certain types of metadata updates, and turning on
debugging on the client verifies that no setattr RPCs are being sent to
the OSSes.  Looking more closely at the RPC traces reveals that the issue
is on the metadata server; the setattr RPCs simply take longer when the
files are striped.

I've looked at the metadata server code for a bit, and I've verified
that the metadata server does send setattr RPCs to the OSSes, but I see
that it's done asynchronously; it shouldn't be waiting for the
replies.  So I'm stumped as to why this is happening.  I also realize
that I'm still puzzled as to what metadata is stored on the OSTs; it seems
like the client prefers the metadata from the MDS (except of course for
size), but a fair amount of metadata is still stored on the OSSes.  Can
anyone shed some light on this?

--Ken


