[lustre-discuss] Tuning for metadata performance

Michael Di Domenico mdidomenico4 at gmail.com
Tue Jan 12 11:40:34 PST 2021


yes, more or less.  i know on the lustre server side i can see the MDT
operations, and i believe you can grab them on the clients as well.
i believe that's also what slurm is already telling you in the job
stats you grep'ed.  i suspect it will, but it would be interesting
to see if 'strace -c' shows the same number of syscalls on lustre and
nfs
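
as a rough sketch of that comparison (not something i've run against
your repo): the python below parses two 'strace -c' summary tables and
lines up the per-syscall call counts.  note that strace counts
client-side system calls, not lustre RPCs, so it is only a proxy.  the
sample tables and the names parse_strace_summary/compare_counts are
made up for illustration; real input would be the summary files from
running the same clone on each filesystem.

```python
# Sketch: compare per-syscall counts from two `strace -c` summaries.
# The sample tables are invented for illustration. Real input would come
# from something like:  strace -f -c -o lustre.txt git clone <repo>

LUSTRE_SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.00    0.600000          30     20000       150 openat
 30.00    0.300000          15     20000           lstat
 10.00    0.100000           5     20000           close
------ ----------- ----------- --------- --------- ----------------
100.00    1.000000                 60000       150 total
"""

NFS_SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.00    0.050000           2     20000       150 openat
 30.00    0.030000           1     19990           lstat
 20.00    0.020000           1     20000           close
------ ----------- ----------- --------- --------- ----------------
100.00    0.100000                 59990       150 total
"""


def parse_strace_summary(text):
    """Return {syscall_name: call_count} from a `strace -c` summary table."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (errors column empty) or 6 fields.
        if len(fields) not in (5, 6):
            continue
        try:
            float(fields[0])          # "% time" column
            calls = int(fields[3])    # "calls" column
        except ValueError:
            continue                  # header or separator row
        name = fields[-1]
        if name == "total":
            continue                  # skip the summary row
        counts[name] = calls
    return counts


def compare_counts(lustre, nfs):
    """Return (syscall, lustre_calls, nfs_calls, difference) rows."""
    rows = []
    for name in sorted(set(lustre) | set(nfs)):
        a = lustre.get(name, 0)
        b = nfs.get(name, 0)
        rows.append((name, a, b, a - b))
    return rows


if __name__ == "__main__":
    rows = compare_counts(parse_strace_summary(LUSTRE_SAMPLE),
                          parse_strace_summary(NFS_SAMPLE))
    for name, a, b, diff in rows:
        print(f"{name:10s} lustre={a:8d} nfs={b:8d} diff={diff:+d}")
```

if the counts match, the slowdown is per-call latency rather than git
issuing extra calls on lustre.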

maybe based on the operations count, the lustre folks can suggest more
specific areas to optimize the filesystem
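
to get at the operations count, here is a minimal sketch that ranks
the MDT operations from the job stats quoted below.  the numbers are
copied from that grep output; the "MDT:name : value" layout is assumed
to be specific to your site's slurm stats file, and the function names
are hypothetical.

```python
# Sketch: rank MDT operations from a per-job stats file.
# The counts are copied from the quoted `grep MDT` output; the line layout
# ("MDT:<op> : <count>") is assumed to be site-specific.
import re

JOBSTATS_SAMPLE = """\
        MDT:snapshot_time      : 2021-01-09 05:29:59
        MDT:setattr            : 110
        MDT:getattr            : 20908
        MDT:mkdir              : 11
        MDT:getxattr           : 20424
        MDT:mknod              : 48
        MDT:close              : 19829
        MDT:unlink             : 9
        MDT:open               : 20188
"""

# Matches only lines whose value is a plain integer, so the
# snapshot_time line (a timestamp) is skipped.
LINE_RE = re.compile(r"^\s*MDT:(\w+)\s*:\s*(\d+)\s*$")


def parse_mdt_counts(text):
    """Return {operation: count} for integer-valued MDT lines."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def rank_ops(counts):
    """Return (operation, count, share_of_total) sorted by count, descending."""
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, n, n / total) for name, n in ranked]


if __name__ == "__main__":
    for name, n, share in rank_ops(parse_mdt_counts(JOBSTATS_SAMPLE)):
        print(f"{name:10s} {n:8d} {share:6.1%}")
```

in that sample, getattr, getxattr, open and close make up 81349 of the
81527 operations, at roughly 20,000 each, so those four are probably
the place to start.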


On Tue, Jan 12, 2021 at 1:55 PM Vicker, Darby J. (JSC-EG111)[Jacobs
Technology, Inc.] <darby.vicker-1 at nasa.gov> wrote:
>
> No, I haven't.  By what means do you suggest analyzing the OP calls?  Just an strace? Or the server-side debug commands as outlined in https://doc.lustre.org/lustre_manual.xhtml#dbdoclet.50438274_62472 ?
>
> We also have jobstats enabled and are outputting these to a file for later analysis.  So if I submitted this test in a slurm job, I'd get stats like:
>
> $ grep MDT /aerolab/admin/slurm/19435076_qsmoore
>         MDT:snapshot_time      : 2021-01-09 05:29:59
>         MDT:setattr            : 110
>         MDT:getattr            : 20908
>         MDT:mkdir              : 11
>         MDT:getxattr           : 20424
>         MDT:mknod              : 48
>         MDT:close              : 19829
>         MDT:unlink             : 9
>         MDT:open               : 20188
> $
>
> But you must be referring to an external tool like strace so I could do the same thing on both lustre and NFS.
>
> -----Original Message-----
> From: Michael Di Domenico <mdidomenico4 at gmail.com>
> Date: Tuesday, January 12, 2021 at 10:48 AM
> To: "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" <darby.vicker-1 at nasa.gov>
> Cc: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
> Subject: Re: [EXTERNAL] Re: [lustre-discuss] Tuning for metadata performance
>
>     have you run any analysis on the "A clone of these repo takes 550
>     seconds on lustre", where you track the exact OP calls on lustre to
>     see if it's a general slowness or if there is a specific OP that git
>     is abusing?  i wonder if there's something specific that git is doing
>     that lustre is unhappy with versus continuing to poke at the hardware
>     or software tuning.
>
>     though less likely, i'd also be curious if you have any
>     security/audit controls turned on on the clients.  i have some silly
>     ones where i'm at that slow things down on lustre but not nfs because
>     of how the kernel treats the filesystem
>
>     i don't have any git repos even close to that size so i can't perform
>     the same analysis where i'm at.
>
>
>     On Mon, Jan 11, 2021 at 1:45 PM Vicker, Darby J. (JSC-EG111)[Jacobs
>     Technology, Inc.] <darby.vicker-1 at nasa.gov> wrote:
>     >
>     > Sure.  It's a custom configuration on commodity hardware, which is quite a bit newer than the lustre servers.  The overall setup is a bit complicated to support HA - two servers with an external JBOD with ZFS to manage the drives and the file system.  PCS to do the failover.  But none of that is too relevant in terms of performance so here are the hardware specs.
>     >
>     > Servers:
>     > 192 GB DDR4 2666 MHz ECC Memory
>     > 16 total physical cores (2x Intel Xeon Gold 6144 CPU @ 3.50GHz)
>     > LSI SAS Card (can't find exact model but very similar to the cards in the lustre servers)
>     >
>     > JBOD:
>     > Supermicro 3.5"
>     > 24x 10TB 7200 RPM Seagate HDD's
>     >
>     > ZFS is used to configure the drives in a RAID10 with a zfs file system built on the zpool.  This is exported via NFS.  The only NFS tuning we are doing is to increase RPCNFSDCOUNT to 128 and export with async.
>     >
>     > So the HW configuration is overall fairly similar.  This is another reason I'm hopeful that we'd be able to get our lustre MD performance as good as or better than the NFS server's, given that the lustre MDS has SSD's and the NFS server has HDD's.
>     >
>     >
>     > -----Original Message-----
>     > From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Michael Di Domenico <mdidomenico4 at gmail.com>
>     > Date: Monday, January 11, 2021 at 8:07 AM
>     > Cc: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
>     > Subject: [EXTERNAL] Re: [lustre-discuss] Tuning for metadata performance
>     >
>     > perhaps i missed it somewhere, but in order to do a fair comparison
>     > can you detail the hardware/software behind the nfs server?
>     >
>     >
>

