[lustre-discuss] [EXTERNAL] Re: Tuning for metadata performance

Michael Di Domenico mdidomenico4 at gmail.com
Thu Jan 14 06:04:41 PST 2021


interesting.  thanks for the detail.

completely off the cuff, and I could be totally wrong or
misremembering here, but I recall reading somewhere that when
getxattr and getattr are called, the MDT contacts the OSSs as well to
get some info.  if that's true, this double hop is probably the extra
wait time.  the NFS server wouldn't have to do this; it just checks
the local disk.
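
One way to check whether the extra time is per-call latency rather than extra calls is to divide seconds by calls for each syscall in the two `strace -c` summaries quoted below and compare them. What follows is a minimal sketch, not something from the original runs: the function names (`parse_strace_summary`, `per_call_slowdown`) and the `min_calls` cutoff are made up for illustration, and the embedded sample rows are excerpts copied from the two `git status` tables in this thread. The tables are labeled only by file name, so the code uses those names and makes no claim about which one is NFS and which is Lustre.

```python
import re

# One data row of `strace -c` output, for example:
#   "  2.71    0.879894          17     50473     45917 lstat"
# The errors column is optional; a leading "> " from mail quoting is tolerated.
ROW = re.compile(
    r"^\s*(?:>\s*)?"
    r"(?P<pct>\d+\.\d+)\s+(?P<seconds>\d+\.\d+)\s+(?P<usecs>\d+)\s+"
    r"(?P<calls>\d+)\s+(?:(?P<errors>\d+)\s+)?(?P<syscall>[A-Za-z_]\w*)\s*$"
)

# Excerpts of the two `git status` summaries (the total row covers the full table).
STATUS_SCRATCH = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 61.47    0.688990          20     33734        13 lstat
 18.53    0.207653          13     15574           getdents
 13.33    0.149468          19      7787           openat
  4.01    0.044946          16      2703      2247 open
------ ----------- ----------- --------- --------- ----------------
100.00    1.120923                 70017      2344 total
"""

STATUS_EPHEMERAL1S = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 46.42    2.694706          79     33734        13 lstat
 28.29    1.642311         105     15574           getdents
 11.21    0.650799          83      7787           openat
  3.07    0.178217          80      2209      2044 open
------ ----------- ----------- --------- --------- ----------------
100.00    5.805571                 68868      2141 total
"""


def parse_strace_summary(text):
    """Return {syscall: (seconds, calls)} parsed from `strace -c` output."""
    stats = {}
    for line in text.splitlines():
        m = ROW.match(line)
        # The "total" row has no usecs/call column and would otherwise be
        # misread as a syscall, so it is skipped explicitly.
        if m and m.group("syscall") != "total":
            stats[m.group("syscall")] = (
                float(m.group("seconds")),
                int(m.group("calls")),
            )
    return stats


def per_call_slowdown(base, other, min_calls=1000):
    """Return (syscall, usec_base, usec_other, ratio) rows, largest ratio first.

    Syscalls with fewer than `min_calls` calls in either run, or with zero
    recorded time in `base`, are dropped because their averages are too noisy.
    """
    rows = []
    for name, (sec_a, calls_a) in base.items():
        if name not in other:
            continue
        sec_b, calls_b = other[name]
        if min(calls_a, calls_b) < min_calls or sec_a == 0:
            continue
        usec_a = sec_a / calls_a * 1e6
        usec_b = sec_b / calls_b * 1e6
        rows.append((name, usec_a, usec_b, usec_b / usec_a))
    return sorted(rows, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    scratch = parse_strace_summary(STATUS_SCRATCH)
    ephemeral = parse_strace_summary(STATUS_EPHEMERAL1S)
    print(f"{'syscall':<10}{'scratch us':>12}{'ephemeral1s us':>16}{'ratio':>8}")
    for name, ua, ub, ratio in per_call_slowdown(scratch, ephemeral):
        print(f"{name:<10}{ua:>12.1f}{ub:>16.1f}{ratio:>8.2f}")
```

If the ratio is roughly flat across the metadata calls, that points at a fixed per-operation round-trip cost rather than one pathological call. To run it on the full files, replace the two embedded strings with the contents of the saved `strace -c` outputs.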

if I'm not crazy and what I said has some truth, the one thing that
might show a difference is pinning all the files in the repo to a
single OSS (you may have done this already, I'm not sure)
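
For anyone wanting to try that, one option is a directory default layout with a stripe count of 1 and a fixed starting OST index, set on an empty directory before cloning so the files git creates inherit it. The sketch below only builds and prints the `lfs setstripe` command. The helper name, the directory path, and OST index 0 are placeholders, not values from this thread. Note that `-i` selects an OST rather than an OSS, and one OSS may serve several OSTs.

```python
import shlex


def setstripe_cmd(directory, ost_index):
    """Build an `lfs setstripe` command: one stripe, starting at a fixed OST."""
    return ["lfs", "setstripe", "-c", "1", "-i", str(ost_index), directory]


if __name__ == "__main__":
    # "/lustre/test-clone" and index 0 are hypothetical; substitute real values.
    print(shlex.join(setstripe_cmd("/lustre/test-clone", 0)))
```

A default layout only applies to files created afterwards, so files that already exist in the directory keep their old layout unless they are re-copied or moved with `lfs migrate`.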

Again, I'm no lustre genius, just trying to help.  the lustre devs
can likely explain this better than I can.  I might be leading you
down a rabbit hole.

On Wed, Jan 13, 2021 at 6:49 PM Vicker, Darby J. (JSC-EG111)[Jacobs
Technology, Inc.] <darby.vicker-1 at nasa.gov> wrote:
>
> I updated my git timings to include a run on our lustre filesystems with the stripe count set to 1.  Our current default is a 3-segment PFL.  This did help a bit.  Updated plot attached.
>
> I also did some runs on NFS and lustre (ldiskfs MDT, stripe count=1, if that makes a difference) with strace -c.  Results are attached and also below.  I also plotted the results.  I don't think there is anything earth-shattering here: the git operations on lustre and NFS make almost the same number of calls to the same functions, but the lustre version spends (a lot) more time waiting.  But it was a good idea - thanks.
>
>
> $ cat strace.clone.scratch.out
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  66.76   21.693112     7231037         3           wait4
>  18.10    5.883054         367     16010           read
>   4.19    1.362333          32     41741      8059 open
>   3.85    1.251569           6    194499           write
>   3.23    1.048391          31     33635           fstat
>   2.71    0.879894          17     50473     45917 lstat
>   0.56    0.181915          22      7941         1 mkdir
>   0.39    0.126166           3     33706           close
>   0.10    0.033229           4      7944      7932 access
>   0.08    0.026800         558        48           munmap
>   0.01    0.004397          33       133           symlink
>   0.00    0.001168         292         4           clone
>   0.00    0.000408           4        99           mmap
>   0.00    0.000283           0       340           brk
>   0.00    0.000264           2        96        42 stat
>   0.00    0.000246           8        30           mprotect
>   0.00    0.000227           2       105           lseek
>   0.00    0.000220          15        14           rename
>   0.00    0.000219          12        18           getdents
>   0.00    0.000191          21         9           openat
>   0.00    0.000104           1        56         8 rt_sigreturn
>   0.00    0.000070           5        14        12 readlink
>   0.00    0.000058           3        17           rt_sigaction
>   0.00    0.000051          17         3           unlink
>   0.00    0.000046           5         8           pipe
>   0.00    0.000041           3        11           mremap
>   0.00    0.000031           6         5           chdir
>   0.00    0.000025          25         1           uname
>   0.00    0.000020          20         1           chmod
>   0.00    0.000015           3         5           getpid
>   0.00    0.000011           2         5           fcntl
>   0.00    0.000011           2         4           getcwd
>   0.00    0.000011           5         2         2 statfs
>   0.00    0.000009           2         4           clock_gettime
>   0.00    0.000008           4         2           ioctl
>   0.00    0.000006           3         2           getrlimit
>   0.00    0.000005           2         2           setitimer
>   0.00    0.000005           2         2           futex
>   0.00    0.000004           4         1           execve
>   0.00    0.000003           3         1           arch_prctl
>   0.00    0.000003           3         1           set_tid_address
>   0.00    0.000002           2         1           rt_sigprocmask
>   0.00    0.000002           2         1           set_robust_list
>   0.00    0.000000           0         1           fsync
> ------ ----------- ----------- --------- --------- ----------------
> 100.00   32.494627                386998     61973 total
> $ cat strace.clone.ephemeral1s.out
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  71.86  119.077430    39692476         3           wait4
>  11.46   18.987659          97    194757           write
>   4.31    7.138347         445     16012           read
>   3.67    6.073564         145     41741      8059 open
>   3.11    5.146797         101     50473     45917 lstat
>   2.64    4.382265         130     33635           fstat
>   2.16    3.571558         105     33706           close
>   0.48    0.799008         100      7941         1 mkdir
>   0.28    0.466472          58      7945      7933 access
>   0.02    0.028136         586        48           munmap
>   0.01    0.023113         173       133           symlink
>   0.00    0.002379         169        14           rename
>   0.00    0.001785           5       312        17 rt_sigreturn
>   0.00    0.001416         354         4           clone
>   0.00    0.001132          11        96        42 stat
>   0.00    0.000824           8        99           mmap
>   0.00    0.000731          40        18           getdents
>   0.00    0.000710          50        14        12 readlink
>   0.00    0.000657         219         3           unlink
>   0.00    0.000538           1       338           brk
>   0.00    0.000465          51         9           openat
>   0.00    0.000352          11        30           mprotect
>   0.00    0.000244         244         1           chmod
>   0.00    0.000227           2       105           lseek
>   0.00    0.000081          10         8           pipe
>   0.00    0.000050           2        17           rt_sigaction
>   0.00    0.000044           4        11           mremap
>   0.00    0.000044          44         1           uname
>   0.00    0.000026           5         5           chdir
>   0.00    0.000021           4         5           fcntl
>   0.00    0.000016           8         2         2 statfs
>   0.00    0.000012           6         2           ioctl
>   0.00    0.000011           2         4           getcwd
>   0.00    0.000010          10         1           execve
>   0.00    0.000007           3         2           getrlimit
>   0.00    0.000005           2         2           setitimer
>   0.00    0.000005           1         5           getpid
>   0.00    0.000004           4         1           arch_prctl
>   0.00    0.000004           2         2           futex
>   0.00    0.000004           1         4           clock_gettime
>   0.00    0.000003           3         1           rt_sigprocmask
>   0.00    0.000003           3         1           set_tid_address
>   0.00    0.000003           3         1           set_robust_list
>   0.00    0.000000           0         1           fsync
> ------ ----------- ----------- --------- --------- ----------------
> 100.00  165.706162                387513     61983 total
> $ cat strace.status.scratch.out
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  61.47    0.688990          20     33734        13 lstat
>  18.53    0.207653          13     15574           getdents
>  13.33    0.149468          19      7787           openat
>   4.01    0.044946          16      2703      2247 open
>   1.99    0.022332           2      8243           close
>   0.13    0.001478           2       629           write
>   0.13    0.001442           2       483           read
>   0.11    0.001218          15        81           fstat
>   0.07    0.000831           6       121           munmap
>   0.07    0.000742           4       170           mmap
>   0.04    0.000426           5        85        68 stat
>   0.04    0.000420         420         1           rename
>   0.03    0.000354           1       212           brk
>   0.02    0.000261          10        26        13 access
>   0.02    0.000203           2        85           lseek
>   0.01    0.000068           2        29           mprotect
>   0.00    0.000020           4         5           chdir
>   0.00    0.000019           3         5           getcwd
>   0.00    0.000008           1         5           getpid
>   0.00    0.000007           3         2         2 statfs
>   0.00    0.000006           6         1           ioctl
>   0.00    0.000006           3         2           fcntl
>   0.00    0.000006           2         3         1 readlink
>   0.00    0.000005           1         4           clock_gettime
>   0.00    0.000004           2         2           getrlimit
>   0.00    0.000004           2         2           futex
>   0.00    0.000003           0         7           rt_sigaction
>   0.00    0.000001           1         1           rt_sigprocmask
>   0.00    0.000001           1         1           set_tid_address
>   0.00    0.000001           1         1           set_robust_list
>   0.00    0.000000           0        11           mremap
>   0.00    0.000000           0         1           execve
>   0.00    0.000000           0         1           arch_prctl
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    1.120923                 70017      2344 total
> $ cat strace.status.ephemeral1s.out
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  46.42    2.694706          79     33734        13 lstat
>  28.29    1.642311         105     15574           getdents
>  11.21    0.650799          83      7787           openat
>  10.53    0.611098          76      7952           close
>   3.07    0.178217          80      2209      2044 open
>   0.33    0.018913          30       629           write
>   0.06    0.003425          12       265           read
>   0.02    0.001401          16        85        68 stat
>   0.02    0.001223          15        81           fstat
>   0.01    0.000793          16        48           munmap
>   0.01    0.000749           7        97           mmap
>   0.01    0.000461           2       212           brk
>   0.01    0.000457          17        26        13 access
>   0.01    0.000320          11        29           mprotect
>   0.00    0.000249         249         1           rename
>   0.00    0.000230           2        85           lseek
>   0.00    0.000084           7        11           mremap
>   0.00    0.000039           7         5           chdir
>   0.00    0.000026           5         5           getcwd
>   0.00    0.000020           2         7           rt_sigaction
>   0.00    0.000008           4         2           getrlimit
>   0.00    0.000006           6         1           execve
>   0.00    0.000006           6         1           set_tid_address
>   0.00    0.000005           5         1           rt_sigprocmask
>   0.00    0.000005           5         1           ioctl
>   0.00    0.000005           2         2           futex
>   0.00    0.000005           5         1           set_robust_list
>   0.00    0.000004           2         2           fcntl
>   0.00    0.000004           1         3         1 readlink
>   0.00    0.000002           0         5           getpid
>   0.00    0.000000           0         2         2 statfs
>   0.00    0.000000           0         1           arch_prctl
>   0.00    0.000000           0         4           clock_gettime
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    5.805571                 68868      2141 total
> $
>
> -----Original Message-----
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Michael Di Domenico <mdidomenico4 at gmail.com>
> Date: Tuesday, January 12, 2021 at 12:41 PM
> Cc: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
> Subject: [EXTERNAL] Re: [lustre-discuss] Tuning for metadata performance
>
>     yes, more or less.  I know on the lustre server side I can see the MDT
>     operations, which I believe you can grab on the clients as well, and
>     which I believe is also what slurm is already telling you in the job
>     stats you grep'ed.  I suspect it will, but it would be interesting
>     to see if 'strace -c' shows the same number of system calls on lustre
>     and nfs
>
>     maybe based on the operations count, the lustre folks can suggest more
>     specific areas to optimize the filesystem
>

