[lustre-discuss] Full OST

Alastair Basden a.g.basden at durham.ac.uk
Mon Sep 6 09:19:45 PDT 2021


Hi Aurélien,

Thanks.

O/1/d0 to O/1/d31 are all empty directories.
O/0/d0 to d31 do have some files in them.  However, for the ones I've 
tried,
lfs fid2path /snap8 [0x1004e0000:0xe0:0x0]
returns e.g.
lfs fid2path: cannot find '[0x1004e0000:0xe0:0x0]': Invalid argument

where the fid comes from e.g.
debugfs:  stat O/0/d0/224
Inode: 1850   Type: regular    Mode:  07666   Flags: 0x80000
Generation: 2411783677    Version: 0x00000000:00000000
User:     0   Group:     0   Project:     0   Size: 0
File ACL: 0
Links: 1   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
  ctime: 0x00000000:00000000 -- Thu Jan  1 01:00:00 1970
  atime: 0x00000000:00000000 -- Thu Jan  1 01:00:00 1970
  mtime: 0x00000000:00000000 -- Thu Jan  1 01:00:00 1970
crtime: 0x6101806b:31543ab0 -- Wed Jul 28 17:06:03 2021
Size of extra inode fields: 32
Extended attributes:
   lma: fid=[0x1004e0000:0xe0:0x0] compat=8 incompat=0
EXTENTS:
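
For reference, the lma FID printed by stat can be unpacked mechanically. The sketch below is not part of the Lustre tools. It assumes the usual IDIF convention, where a sequence of the form 0x1IIII0000 carries the OST index in bits 16-31 and the object is stored under O/0/d(objid % 32)/objid. The function names and the range constants are illustrative assumptions.

```python
import re

FID_RE = re.compile(r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]")

# Assumed IDIF sequence range (FIDs of "O/0" objects on an OST).
IDIF_SEQ_MIN = 0x100000000
IDIF_SEQ_MAX = 0x1FFFFFFFF


def parse_fid(text):
    """Return (seq, oid, ver) from the first [seq:oid:ver] found in text."""
    m = FID_RE.search(text)
    if m is None:
        raise ValueError("no FID found in %r" % text)
    return tuple(int(part, 16) for part in m.groups())


def describe_idif(seq, oid):
    """Decode an IDIF FID into (ost_index, object_id, on-disk path)."""
    if not IDIF_SEQ_MIN <= seq <= IDIF_SEQ_MAX:
        raise ValueError("sequence %#x is not in the assumed IDIF range" % seq)
    ost_index = (seq >> 16) & 0xFFFF
    object_id = ((seq & 0xFFFF) << 32) | oid
    path = "O/0/d%d/%d" % (object_id % 32, object_id)
    return ost_index, object_id, path


seq, oid, ver = parse_fid("lma: fid=[0x1004e0000:0xe0:0x0] compat=8 incompat=0")
print(describe_idif(seq, oid))  # (78, 224, 'O/0/d0/224')
```

Under those assumptions the FID above decodes to OST index 78 (0x4e, matching snap8-OST004e) and object O/0/d0/224, i.e. it names the OST object itself rather than a file in the namespace. That may be why fid2path on a client cannot resolve it, but this is a guess.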


The O/10 directory also only contains empty directories.

Some of the others do contain regular files, but for all that I've tried, 
fid2path returns either
lfs fid2path: cannot find '[0x23c0000401:0x260:0x0]': No such file or directory
or the Invalid argument message.

The size of the objects, as returned by stat, is also always 0 in the 
cases that I've seen (perhaps it is supposed to be, I don't know!)
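
Checking objects one at a time with stat is slow, so the "ls -l" scan for large objects suggested earlier in the thread could be scripted. The following is only a sketch: it assumes the listing has been captured to text by some other means and that each line has the nine whitespace-separated fields seen in the "ls -l O" output quoted below (inode, mode, type, uid, gid, size, date, time, name). The helper names are hypothetical, and the two object lines in the sample are invented for illustration.

```python
import stat
from typing import List, NamedTuple


class Entry(NamedTuple):
    inode: int
    mode: int   # parsed from the octal mode column, e.g. 40755
    size: int
    name: str


def parse_ls_l(output: str) -> List[Entry]:
    """Parse lines of debugfs 'ls -l' output into Entry tuples."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        # Expected fields: inode mode (type) uid gid size date time name
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        entries.append(Entry(int(fields[0]), int(fields[1], 8),
                             int(fields[5]), fields[8]))
    return entries


def largest_objects(entries, count=10):
    """Return the biggest regular files, largest first."""
    regular = [e for e in entries if stat.S_ISREG(e.mode)]
    return sorted(regular, key=lambda e: e.size, reverse=True)[:count]


# First line is from the thread; the two object lines are made up.
sample = """\
 786437   40755 (2)      0      0    4096 28-Jul-2021 17:06 .
   1850  107666 (1)      0      0       0 28-Jul-2021 17:06 224
   1851  107666 (1)      0      0 1048576 28-Jul-2021 17:06 256
"""
print([e.name for e in largest_objects(parse_ls_l(sample))])  # ['256', '224']
```

The mode column is octal, so a leading 4 (40755) is a directory and a leading 10 (107666) is a regular file. Directories are skipped when ranking by size.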

Cheers,
Alastair.


On Mon, 6 Sep 2021, Degremont, Aurelien wrote:

> Hi
>
>>    Not quite sure what you meant by the O/*/d* as there are no directories
>>   within O/, and there is no d/ or d*/ either at top level or within O/
>
> As you can confirm with the 'stat' output you provided, '23c0000400' is a directory, and so are all the other entries.
> It is not obvious, but the 2nd column of the 'ls -l' output is the file type and permission: a leading '4' (as in 40755) means directory.
>
> I think Andreas is referring especially to directories '0', '1' and '10' in your output.
> Try looking into them; you should see multiple 'dXX' directories with objects in them.
>
> Aurélien
>
>
> On 06/09/2021 10:12, "Alastair Basden" <a.g.basden at durham.ac.uk> wrote:
>
>    Hi Andreas,
>
>    Thanks.
>
>    With debugfs /dev/nvme6n1, I get:
>    debugfs:  ls -l O
>      393217   40755 (2)      0      0    4096 28-Jul-2021 17:06 .
>           2   40755 (2)      0      0    4096 28-Jul-2021 17:02 ..
>      393218   40755 (2)      0      0    4096 28-Jul-2021 17:02 200000003
>      524291   40755 (2)      0      0    4096 28-Jul-2021 17:02 1
>      655364   40755 (2)      0      0    4096 28-Jul-2021 17:02 10
>      786437   40755 (2)      0      0    4096 28-Jul-2021 17:06 0
>      917510   40755 (2)      0      0    4096 28-Jul-2021 17:06 23c0000402
>      1048583   40755 (2)      0      0    4096 28-Jul-2021 17:06 23c0000401
>      1179656   40755 (2)      0      0    4096 28-Jul-2021 17:06 23c0000400
>
>    Then e.g.:
>    debugfs:  stat O/23c0000400
>    Inode: 1179656   Type: directory    Mode:  0755   Flags: 0x80000
>    Generation: 2411782533    Version: 0x00000000:00000000
>    User:     0   Group:     0   Project:     0   Size: 4096
>    File ACL: 0
>    Links: 34   Blockcount: 8
>    Fragment:  Address: 0    Number: 0    Size: 0
>      ctime: 0x6101806b:306016bc -- Wed Jul 28 17:06:03 2021
>      atime: 0x6101806b:2d83aad8 -- Wed Jul 28 17:06:03 2021
>      mtime: 0x6101806b:306016bc -- Wed Jul 28 17:06:03 2021
>    crtime: 0x6101806b:2d83aad8 -- Wed Jul 28 17:06:03 2021
>    Size of extra inode fields: 32
>    Extended attributes:
>       lma: fid=[0x120008:0x8fc0e185:0x0] compat=c incompat=0
>    EXTENTS:
>    (0):33989
>
>
>    But then on a client:
>    lfs fid2path /snap8 [0x120008:0x8fc0e185:0x0]
>    lfs fid2path: cannot find '[0x120008:0x8fc0e185:0x0]': No such file or
>    directory
>
>    (and likewise for the others).
>
>    Not quite sure what you meant by the O/*/d* as there are no directories
>    within O/, and there is no d/ or d*/ either at top level or within O/
>
>
>    Running (on the OST):
>    lctl lfsck_start -M snap8-OST004e
>    seems to work (at least, doesn't return any error).
>
>    However, lctl lfsck_query -M snap8-OST004e   gives:
>    Fail to query LFSCK: Inappropriate ioctl for device
>
>
>    Thanks,
>    Alastair.
>
>
>    On Sat, 4 Sep 2021, Andreas Dilger wrote:
>
>    > You could run debugfs on that OST and use "ls -l" to examine the O/*/d* directories for large objects, then "stat" any suspicious objects within debugfs to dump the parent FID, and "lfs fid2path" on a client to determine the path.
>    >
>    > Alternately, see "lctl-lfsck-start.8" man page for options to link orphan objects to the .lustre/lost+found directory if you think there are no files referencing those objects.
>    >
>    > Cheers, Andreas
>    >
>    >> On Sep 4, 2021, at 00:54, Alastair Basden <a.g.basden at durham.ac.uk> wrote:
>    >>
>    >> Ah, of course - has to be done on a client.
>    >>
>    >> None of these files are on the dodgy OST.
>    >>
>    >> Any further suggestions?  Essentially we have what seems to be a full OST with nothing on it.
>    >>
>    >> Thanks,
>    >> Alastair.
>    >>
>    >>> On Sat, 4 Sep 2021, Andreas Dilger wrote:
>    >>>
>    >>> $ man lfs-fid2path.1
>    >>> lfs-fid2path(1)                                       user utilities                                     lfs-fid2path(1)
>    >>>
>    >>> NAME
>    >>>      lfs fid2path - print the pathname(s) for a file identifier
>    >>>
>    >>> SYNOPSIS
>    >>>      lfs fid2path [OPTION]... <FSNAME|MOUNT_POINT> <FID>...
>    >>>
>    >>> DESCRIPTION
>    >>>      lfs  fid2path  maps  a  numeric  Lustre File IDentifier (FID) to one or more pathnames
>    >>>      that have hard links to that file.  This allows resolving filenames for FIDs used in console
>    >>>      error messages, and resolving all of the pathnames for a file that has multiple hard links.
>    >>>      Pathnames are resolved relative to the MOUNT_POINT specified, or relative to the
>    >>>      filesystem mount point if FSNAME is provided.
>    >>>
>    >>> OPTIONS
>    >>>      -f, --print-fid
>    >>>             Print the FID with the path.
>    >>>
>    >>>      -c, --print-link
>    >>>             Print the current link number with each pathname or parent directory.
>    >>>
>    >>>      -l, --link=LINK
>    >>>             If a file has multiple hard links, then print only the specified LINK, starting at link 0.
>    >>>             If multiple FIDs are given, but only one pathname is needed for each file, use --link=0.
>    >>>
>    >>> EXAMPLES
>    >>>      $ lfs fid2path /mnt/testfs [0x200000403:0x11f:0x0]
>    >>>             /mnt/testfs/etc/hosts
>    >>>
>    >>>
>    >>> On Sep 3, 2021, at 14:51, Alastair Basden <a.g.basden at durham.ac.uk> wrote:
>    >>>
>    >>> Hi,
>    >>>
>    >>> lctl get_param mdt.*.exports.*.open_files  returns:
>    >>> mdt.snap8-MDT0000.exports.172.18.180.21 at o2ib.open_files=
>    >>> [0x20000b90e:0x10aa:0x0]
>    >>> mdt.snap8-MDT0000.exports.172.18.180.22 at o2ib.open_files=
>    >>> [0x20000b90e:0x21b3:0x0]
>    >>> mdt.snap8-MDT0000.exports.172.18.181.19 at o2ib.open_files=
>    >>> [0x20000b90e:0x21b3:0x0]
>    >>> [0x20000b90e:0x21b4:0x0]
>    >>> [0x20000b90c:0x1574:0x0]
>    >>> [0x20000b90c:0x1575:0x0]
>    >>> [0x20000b90c:0x1576:0x0]
>    >>>
>    >>> Doesn't seem to be many open, so I don't think it's a problem of open files.
>    >>>
>    >>> Not sure which bit of this I need to use with lfs fid2path either...
>    >>>
>    >>> Cheers,
>    >>> Alastair.
>    >>>
>    >>>
>    >>> On Fri, 3 Sep 2021, Andreas Dilger wrote:
>    >>>
>    >>> You can also check "mdt.*.exports.*.open_files" on the MDTs for a list of FIDs open on each client, and use "lfs fid2path" to resolve them to a pathname.
>    >>>
>    >>> On Sep 3, 2021, at 02:09, Degremont, Aurelien via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
>    >>>
>    >>> Hi
>    >>>
>    >>> It could be a bug, but most of the time this is due to an open-unlinked file: typically a log file that has been deleted but is still in use, with some process continuing to write to it until it fills the OSTs it is using.
>    >>>
>    >>> Look for such files on your clients (use lsof).
>    >>>
>    >>> Aurélien
>    >>>
>    >>>
>    >>> On 03/09/2021 09:50, "lustre-discuss on behalf of Alastair Basden" <lustre-discuss-bounces at lists.lustre.org on behalf of a.g.basden at durham.ac.uk> wrote:
>    >>>
>    >>> Hi,
>    >>>
>    >>> We have a file system where each OST is a single SSD.
>    >>>
>    >>> One of those is reporting as 100% full (lfs df -h /snap8):
>    >>> snap8-OST004d_UUID          5.8T        2.0T        3.5T  37% /snap8[OST:77]
>    >>> snap8-OST004e_UUID          5.8T        5.5T        7.5G 100% /snap8[OST:78]
>    >>> snap8-OST004f_UUID          5.8T        2.0T        3.4T  38% /snap8[OST:79]
>    >>>
>    >>> However, I can't find any files on it:
>    >>> lfs find --ost snap8-OST004e /snap8/
>    >>> returns nothing.
>    >>>
>    >>> I guess that it has filled up, and that there is some bug or other that is
>    >>> now preventing proper behaviour - but I could be wrong.
>    >>>
>    >>> Does anyone have any suggestions?
>    >>>
>    >>> Essentially, I'd like to find some of the files and delete or migrate
>    >>> some, and thus return it to useful production.
>    >>>
>    >>> Cheers,
>    >>> Alastair.
>    >>> _______________________________________________
>    >>> lustre-discuss mailing list
>    >>> lustre-discuss at lists.lustre.org
>    >>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>    >>>
>    >>> Cheers, Andreas
>    >>> --
>    >>> Andreas Dilger
>    >>> Lustre Principal Architect
>    >>> Whamcloud
>    >
>
>
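
The "lfs df -h" check from the original report in this thread can also be automated to catch an OST filling up. This is a rough sketch only: it assumes the output has been captured as text and that OST lines look like the three quoted in the report. The function name and the 90% threshold are arbitrary choices for illustration.

```python
import re

# uuid, size, used, available, use%, mountpoint[OST:index]
LINE_RE = re.compile(
    r"^(?P<uuid>\S+)\s+(?P<size>\S+)\s+(?P<used>\S+)\s+(?P<avail>\S+)\s+"
    r"(?P<pct>\d+)%\s+\S+\[OST:(?P<index>\d+)\]\s*$"
)


def full_osts(lfs_df_output, threshold=90):
    """Return (uuid, ost_index, use_percent) for OSTs at or above threshold."""
    hits = []
    for line in lfs_df_output.splitlines():
        m = LINE_RE.match(line.strip())
        if m and int(m.group("pct")) >= threshold:
            hits.append((m.group("uuid"), int(m.group("index")),
                         int(m.group("pct"))))
    return hits


# Lines copied from the original report in this thread.
report = """\
snap8-OST004d_UUID          5.8T        2.0T        3.5T  37% /snap8[OST:77]
snap8-OST004e_UUID          5.8T        5.5T        7.5G 100% /snap8[OST:78]
snap8-OST004f_UUID          5.8T        2.0T        3.4T  38% /snap8[OST:79]
"""
print(full_osts(report))  # [('snap8-OST004e_UUID', 78, 100)]
```

Lines that do not match the pattern (headers, MDT lines, summary lines) are ignored rather than treated as errors.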

