[lustre-discuss] Full OST

Alastair Basden a.g.basden at durham.ac.uk
Mon Sep 6 01:11:02 PDT 2021


Hi Andreas,

Thanks.

With debugfs /dev/nvme6n1, I get:
debugfs:  ls -l O
  393217   40755 (2)      0      0    4096 28-Jul-2021 17:06 .
       2   40755 (2)      0      0    4096 28-Jul-2021 17:02 ..
  393218   40755 (2)      0      0    4096 28-Jul-2021 17:02 200000003
  524291   40755 (2)      0      0    4096 28-Jul-2021 17:02 1
  655364   40755 (2)      0      0    4096 28-Jul-2021 17:02 10
  786437   40755 (2)      0      0    4096 28-Jul-2021 17:06 0
  917510   40755 (2)      0      0    4096 28-Jul-2021 17:06 23c0000402
  1048583   40755 (2)      0      0    4096 28-Jul-2021 17:06 23c0000401
  1179656   40755 (2)      0      0    4096 28-Jul-2021 17:06 23c0000400
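
For anyone wanting to script this, here is a minimal sketch in Python (not part of the debugfs output) that picks the directory names out of an "ls -l" listing like the one above. The function name and the column layout it assumes (inode, octal mode, then the name as the last field) are my own assumptions, inferred only from this listing.

```python
def list_directories(listing):
    """Return names of directory entries from debugfs 'ls -l' output.

    Assumed line layout (inferred from the listing in this mail):
      <inode> <octal mode> (<type>) <uid> <gid> <size> <date> <time> <name>
    """
    names = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # blank line, prompt line, or unexpected format
        mode, name = fields[1], fields[-1]
        try:
            mode_bits = int(mode, 8)
        except ValueError:
            continue  # second column is not an octal mode
        # File-type bits 040000 mark a directory (40755 in the listing).
        if mode_bits & 0o170000 == 0o040000 and name not in (".", ".."):
            names.append(name)
    return names


sample = """\
  393217   40755 (2)      0      0    4096 28-Jul-2021 17:06 .
       2   40755 (2)      0      0    4096 28-Jul-2021 17:02 ..
  393218   40755 (2)      0      0    4096 28-Jul-2021 17:02 200000003
  1179656   40755 (2)      0      0    4096 28-Jul-2021 17:06 23c0000400
"""
print(list_directories(sample))
```

Each returned name could then be fed back to debugfs as "ls -l O/<name>" or "stat O/<name>".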

Then e.g.:
debugfs:  stat O/23c0000400
Inode: 1179656   Type: directory    Mode:  0755   Flags: 0x80000
Generation: 2411782533    Version: 0x00000000:00000000
User:     0   Group:     0   Project:     0   Size: 4096
File ACL: 0
Links: 34   Blockcount: 8
Fragment:  Address: 0    Number: 0    Size: 0
  ctime: 0x6101806b:306016bc -- Wed Jul 28 17:06:03 2021
  atime: 0x6101806b:2d83aad8 -- Wed Jul 28 17:06:03 2021
  mtime: 0x6101806b:306016bc -- Wed Jul 28 17:06:03 2021
crtime: 0x6101806b:2d83aad8 -- Wed Jul 28 17:06:03 2021
Size of extra inode fields: 32
Extended attributes:
   lma: fid=[0x120008:0x8fc0e185:0x0] compat=c incompat=0
EXTENTS:
(0):33989
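
To avoid copying the FID by hand, a small sketch along these lines could pull it out of the "stat" output. The regular expression and the function name are assumptions based only on the "lma:" line shown above; other Lustre versions may print the attribute differently.

```python
import re

# Matches the FID on an "lma:" line, e.g. fid=[0x120008:0x8fc0e185:0x0]
LMA_FID_RE = re.compile(
    r"lma:\s*fid=(\[0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+\])"
)


def extract_lma_fid(stat_output):
    """Return the bracketed FID string from debugfs 'stat' output, or None."""
    match = LMA_FID_RE.search(stat_output)
    return match.group(1) if match else None


sample = """\
Extended attributes:
   lma: fid=[0x120008:0x8fc0e185:0x0] compat=c incompat=0
EXTENTS:
"""
print(extract_lma_fid(sample))
```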


But then on a client:
lfs fid2path /snap8 [0x120008:0x8fc0e185:0x0]
lfs fid2path: cannot find '[0x120008:0x8fc0e185:0x0]': No such file or directory

(and likewise for the others).
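
To repeat the lookup for several FIDs, the command lines can be generated rather than typed. The sketch below only builds the command string and does not run it. The helper name is hypothetical, and it needs Python 3.8 or later for shlex.join. Quoting matters because the square brackets in a FID are glob characters in most shells.

```python
import shlex


def fid2path_command(mount_point, fid):
    """Build a shell-safe 'lfs fid2path' command line for one FID."""
    # shlex.join quotes any argument containing shell metacharacters,
    # which includes the '[' and ']' around a FID.
    return shlex.join(["lfs", "fid2path", mount_point, fid])


print(fid2path_command("/snap8", "[0x120008:0x8fc0e185:0x0]"))
```

This only makes the lookups repeatable; it says nothing about why the client reports "No such file or directory".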

Not quite sure what you meant by the O/*/d* as there are no directories 
within O/, and there is no d/ or d*/ either at top level or within O/


Running (on the OST):
lctl lfsck_start -M snap8-OST004e
seems to work (at least, doesn't return any error).

However, lctl lfsck_query -M snap8-OST004e   gives:
Fail to query LFSCK: Inappropriate ioctl for device


Thanks,
Alastair.


On Sat, 4 Sep 2021, Andreas Dilger wrote:

> You could run debugfs on that OST and use "ls -l" to examine the O/*/d* directories for large objects, then "stat" any suspicious objects within debugfs to dump the parent FID, and "lfs fid2path" on a client to determine the path.
>
> Alternately, see "lctl-lfsck-start.8" man page for options to link orphan objects to the .lustre/lost+found directory if you think there are no files referencing those objects.
>
> Cheers, Andreas
>
>> On Sep 4, 2021, at 00:54, Alastair Basden <a.g.basden at durham.ac.uk> wrote:
>>
>> Ah, of course - has to be done on a client.
>>
>> None of these files are on the dodgy OST.
>>
>> Any further suggestions?  Essentially we have what seems to be a full OST with nothing on it.
>>
>> Thanks,
>> Alastair.
>>
>>> On Sat, 4 Sep 2021, Andreas Dilger wrote:
>>>
>>> $ man lfs-fid2path.1
>>> lfs-fid2path(1)                                       user utilities                                     lfs-fid2path(1)
>>>
>>> NAME
>>>      lfs fid2path - print the pathname(s) for a file identifier
>>>
>>> SYNOPSIS
>>>      lfs fid2path [OPTION]... <FSNAME|MOUNT_POINT> <FID>...
>>>
>>> DESCRIPTION
>>>      lfs fid2path maps a numeric Lustre File IDentifier (FID) to one or more pathnames
>>>      that have hard links to that file.  This allows resolving filenames for FIDs used in console
>>>      error messages, and resolving all of the pathnames for a file that has multiple hard links.
>>>      Pathnames are resolved relative to the MOUNT_POINT specified, or relative to the
>>>      filesystem mount point if FSNAME is provided.
>>>
>>> OPTIONS
>>>      -f, --print-fid
>>>             Print the FID with the path.
>>>
>>>      -c, --print-link
>>>             Print the current link number with each pathname or parent directory.
>>>
>>>      -l, --link=LINK
>>>             If a file has multiple hard links, then print only the specified LINK, starting at link 0.
>>>             If multiple FIDs are given, but only one pathname is needed for each file, use --link=0.
>>>
>>> EXAMPLES
>>>      $ lfs fid2path /mnt/testfs [0x200000403:0x11f:0x0]
>>>             /mnt/testfs/etc/hosts
>>>
>>>
>>> On Sep 3, 2021, at 14:51, Alastair Basden <a.g.basden at durham.ac.uk> wrote:
>>>
>>> Hi,
>>>
>>> lctl get_param mdt.*.exports.*.open_files  returns:
>>> mdt.snap8-MDT0000.exports.172.18.180.21 at o2ib.open_files=
>>> [0x20000b90e:0x10aa:0x0]
>>> mdt.snap8-MDT0000.exports.172.18.180.22 at o2ib.open_files=
>>> [0x20000b90e:0x21b3:0x0]
>>> mdt.snap8-MDT0000.exports.172.18.181.19 at o2ib.open_files=
>>> [0x20000b90e:0x21b3:0x0]
>>> [0x20000b90e:0x21b4:0x0]
>>> [0x20000b90c:0x1574:0x0]
>>> [0x20000b90c:0x1575:0x0]
>>> [0x20000b90c:0x1576:0x0]
>>>
>>> There don't seem to be many open, so I don't think open files are the problem.
>>>
>>> Not sure which bit of this I need to use with lfs fid2path either...
>>>
>>> Cheers,
>>> Alastair.
>>>
>>>
>>> On Fri, 3 Sep 2021, Andreas Dilger wrote:
>>>
>>> You can also check "mdt.*.exports.*.open_files" on the MDTs for a list of FIDs open on each client, and use "lfs fid2path" to resolve them to a pathname.
>>>
>>> On Sep 3, 2021, at 02:09, Degremont, Aurelien via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
>>>
>>> Hi
>>>
>>> It could be a bug, but most of the time, this is due to an open-unlinked file, typically a log file which is still in use and some processes keep writing to it until it fills the OSTs it is using.
>>>
>>> Look for such files on your clients (use lsof).
>>>
>>> Aurélien
>>>
>>>
>>> On 03/09/2021 09:50, "lustre-discuss on behalf of Alastair Basden" <lustre-discuss-bounces at lists.lustre.org on behalf of a.g.basden at durham.ac.uk> wrote:
>>>
>>> Hi,
>>>
>>> We have a file system where each OST is a single SSD.
>>>
>>> One of those is reporting as 100% full (lfs df -h /snap8):
>>> snap8-OST004d_UUID          5.8T        2.0T        3.5T  37% /snap8[OST:77]
>>> snap8-OST004e_UUID          5.8T        5.5T        7.5G 100% /snap8[OST:78]
>>> snap8-OST004f_UUID          5.8T        2.0T        3.4T  38% /snap8[OST:79]
>>>
>>> However, I can't find any files on it:
>>> lfs find --ost snap8-OST004e /snap8/
>>> returns nothing.
>>>
>>> I guess that it has filled up, and that there is some bug or other that is
>>> now preventing proper behaviour - but I could be wrong.
>>>
>>> Does anyone have any suggestions?
>>>
>>> Essentially, I'd like to find some of the files and delete or migrate
>>> some, and thus return it to useful production.
>>>
>>> Cheers,
>>> Alastair.
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Lustre Principal Architect
>>> Whamcloud
>

