[lustre-discuss] Full OST
Andreas Dilger
adilger at whamcloud.com
Sat Sep 4 16:21:59 PDT 2021
You could run debugfs on that OST and use "ls -l" to examine the O/*/d* directories for large objects, then "stat" any suspicious objects within debugfs to dump the parent FID, and "lfs fid2path" on a client to determine the path.
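
The debugfs step above could be sketched roughly as follows. This is only an illustration, assuming an ldiskfs OST: /dev/sdX is a placeholder for the OST block device, the helper name largest_objects is hypothetical, and the column positions assume the usual debugfs "ls -l" layout (inode, mode, type, uid, gid, size, date, time, name), which is worth checking against your own output first.

```shell
# Hypothetical helper: read debugfs "ls -l" output on stdin and print
# "size name" for the largest regular files, biggest first.
# Assumed columns: inode mode (type) uid gid size date time name
largest_objects() {
    awk 'NF >= 9 && $2 ~ /^100/ && $6 ~ /^[0-9]+$/ { print $6, $9 }' |
        sort -rn |
        head -n "${1:-10}"
}

# Possible usage on the OSS (-c opens the device read-only):
#   for d in $(seq 0 31); do
#       debugfs -c -R "ls -l O/0/d$d" /dev/sdX 2>/dev/null
#   done | largest_objects 20
#
# Then, for a suspicious object, dump its attributes (including the parent FID):
#   debugfs -c -R "stat O/0/d<N>/<objid>" /dev/sdX
# and resolve that FID on a client:
#   lfs fid2path /snap8 "<parent FID>"
```

The mode filter (/^100/) skips the "." and ".." directory entries, so only object files are ranked.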
Alternatively, see the "lctl-lfsck-start.8" man page for options to link orphan objects into the .lustre/lost+found directory, if you think no files reference those objects.
Cheers, Andreas
> On Sep 4, 2021, at 00:54, Alastair Basden <a.g.basden at durham.ac.uk> wrote:
>
> Ah, of course - has to be done on a client.
>
> None of these files are on the dodgy OST.
>
> Any further suggestions? Essentially we have what seems to be a full OST with nothing on it.
>
> Thanks,
> Alastair.
>
>> On Sat, 4 Sep 2021, Andreas Dilger wrote:
>>
>> $ man lfs-fid2path.1
>> lfs-fid2path(1) user utilities lfs-fid2path(1)
>>
>> NAME
>> lfs fid2path - print the pathname(s) for a file identifier
>>
>> SYNOPSIS
>> lfs fid2path [OPTION]... <FSNAME|MOUNT_POINT> <FID>...
>>
>> DESCRIPTION
>> lfs fid2path maps a numeric Lustre File IDentifier (FID) to one or more pathnames
>> that have hard links to that file. This allows resolving filenames for FIDs used in console
>> error messages, and resolving all of the pathnames for a file that has multiple hard links.
>> Pathnames are resolved relative to the MOUNT_POINT specified, or relative to the
>> filesystem mount point if FSNAME is provided.
>>
>> OPTIONS
>> -f, --print-fid
>> Print the FID with the path.
>>
>> -c, --print-link
>> Print the current link number with each pathname or parent directory.
>>
>> -l, --link=LINK
>> If a file has multiple hard links, then print only the specified LINK, starting at link 0.
>> If multiple FIDs are given, but only one pathname is needed for each file, use --link=0.
>>
>> EXAMPLES
>> $ lfs fid2path /mnt/testfs [0x200000403:0x11f:0x0]
>> /mnt/testfs/etc/hosts
>>
>>
>> On Sep 3, 2021, at 14:51, Alastair Basden <a.g.basden at durham.ac.uk> wrote:
>>
>> Hi,
>>
>> lctl get_param mdt.*.exports.*.open_files returns:
>> mdt.snap8-MDT0000.exports.172.18.180.21@o2ib.open_files=
>> [0x20000b90e:0x10aa:0x0]
>> mdt.snap8-MDT0000.exports.172.18.180.22@o2ib.open_files=
>> [0x20000b90e:0x21b3:0x0]
>> mdt.snap8-MDT0000.exports.172.18.181.19@o2ib.open_files=
>> [0x20000b90e:0x21b3:0x0]
>> [0x20000b90e:0x21b4:0x0]
>> [0x20000b90c:0x1574:0x0]
>> [0x20000b90c:0x1575:0x0]
>> [0x20000b90c:0x1576:0x0]
>>
>> There don't seem to be many files open, so I don't think open files are the problem.
>>
>> Not sure which bit of this I need to use with lfs fid2path either...
>>
>> Cheers,
>> Alastair.
>>
>>
>> On Fri, 3 Sep 2021, Andreas Dilger wrote:
>>
>> You can also check "mdt.*.exports.*.open_files" on the MDTs for a list of FIDs open on each client, and use "lfs fid2path" to resolve them to a pathname.
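
As a rough sketch of that suggestion: the bracketed values in the open_files output are the FIDs, and each one can be handed to "lfs fid2path" whole. The helper name extract_fids below is hypothetical, and /snap8 is simply the mount point used elsewhere in this thread.

```shell
# Hypothetical helper: pull the unique [seq:oid:ver] FIDs out of
# "lctl get_param mdt.*.exports.*.open_files" output read from stdin.
extract_fids() {
    grep -oE '\[0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+\]' | sort -u
}

# Possible usage:
#   on the MDS:
#     lctl get_param mdt.*.exports.*.open_files > open_files.txt
#   on a client, after copying the file over:
#     extract_fids < open_files.txt | while read -r fid; do
#         lfs fid2path /snap8 "$fid"
#     done
```

Quoting "$fid" keeps the shell from treating the square brackets as a glob pattern.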
>>
>> On Sep 3, 2021, at 02:09, Degremont, Aurelien via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
>>
>> Hi
>>
>> It could be a bug, but most of the time this is caused by an open-unlinked file: typically a log file that has been deleted while still in use, so that some process keeps writing to it until it fills the OSTs it uses.
>>
>> Look for such files on your clients (use lsof).
>>
>> Aurélien
>>
>>
>> On 03/09/2021 09:50, "lustre-discuss on behalf of Alastair Basden" <lustre-discuss-bounces at lists.lustre.org on behalf of a.g.basden at durham.ac.uk> wrote:
>>
>> Hi,
>>
>> We have a file system where each OST is a single SSD.
>>
>> One of those is reporting as 100% full (lfs df -h /snap8):
>> snap8-OST004d_UUID 5.8T 2.0T 3.5T 37% /snap8[OST:77]
>> snap8-OST004e_UUID 5.8T 5.5T 7.5G 100% /snap8[OST:78]
>> snap8-OST004f_UUID 5.8T 2.0T 3.4T 38% /snap8[OST:79]
>>
>> However, I can't find any files on it:
>> lfs find --ost snap8-OST004e /snap8/
>> returns nothing.
>>
>> I guess that it has filled up, and that there is some bug or other that is
>> now preventing proper behaviour - but I could be wrong.
>>
>> Does anyone have any suggestions?
>>
>> Essentially, I'd like to find some of the files and delete or migrate
>> some, and thus return it to useful production.
>>
>> Cheers,
>> Alastair.
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Principal Architect
>> Whamcloud