[lustre-discuss] Full OST

Andreas Dilger adilger at whamcloud.com
Sat Sep 4 16:21:59 PDT 2021


You could run debugfs on that OST and use "ls -l" to examine the O/*/d* directories for large objects, then "stat" any suspicious objects within debugfs to dump the parent FID, and "lfs fid2path" on a client to determine the path. 
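
A rough Python sketch of the "ls -l" step, for illustration only. It assumes the OST's ldiskfs device (the placeholder /dev/sdX below is hypothetical) is readable on the OSS, and that debugfs prints each "ls -l" entry as inode, mode, (type), uid, gid, size, date, time, name. Check that layout against your debugfs build. The sketch only ranks objects by size and builds the read-only ("-c") "stat" command. Reading the parent FID out of the "stat" output is left to you.

```python
import shlex
from typing import List, Tuple


def parse_ls_l(output: str) -> List[Tuple[str, int]]:
    """Return (name, size_in_bytes) for each entry in debugfs 'ls -l' output.

    Assumed column layout (verify against your debugfs build):
    inode, mode, (file type), uid, gid, size, date, time, name
    """
    entries = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # blank line or a layout we do not recognise
        name = " ".join(fields[8:])
        if name in (".", ".."):
            continue  # skip the directory's own entries
        try:
            size = int(fields[5])
        except ValueError:
            continue  # size column is not where we assumed it is
        entries.append((name, size))
    return entries


def largest(entries: List[Tuple[str, int]], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n largest (name, size) entries, biggest first."""
    return sorted(entries, key=lambda e: e[1], reverse=True)[:n]


def stat_command(device: str, path: str) -> str:
    """Build a read-only (-c) debugfs invocation that stats one object."""
    return "debugfs -c -R {} {}".format(shlex.quote("stat " + path), shlex.quote(device))
```

For example, feed parse_ls_l() the output of debugfs -c -R "ls -l /O/0/d0" /dev/sdX, run stat_command() on the largest names, and pass the parent FID that "stat" reports to "lfs fid2path /snap8 <FID>" on a client.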

Alternatively, see the lctl-lfsck-start(8) man page for the options that link orphan objects into the .lustre/lost+found directory, if you think no files reference those objects.
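
A hedged sketch of the LFSCK route. It builds command lines but does not run them. It assumes the MDT device name is snap8-MDT0000 (taken from the open_files output further down the thread) and that your lctl supports "lfsck_start" with -M (device), -t layout, -o (orphan handling) and -n on (dry run), plus "lfsck_query -M". Confirm each option against the man page on your release before running anything.

```python
from typing import List


def lfsck_orphan_commands(mdt: str, dry_run: bool = True) -> List[List[str]]:
    """Build argv lists that start a layout LFSCK handling orphan OST objects,
    then query its progress. Intended to be run on the MDS hosting `mdt`.
    """
    start = ["lctl", "lfsck_start", "-M", mdt, "-t", "layout", "-o"]
    if dry_run:
        # assumed: report what would be repaired without changing anything
        start += ["-n", "on"]
    query = ["lctl", "lfsck_query", "-M", mdt]
    return [start, query]
```

Starting with the dry run makes it possible to see what would be linked into lost+found before anything is changed. Each list can be passed to subprocess.run(cmd, check=True).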

Cheers, Andreas

> On Sep 4, 2021, at 00:54, Alastair Basden <a.g.basden at durham.ac.uk> wrote:
> 
> Ah, of course - has to be done on a client.
> 
> None of these files are on the dodgy OST.
> 
> Any further suggestions?  Essentially we have what seems to be a full OST with nothing on it.
> 
> Thanks,
> Alastair.
> 
>> On Sat, 4 Sep 2021, Andreas Dilger wrote:
>> 
>> $ man lfs-fid2path.1
>> lfs-fid2path(1)                                       user utilities                                     lfs-fid2path(1)
>> 
>> NAME
>>      lfs fid2path - print the pathname(s) for a file identifier
>> 
>> SYNOPSIS
>>      lfs fid2path [OPTION]... <FSNAME|MOUNT_POINT> <FID>...
>> 
>> DESCRIPTION
>>      lfs  fid2path  maps  a  numeric  Lustre File IDentifier (FID) to one or more pathnames
>>      that have hard links to that file.  This allows resolving filenames for FIDs used in console
>>      error messages, and resolving all of the pathnames for a file that has multiple hard links.
>>      Pathnames are resolved relative to the MOUNT_POINT specified, or relative to the
>>      filesystem mount point if FSNAME is provided.
>> 
>> OPTIONS
>>      -f, --print-fid
>>             Print the FID with the path.
>> 
>>      -c, --print-link
>>             Print the current link number with each pathname or parent directory.
>> 
>>      -l, --link=LINK
>>             If a file has multiple hard links, then print only the specified LINK, starting at link 0.
>>             If multiple FIDs are given, but only one pathname is needed for each file, use --link=0.
>> 
>> EXAMPLES
>>      $ lfs fid2path /mnt/testfs [0x200000403:0x11f:0x0]
>>             /mnt/testfs/etc/hosts
>> 
>> 
>> On Sep 3, 2021, at 14:51, Alastair Basden <a.g.basden at durham.ac.uk> wrote:
>> 
>> Hi,
>> 
>> lctl get_param mdt.*.exports.*.open_files  returns:
>> mdt.snap8-MDT0000.exports.172.18.180.21@o2ib.open_files=
>> [0x20000b90e:0x10aa:0x0]
>> mdt.snap8-MDT0000.exports.172.18.180.22@o2ib.open_files=
>> [0x20000b90e:0x21b3:0x0]
>> mdt.snap8-MDT0000.exports.172.18.181.19@o2ib.open_files=
>> [0x20000b90e:0x21b3:0x0]
>> [0x20000b90e:0x21b4:0x0]
>> [0x20000b90c:0x1574:0x0]
>> [0x20000b90c:0x1575:0x0]
>> [0x20000b90c:0x1576:0x0]
>> 
>> There don't seem to be many files open, so I don't think open files are the problem.
>> 
>> Not sure which bit of this I need to use with lfs fid2path either...
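
One way to use that output, sketched in Python under the assumption that it has exactly the layout shown above: a parameter line ending in "open_files=" followed by one FID per line. The sketch maps each client NID to its FIDs and builds a single "lfs fid2path" call covering every distinct FID. Per the man page quoted in this thread, fid2path accepts several FIDs and must be run on a client. /snap8 is the mount point from the original report.

```python
import re
from typing import Dict, List

# Parameter lines look like
#   mdt.snap8-MDT0000.exports.172.18.180.21@o2ib.open_files=
# and each following line holds one FID such as [0x20000b90e:0x10aa:0x0].
PARAM_RE = re.compile(r"^mdt\.(?P<mdt>[^.]+)\.exports\.(?P<nid>.+)\.open_files=$")
FID_RE = re.compile(r"^\[0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+\]$")


def parse_open_files(output: str) -> Dict[str, List[str]]:
    """Map each client NID to the FIDs it holds open on the MDT."""
    result: Dict[str, List[str]] = {}
    nid = None
    for raw in output.splitlines():
        line = raw.strip()
        m = PARAM_RE.match(line)
        if m:
            nid = m.group("nid")
            result.setdefault(nid, [])
        elif nid is not None and FID_RE.match(line):
            result[nid].append(line)
    return result


def fid2path_command(mount: str, per_client: Dict[str, List[str]]) -> List[str]:
    """Build one 'lfs fid2path' call (run on a client) for every distinct FID."""
    fids = sorted({fid for fids in per_client.values() for fid in fids})
    return ["lfs", "fid2path", mount] + fids
```

A FID that fails to resolve may deserve a closer look, since an open-unlinked file no longer has a pathname.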
>> 
>> Cheers,
>> Alastair.
>> 
>> 
>> On Fri, 3 Sep 2021, Andreas Dilger wrote:
>> 
>> You can also check "mdt.*.exports.*.open_files" on the MDTs for a list of FIDs open on each client, and use "lfs fid2path" to resolve them to a pathname.
>> 
>> On Sep 3, 2021, at 02:09, Degremont, Aurelien via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
>> 
>> Hi
>> 
>> It could be a bug, but most of the time this is caused by an open-unlinked file: typically a log file that was deleted while still in use, so some process keeps writing to it until it fills the OSTs it is striped over.
>> 
>> Look for such files on your clients (use lsof).
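
"lsof +L1" lists open files whose link count is below one, which covers this case. Where lsof is not installed, a minimal Python equivalent on Linux is to walk /proc/<pid>/fd and keep the symlink targets that the kernel marks with a " (deleted)" suffix under the Lustre mount. This is a sketch, not a complete tool. It assumes it runs as root on each client so other users' fd directories are readable, and /snap8 is the mount point from the original report.

```python
import os
from typing import Iterator, Tuple

DELETED_SUFFIX = " (deleted)"


def is_deleted_under(target: str, mount: str) -> bool:
    """True if an fd symlink target is an unlinked file below `mount`."""
    if not target.endswith(DELETED_SUFFIX):
        return False
    path = target[: -len(DELETED_SUFFIX)]
    return path == mount or path.startswith(mount.rstrip("/") + "/")


def find_open_unlinked(mount: str, proc: str = "/proc") -> Iterator[Tuple[int, str]]:
    """Yield (pid, original path) for every open-unlinked file under `mount`."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd closed while we were looking
            if is_deleted_under(target, mount):
                yield int(entry), target[: -len(DELETED_SUFFIX)]
```

For example, "for pid, path in find_open_unlinked("/snap8"): print(pid, path)". The space is only released once the reported process closes the file or exits.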
>> 
>> Aurélien
>> 
>> 
>> On 03/09/2021 09:50, "lustre-discuss on behalf of Alastair Basden" <lustre-discuss-bounces at lists.lustre.org on behalf of a.g.basden at durham.ac.uk> wrote:
>> 
>> Hi,
>> 
>> We have a file system where each OST is a single SSD.
>> 
>> One of those is reporting as 100% full (lfs df -h /snap8):
>> snap8-OST004d_UUID          5.8T        2.0T        3.5T  37% /snap8[OST:77]
>> snap8-OST004e_UUID          5.8T        5.5T        7.5G 100% /snap8[OST:78]
>> snap8-OST004f_UUID          5.8T        2.0T        3.4T  38% /snap8[OST:79]
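
On a file system with many OSTs, a small Python sketch can pick the nearly full targets out of "lfs df -h /snap8" output. It assumes the per-target row layout shown above (UUID, size, used, available, use%, mount). The 90% threshold is an arbitrary illustrative default. The returned names are in the form that "lfs find --ost" takes.

```python
import re
from typing import List, Tuple

# Matches per-target rows of 'lfs df' / 'lfs df -h' such as
#   snap8-OST004e_UUID   5.8T   5.5T   7.5G 100% /snap8[OST:78]
ROW_RE = re.compile(r"^(?P<name>\S+)_UUID\s+\S+\s+\S+\s+\S+\s+(?P<pct>\d+)%\s+\S+")


def full_targets(output: str, threshold: int = 90) -> List[Tuple[str, int]]:
    """Return (target name, use%) for every target at or above `threshold`."""
    hits = []
    for line in output.splitlines():
        m = ROW_RE.match(line.strip())
        if m and int(m.group("pct")) >= threshold:
            hits.append((m.group("name"), int(m.group("pct"))))
    return hits
```

On the three rows above it returns only ("snap8-OST004e", 100). Header and summary lines are skipped because they do not end in "_UUID".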
>> 
>> However, I can't find any files on it:
>> lfs find --ost snap8-OST004e /snap8/
>> returns nothing.
>> 
>> I guess that it has filled up, and that there is some bug or other that is
>> now preventing proper behaviour - but I could be wrong.
>> 
>> Does anyone have any suggestions?
>> 
>> Essentially, I'd like to find some of the files and delete or migrate
>> some, and thus return it to useful production.
>> 
>> Cheers,
>> Alastair.
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>> 
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Principal Architect
>> Whamcloud