[lustre-discuss] Full OST
Alastair Basden
a.g.basden at durham.ac.uk
Mon Sep 6 01:11:02 PDT 2021
Hi Andreas,
Thanks.
With debugfs /dev/nvme6n1, I get:
debugfs: ls -l O
393217 40755 (2) 0 0 4096 28-Jul-2021 17:06 .
2 40755 (2) 0 0 4096 28-Jul-2021 17:02 ..
393218 40755 (2) 0 0 4096 28-Jul-2021 17:02 200000003
524291 40755 (2) 0 0 4096 28-Jul-2021 17:02 1
655364 40755 (2) 0 0 4096 28-Jul-2021 17:02 10
786437 40755 (2) 0 0 4096 28-Jul-2021 17:06 0
917510 40755 (2) 0 0 4096 28-Jul-2021 17:06 23c0000402
1048583 40755 (2) 0 0 4096 28-Jul-2021 17:06 23c0000401
1179656 40755 (2) 0 0 4096 28-Jul-2021 17:06 23c0000400
Then e.g.:
debugfs: stat O/23c0000400
Inode: 1179656 Type: directory Mode: 0755 Flags: 0x80000
Generation: 2411782533 Version: 0x00000000:00000000
User: 0 Group: 0 Project: 0 Size: 4096
File ACL: 0
Links: 34 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x6101806b:306016bc -- Wed Jul 28 17:06:03 2021
atime: 0x6101806b:2d83aad8 -- Wed Jul 28 17:06:03 2021
mtime: 0x6101806b:306016bc -- Wed Jul 28 17:06:03 2021
crtime: 0x6101806b:2d83aad8 -- Wed Jul 28 17:06:03 2021
Size of extra inode fields: 32
Extended attributes:
lma: fid=[0x120008:0x8fc0e185:0x0] compat=c incompat=0
EXTENTS:
(0):33989
But then on a client:
lfs fid2path /snap8 [0x120008:0x8fc0e185:0x0]
lfs fid2path: cannot find '[0x120008:0x8fc0e185:0x0]': No such file or directory
(and likewise for the others).
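One observation on the FID above, offered as a sketch rather than a diagnosis: the first two fields of the lma FID are numerically the same as the inode number and generation that debugfs prints for the same entry. That may mean the identifier is built from the local inode rather than being a FID a client could look up, but that reading is an assumption. The arithmetic itself can be checked with a few lines of Python (the helper name parse_fid is made up for this illustration):

```python
def parse_fid(fid):
    """Split a Lustre FID string '[seq:oid:ver]' into three integers."""
    seq, oid, ver = fid.strip().strip("[]").split(":")
    # int(..., 16) accepts the leading '0x' prefix.
    return int(seq, 16), int(oid, 16), int(ver, 16)


# Values copied from the debugfs 'stat O/23c0000400' output above.
inode_number = 1179656
generation = 2411782533

seq, oid, ver = parse_fid("[0x120008:0x8fc0e185:0x0]")
print(seq == inode_number, oid == generation, ver)  # prints: True True 0
```

The same helper splits the FID from the man page example, [0x200000403:0x11f:0x0], whose sequence field is far larger than an inode number.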
Not quite sure what you meant by O/*/d*, as I can't see any d/ or d*/
directories either at the top level or directly within O/.
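As a cross-check on the listing above, here is a minimal sketch that decodes the mode column of the debugfs "ls -l O" output with Python's standard stat module. The sample rows are copied from that output, and the helper name is hypothetical. A mode of 40755 (octal) decodes as a directory, so the entries under O/ appear to be directories themselves. Any d* subdirectories would presumably sit one level further down (for example under O/0), which was not listed here.

```python
import stat

# Three rows copied from the debugfs 'ls -l O' output above.
LISTING = """\
393218 40755 (2) 0 0 4096 28-Jul-2021 17:02 200000003
786437 40755 (2) 0 0 4096 28-Jul-2021 17:06 0
1179656 40755 (2) 0 0 4096 28-Jul-2021 17:06 23c0000400
"""


def directories(listing):
    """Return names of entries whose octal mode field marks a directory."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # skip lines that do not look like 'ls -l' rows
        mode = int(fields[1], 8)  # debugfs prints the mode in octal
        if stat.S_ISDIR(mode):
            names.append(fields[-1])
    return names


print(directories(LISTING))
```

If that reading is right, running "ls -l O/0" at the debugfs prompt would be a reasonable next thing to try.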
Running (on the OST):
lctl lfsck_start -M snap8-OST004e
seems to work (at least, doesn't return any error).
However, lctl lfsck_query -M snap8-OST004e gives:
Fail to query LFSCK: Inappropriate ioctl for device
Thanks,
Alastair.
On Sat, 4 Sep 2021, Andreas Dilger wrote:
> You could run debugfs on that OST and use "ls -l" to examine the O/*/d* directories for large objects, then "stat" any suspicious objects within debugfs to dump the parent FID, and "lfs fid2path" on a client to determine the path.
>
> Alternately, see "lctl-lfsck-start.8" man page for options to link orphan objects to the .lustre/lost+found directory if you think there are no files referencing those objects.
>
> Cheers, Andreas
>
>> On Sep 4, 2021, at 00:54, Alastair Basden <a.g.basden at durham.ac.uk> wrote:
>>
>> Ah, of course - has to be done on a client.
>>
>> None of these files are on the dodgy OST.
>>
>> Any further suggestions? Essentially we have what seems to be a full OST with nothing on it.
>>
>> Thanks,
>> Alastair.
>>
>>> On Sat, 4 Sep 2021, Andreas Dilger wrote:
>>>
>>> $ man lfs-fid2path.1
>>> lfs-fid2path(1) user utilities lfs-fid2path(1)
>>>
>>> NAME
>>> lfs fid2path - print the pathname(s) for a file identifier
>>>
>>> SYNOPSIS
>>> lfs fid2path [OPTION]... <FSNAME|MOUNT_POINT> <FID>...
>>>
>>> DESCRIPTION
>>> lfs fid2path maps a numeric Lustre File IDentifier (FID) to one or more pathnames
>>> that have hard links to that file. This allows resolving filenames for FIDs used in console
>>> error messages, and resolving all of the pathnames for a file that has multiple hard links.
>>> Pathnames are resolved relative to the MOUNT_POINT specified, or relative to the
>>> filesystem mount point if FSNAME is provided.
>>>
>>> OPTIONS
>>> -f, --print-fid
>>> Print the FID with the path.
>>>
>>> -c, --print-link
>>> Print the current link number with each pathname or parent directory.
>>>
>>> -l, --link=LINK
>>> If a file has multiple hard links, then print only the specified LINK, starting at link 0.
>>> If multiple FIDs are given, but only one pathname is needed for each file, use --link=0.
>>>
>>> EXAMPLES
>>> $ lfs fid2path /mnt/testfs [0x200000403:0x11f:0x0]
>>> /mnt/testfs/etc/hosts
>>>
>>>
>>> On Sep 3, 2021, at 14:51, Alastair Basden <a.g.basden at durham.ac.uk> wrote:
>>>
>>> Hi,
>>>
>>> lctl get_param mdt.*.exports.*.open_files returns:
>>> mdt.snap8-MDT0000.exports.172.18.180.21 at o2ib.open_files=
>>> [0x20000b90e:0x10aa:0x0]
>>> mdt.snap8-MDT0000.exports.172.18.180.22 at o2ib.open_files=
>>> [0x20000b90e:0x21b3:0x0]
>>> mdt.snap8-MDT0000.exports.172.18.181.19 at o2ib.open_files=
>>> [0x20000b90e:0x21b3:0x0]
>>> [0x20000b90e:0x21b4:0x0]
>>> [0x20000b90c:0x1574:0x0]
>>> [0x20000b90c:0x1575:0x0]
>>> [0x20000b90c:0x1576:0x0]
>>>
>>> There don't seem to be many open, so I don't think open files are the problem.
>>>
>>> Not sure which bit of this I need to use with lfs fid2path either...
>>>
>>> Cheers,
>>> Alastair.
>>>
>>>
>>> On Fri, 3 Sep 2021, Andreas Dilger wrote:
>>>
>>> You can also check "mdt.*.exports.*.open_files" on the MDTs for a list of FIDs open on each client, and use "lfs fid2path" to resolve them to a pathname.
>>>
>>> On Sep 3, 2021, at 02:09, Degremont, Aurelien via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:
>>>
>>> Hi
>>>
>>> It could be a bug, but most of the time this is due to an open-unlinked file: typically a log file that is still in use, which some process keeps writing to until it fills the OSTs it is using.
>>>
>>> Look for such files on your clients (use lsof).
>>>
>>> Aurélien
>>>
>>>
>>> On 03/09/2021 09:50, "lustre-discuss on behalf of Alastair Basden" <lustre-discuss-bounces at lists.lustre.org on behalf of a.g.basden at durham.ac.uk> wrote:
>>>
>>> Hi,
>>>
>>> We have a file system where each OST is a single SSD.
>>>
>>> One of those is reporting as 100% full (lfs df -h /snap8):
>>> snap8-OST004d_UUID 5.8T 2.0T 3.5T 37% /snap8[OST:77]
>>> snap8-OST004e_UUID 5.8T 5.5T 7.5G 100% /snap8[OST:78]
>>> snap8-OST004f_UUID 5.8T 2.0T 3.4T 38% /snap8[OST:79]
>>>
>>> However, I can't find any files on it:
>>> lfs find --ost snap8-OST004e /snap8/
>>> returns nothing.
>>>
>>> I guess that it has filled up, and that there is some bug or other that is
>>> now preventing proper behaviour - but I could be wrong.
>>>
>>> Does anyone have any suggestions?
>>>
>>> Essentially, I'd like to find some of the files and delete or migrate
>>> some, and thus return it to useful production.
>>>
>>> Cheers,
>>> Alastair.
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Lustre Principal Architect
>>> Whamcloud
>