[lustre-discuss] No space left on device MDT DoM but not full nor run out of inodes
Jon Marshall
Jon.Marshall at cruk.cam.ac.uk
Tue Jun 20 08:32:36 PDT 2023
Sorry, typo in the version number - the version we are actually running is 2.12.6
________________________________
From: Jon Marshall
Sent: 20 June 2023 16:18
To: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
Subject: No space left on device MDT DoM but not full nor run out of inodes
Hi,
We've been running Lustre 2.15.1 in production for over a year and recently decided to enable PFL with DoM (Data-on-MDT) on our filesystem. Things were fine up until last week, when users started reporting "No space left on device" errors while copying files. The MDT is running ldiskfs as the backend.
I've searched through the mailing list and found a couple of people reporting similar problems, which prompted me to check the inode allocation, which is currently:
UUID                      Inodes      IUsed      IFree  IUse%  Mounted on
scratchc-MDT0000_UUID  624492544   71144384  553348160    12%  /mnt/scratchc[MDT:0]
scratchc-OST0000_UUID   57712579   24489934   33222645    43%  /mnt/scratchc[OST:0]
scratchc-OST0001_UUID   57114064   24505876   32608188    43%  /mnt/scratchc[OST:1]
filesystem_summary:    136975217   71144384   65830833    52%  /mnt/scratchc
So, nowhere near full - the disk usage is a little higher:
UUID                    bytes    Used  Available  Use%  Mounted on
scratchc-MDT0000_UUID  882.1G  451.9G     355.8G   56%  /mnt/scratchc[MDT:0]
scratchc-OST0000_UUID   53.6T   22.7T      31.0T   43%  /mnt/scratchc[OST:0]
scratchc-OST0001_UUID   53.6T   23.0T      30.6T   43%  /mnt/scratchc[OST:1]
filesystem_summary:    107.3T   45.7T      61.6T   43%  /mnt/scratchc
But not full either! The errors are accompanied in the logs by:
LustreError: 15450:0:(tgt_grant.c:463:tgt_grant_space_left()) scratchc-MDT0000: cli ba0195c7-1ab4-4f7c-9e28-8689478f5c17/ffff9e331e231c00 left 82586337280 < tot_grant 82586681321 unstable 0 pending 0 dirty 1044480
LustreError: 15450:0:(tgt_grant.c:463:tgt_grant_space_left()) Skipped 33050 previous similar messages
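If I'm reading that first message correctly, the complaint is that the space already granted to clients (tot_grant) slightly exceeds what the MDT believes it has left, so writes get ENOSPC even though df looks healthy. Checking the arithmetic with the logged values (variable names are mine, just for illustration):

```shell
# Values copied from the tgt_grant_space_left() message above, in bytes.
left=82586337280        # space the MDT believes is still usable
tot_grant=82586681321   # space already promised (granted) to clients

# How far the grants overshoot the available space:
echo $((tot_grant - left))   # 344041 bytes, i.e. roughly 336 KiB
```

So the overcommit is tiny in absolute terms, which makes me wonder whether this is a grant-accounting issue rather than genuine space exhaustion.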
For reference the DoM striping we're using is:
lcm_layout_gen:    0
lcm_mirror_count:  1
lcm_entry_count:   3
  lcme_id:             N/A
  lcme_mirror_id:      N/A
  lcme_flags:          0
  lcme_extent.e_start: 0
  lcme_extent.e_end:   1048576
    stripe_count:  0   stripe_size: 1048576   pattern: mdt     stripe_offset: -1
  lcme_id:             N/A
  lcme_mirror_id:      N/A
  lcme_flags:          0
  lcme_extent.e_start: 1048576
  lcme_extent.e_end:   1073741824
    stripe_count:  1   stripe_size: 1048576   pattern: raid0   stripe_offset: -1
  lcme_id:             N/A
  lcme_mirror_id:      N/A
  lcme_flags:          0
  lcme_extent.e_start: 1073741824
  lcme_extent.e_end:   EOF
    stripe_count:  -1  stripe_size: 1048576   pattern: raid0   stripe_offset: -1
So the first 1 MB of each file lives on the MDT.
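For completeness, a layout like the above would normally be applied with something along these lines (a sketch from memory rather than the exact command we ran; the directory path is a placeholder):

```shell
# DoM component: first 1 MiB of each file on the MDT,
# then a single OST stripe up to 1 GiB,
# then striping across all OSTs (-c -1) out to EOF.
lfs setstripe -E 1M  -L mdt \
              -E 1G  -c 1  \
              -E -1  -c -1 \
              /mnt/scratchc/some/dir
```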
My question, obviously, is: what is causing these errors? I'm not massively familiar with Lustre internals, so any pointers on where to look would be greatly appreciated!
Cheers
Jon
Jon Marshall
High Performance Computing Specialist
IT and Scientific Computing Team
Cancer Research UK Cambridge Institute
Li Ka Shing Centre | Robinson Way | Cambridge | CB2 0RE