[lustre-discuss] No space left on device MDT DoM but not full nor run out of inodes

Jon Marshall Jon.Marshall at cruk.cam.ac.uk
Tue Jun 20 08:32:36 PDT 2023


Sorry, typo in the version number below: the version we are actually running is 2.12.6, not 2.15.1.
________________________________
From: Jon Marshall
Sent: 20 June 2023 16:18
To: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
Subject: No space left on device MDT DoM but not full nor run out of inodes

Hi,

We've been running Lustre 2.15.1 in production for over a year and recently decided to enable PFL with DoM on our filesystem. Things were fine until last week, when users started reporting problems copying files, specifically "No space left on device". The MDT is running ldiskfs as the backend.

I've searched through the mailing list and found a couple of people reporting similar problems, which prompted me to check the inode allocation. It is currently:

UUID                      Inodes       IUsed       IFree IUse% Mounted on
scratchc-MDT0000_UUID   624492544    71144384   553348160  12% /mnt/scratchc[MDT:0]
scratchc-OST0000_UUID    57712579    24489934    33222645  43% /mnt/scratchc[OST:0]
scratchc-OST0001_UUID    57114064    24505876    32608188  43% /mnt/scratchc[OST:1]

filesystem_summary:    136975217    71144384    65830833  52% /mnt/scratchc

So, nowhere near full - the disk usage is a little higher:

UUID                       bytes        Used   Available Use% Mounted on
scratchc-MDT0000_UUID      882.1G      451.9G      355.8G  56% /mnt/scratchc[MDT:0]
scratchc-OST0000_UUID       53.6T       22.7T       31.0T  43% /mnt/scratchc[OST:0]
scratchc-OST0001_UUID       53.6T       23.0T       30.6T  43% /mnt/scratchc[OST:1]

filesystem_summary:       107.3T       45.7T       61.6T  43% /mnt/scratchc

But not full either! The errors are accompanied in the logs by:

LustreError: 15450:0:(tgt_grant.c:463:tgt_grant_space_left()) scratchc-MDT0000: cli ba0195c7-1ab4-4f7c-9e28-8689478f5c17/ffff9e331e231c00 left 82586337280 < tot_grant 82586681321 unstable 0 pending 0 dirty 1044480
LustreError: 15450:0:(tgt_grant.c:463:tgt_grant_space_left()) Skipped 33050 previous similar messages
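
For anyone comparing many of these messages, the numbers in the log line can be pulled out and compared directly. The following is only a minimal sketch: the function names `parse_grant_line` and `grant_shortfall` are hypothetical, and treating `tot_grant - left` as a "shortfall" reads the field names at face value rather than from the Lustre source.

```python
import re

# Log line copied verbatim from the message above.
LINE = (
    "LustreError: 15450:0:(tgt_grant.c:463:tgt_grant_space_left()) "
    "scratchc-MDT0000: cli ba0195c7-1ab4-4f7c-9e28-8689478f5c17/ffff9e331e231c00 "
    "left 82586337280 < tot_grant 82586681321 unstable 0 pending 0 dirty 1044480"
)

# Matches the numeric fields at the end of a tgt_grant_space_left() message.
GRANT_RE = re.compile(
    r"left (?P<left>\d+) < tot_grant (?P<tot_grant>\d+) "
    r"unstable (?P<unstable>\d+) pending (?P<pending>\d+) dirty (?P<dirty>\d+)"
)


def parse_grant_line(line):
    """Return the numeric fields of the log line as ints, or None if no match."""
    match = GRANT_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def grant_shortfall(fields):
    """Bytes by which tot_grant exceeds left (assumed interpretation)."""
    return fields["tot_grant"] - fields["left"]


if __name__ == "__main__":
    fields = parse_grant_line(LINE)
    print(fields)
    print("tot_grant - left (bytes):", grant_shortfall(fields))
```

On the line quoted above, the two values differ by only a few hundred kilobytes, while both are far below the 355.8G reported as available on the MDT.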

For reference the DoM striping we're using is:

  lcm_layout_gen:    0
  lcm_mirror_count:  1
  lcm_entry_count:   3
    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      stripe_count:  0       stripe_size:   1048576       pattern:       mdt       stripe_offset: -1

    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   1073741824
      stripe_count:  1       stripe_size:   1048576       pattern:       raid0       stripe_offset: -1

    lcme_id:             N/A
    lcme_mirror_id:      N/A
    lcme_flags:          0
    lcme_extent.e_start: 1073741824
    lcme_extent.e_end:   EOF
      stripe_count:  -1       stripe_size:   1048576       pattern:       raid0       stripe_offset: -1

So the first 1MB of each file is stored on the MDT.
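
To make the layout above concrete, here is a small sketch that maps a byte offset within a file to the component that would hold it. The extents are copied from the `lcme_extent` values above; the helper name `component_for_offset` is hypothetical, and the sketch assumes each `e_end` is exclusive.

```python
# Component extents copied from the layout above; end=None stands for EOF.
COMPONENTS = [
    {"start": 0, "end": 1048576, "pattern": "mdt", "stripe_count": 0},
    {"start": 1048576, "end": 1073741824, "pattern": "raid0", "stripe_count": 1},
    {"start": 1073741824, "end": None, "pattern": "raid0", "stripe_count": -1},
]


def component_for_offset(offset, components=COMPONENTS):
    """Return the component covering a byte offset (e_end assumed exclusive)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for comp in components:
        in_range = comp["end"] is None or offset < comp["end"]
        if offset >= comp["start"] and in_range:
            return comp
    return None


if __name__ == "__main__":
    for off in (0, 1048575, 1048576, 1073741824):
        comp = component_for_offset(off)
        print(off, comp["pattern"], comp["stripe_count"])
```

Under these assumptions, any file of 1MB or less lives entirely in the first (mdt) component, so its data consumes MDT space rather than OST space.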

My question, obviously, is: what is causing these errors? I'm not massively familiar with Lustre internals, so any pointers on where to look would be greatly appreciated!

Cheers
Jon


Jon Marshall
High Performance Computing Specialist
IT and Scientific Computing Team
Cancer Research UK Cambridge Institute
Li Ka Shing Centre | Robinson Way | Cambridge | CB2 0RE

