[lustre-discuss] Full OST after reintroduction

Jon Marshall Jon.Marshall at cruk.cam.ac.uk
Mon Jun 10 05:56:03 PDT 2024


Hi,

We had an issue a few months ago with the underlying zpool for one of our OSTs. I managed to get it mounted in read-only mode and migrated all of the files off it with lfs migrate, then recreated the OST and reintroduced it. This all went pretty smoothly. At the same time I updated our progressive file layout using the following command:

lfs find . -type d -print0 | xargs -0 lfs setstripe -E 256M -c 1 -E eof -c -1

I then ran lfs find to locate all the files bigger than 256M and migrated them to this new layout.

I have since noticed that the OST that was reintroduced has been filling up more rapidly than the others, to the point where it is now full:

UUID                       bytes        Used   Available Use% Mounted on
scratchc-MDT0000_UUID        1.4T      108.0G        1.3T   8% /mnt/scratchc[MDT:0]
scratchc-OST0000_UUID       55.2T       55.2T       42.0M 100% /mnt/scratchc[OST:0]
scratchc-OST0001_UUID       55.2T       22.5T       32.7T  41% /mnt/scratchc[OST:1]
scratchc-OST0002_UUID       46.0T       19.3T       26.7T  43% /mnt/scratchc[OST:2]
scratchc-OST0003_UUID       46.0T       19.4T       26.6T  43% /mnt/scratchc[OST:3]
scratchc-OST0004_UUID       46.0T       19.5T       26.5T  43% /mnt/scratchc[OST:4]
scratchc-OST0005_UUID       55.2T       22.8T       32.5T  42% /mnt/scratchc[OST:5]

filesystem_summary:       303.8T      158.8T      145.0T  53% /mnt/scratchc
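
To spot this kind of imbalance without reading the table by eye, a small filter over the lfs df output might look like the sketch below. The function name full_osts and the default threshold of 90 are illustrative assumptions, not Lustre settings, and the field positions are taken from the listing above.

```shell
# Print the OSTs from `lfs df` output (read on stdin) whose Use% is at or
# above a threshold. The name full_osts and the default of 90 are
# illustrative choices, not anything defined by Lustre.
full_osts() {
    awk -v limit="${1:-90}" '
        # OST lines carry "[OST:n]" in the mount column; field 5 is Use%.
        /\[OST:[0-9]+\]/ {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit + 0) print $1, pct "%"
        }'
}

# Typical use on a client (mount point as in the listing above):
#   lfs df -h /mnt/scratchc | full_osts 90
```

Fed the listing above, this should report only scratchc-OST0000_UUID, since the MDT line is skipped and every other OST is below the threshold.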

For reference, following the manual, I marked the OST as inactive while migrating the files off by using the command:

lctl set_param osp.scratchc-OST0000-osc-MDT0000.max_create_count=0

To reactivate it after rebuilding it, I copied the count from the other OSTs:

~]# lctl get_param osp.scratchc-*.max_create_count
osp.scratchc-OST0000-osc-MDT0000.max_create_count=20000
osp.scratchc-OST0001-osc-MDT0000.max_create_count=20000
osp.scratchc-OST0002-osc-MDT0000.max_create_count=20000
osp.scratchc-OST0003-osc-MDT0000.max_create_count=20000
osp.scratchc-OST0004-osc-MDT0000.max_create_count=20000
osp.scratchc-OST0005-osc-MDT0000.max_create_count=20000
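
The same comparison can be scripted so that an OST still left at 0, or set to some other value, stands out. This is only a sketch: the function name odd_create_counts is hypothetical, and the expected value of 20000 is simply the figure shown in the output above.

```shell
# Read `lctl get_param ...max_create_count` output on stdin and print every
# parameter whose value differs from the expected one.
# odd_create_counts is a hypothetical helper name; 20000 is the value
# reported for the OSTs in this message, not a universal default.
odd_create_counts() {
    awk -F= -v want="${1:-20000}" '
        /max_create_count=/ && $2 + 0 != want + 0 { print $1, $2 }'
}

# Typical use on the MDS:
#   lctl get_param osp.scratchc-*.max_create_count | odd_create_counts 20000
```

With the six values listed above it prints nothing, which matches the observation that the reintroduced OST has the same setting as the others.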

As far as I can tell I haven't told Lustre to preferentially use that one OST, so I'm a little stumped as to why this has happened. It is possible that someone has changed the default layout on some of their folders, but I'm struggling to think of a quick way of checking this.
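
One untested possibility for such a check would be to walk the directories and look for default layouts that name a specific starting OST. The sketch below assumes that lfs getstripe -d prints a stripe_offset field for each directory (or lmm_stripe_offset per component for composite layouts) and that -1 means "no preference"; the exact output format differs between Lustre versions, and pinned_dirs is a hypothetical helper name.

```shell
# List directories under a root whose default layout requests a specific
# starting OST, i.e. a stripe_offset other than -1.
# Assumptions: `lfs getstripe -d` prints "stripe_offset:" (or
# "lmm_stripe_offset:") followed by the index, and directory names do not
# contain newlines. pinned_dirs is a hypothetical name.
pinned_dirs() {
    lfs find "$1" -type d | while IFS= read -r dir; do
        # A digit straight after the field name means a non-negative index;
        # "-1" starts with "-" and therefore does not match.
        if lfs getstripe -d "$dir" | grep -Eq 'stripe_offset:[[:space:]]*[0-9]'; then
            printf '%s\n' "$dir"
        fi
    done
}

# Typical use on a client:
#   pinned_dirs /mnt/scratchc
```

An empty result would suggest that no directory default is steering new files at one OST, at least under the stated assumptions about the output format.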

Has anyone else run into similar problems? I'm hoping there is something incredibly obvious that I've missed somewhere!

Thanks in advance!


Jon Marshall
High Performance Computing Specialist
IT and Scientific Computing Team
Cancer Research UK Cambridge Institute
Li Ka Shing Centre | Robinson Way | Cambridge | CB2 0RE

