[lustre-discuss] unable to precreate -52/-116
woodystrash at hotmail.com
Mon Apr 12 15:50:07 PDT 2021
Just going to piggyback on this thread, as we are experiencing exactly the same thing. We're on ldiskfs, though, not ZFS. We were on 2.10.3 on all servers and, when this first occurred, it brought down our MDT with the following ASSERT:
Apr 10 01:22:09 hpcmds01.adqimr.ad.lan kernel: LustreError: 183448:0:(osp_precreate.c:634:osp_precreate_send()) qimrb-OST008f-osc-MDT0000: precreate fid [0x1008f0000:0xa65bd3f:0x0] < local used fid [0x1008f0000:0xa65bd3f:0x0]:
Apr 10 01:22:09 hpcmds01.adqimr.ad.lan kernel: LustreError: 57275:0:(osp_precreate.c:1311:osp_precreate_ready_condition()) qimrb-OST008f-osc-MDT0000: precreate failed opd_pre_status -116
Apr 10 01:22:09 hpcmds01.adqimr.ad.lan kernel: LustreError: 183448:0:(osp_precreate.c:1259:osp_precreate_thread()) qimrb-OST008f-osc-MDT0000: cannot precreate objects: rc = -116
Apr 10 01:22:09 hpcmds01.adqimr.ad.lan kernel: LustreError: 8475:0:(lod_qos.c:1624:lod_alloc_qos()) ASSERTION( nfound <= inuse->op_count ) failed: nfound:19, op_count:0
Apr 10 01:22:09 hpcmds01.adqimr.ad.lan kernel: LustreError: 8475:0:(lod_qos.c:1624:lod_alloc_qos()) LBUG
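For anyone comparing their own logs, here is a minimal sketch that pulls the two FIDs out of the osp_precreate_send() message above and compares them. It assumes the usual [seq:oid:ver] hex layout, and it assumes the OST index sits in bits 16-31 of the sequence, which is consistent with 0x1008f0000 and OST008f here but is not something confirmed in this thread. The helper names are made up for illustration.

```python
import re

# Matches a FID printed as [seq:oid:ver], each field in hex.
FID_RE = re.compile(r"\[(0x[0-9a-f]+):(0x[0-9a-f]+):(0x[0-9a-f]+)\]")


def parse_fids(line):
    """Return every FID found in a log line as a (seq, oid, ver) tuple of ints."""
    return [tuple(int(field, 16) for field in m.groups())
            for m in FID_RE.finditer(line)]


def ost_index_from_seq(seq):
    """Assumption: the sequence is 0x100000000 | (ost_index << 16)."""
    return (seq >> 16) & 0xFFFF


# Message text copied from the MDS log above (timestamp and host trimmed).
line = ("qimrb-OST008f-osc-MDT0000: precreate fid "
        "[0x1008f0000:0xa65bd3f:0x0] < local used fid "
        "[0x1008f0000:0xa65bd3f:0x0]:")

precreate, used = parse_fids(line)
print("OST index: %04x" % ost_index_from_seq(precreate[0]))
print("precreate oid: %#x, local used oid: %#x" % (precreate[1], used[1]))
print("same FID:", precreate == used)
```

In this log the two FIDs are identical, even though the message text says "<".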
Attempts to remount the MDT resulted in repeated crashes. Thinking it was https://jira.whamcloud.com/browse/LU-10297, we brought the MDS up to 2.10.4 and were immediately bitten by https://jira.whamcloud.com/browse/LU-11227, because we have deactivated OSTs, so we quickly upgraded the MDS/MGS to 2.10.6. We're still seeing the -52/-116 errors, and we have three OSTs that we similarly can't create objects on with an explicit "lfs setstripe -i". The OSSs are still on 2.10.3.
Not sure if I should reply to Marco's request for the node list, "lfs df" and "get_param" output here, or open a Jira ticket. I'm leaning towards the latter, but I'll spend some time in Jira today to make sure it's not a duplicate first.
We're currently up and running again but are looking to resolve the remaining unusable OSTs. And, like Amit, we're working towards a 2.12 upgrade in the near future but we just haven't got there yet.
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Marco Grossi <marco.grossi at ichec.ie>
Sent: Tuesday, 10 March 2020 9:22 PM
To: Kumar, Amit <ahkumar at mail.smu.edu>
Cc: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
Subject: Re: [lustre-discuss] unable to precreate -52/-116
Sounds definitely different from my case.
The only JIRA issue logging a "precreate fid < local used fid" is:
What puzzles me is the "rc = -52" in "ofd_create_hdl"; if I mapped it
correctly, it is an -EBADE error, i.e. "invalid exchange".
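As a quick reference for the two return codes in the subject line, here is a small lookup sketch. The numbers are the Linux errno values (52 is EBADE, 116 is ESTALE); the table is hard-coded because other platforms number them differently, and describe_rc is just an illustrative helper name.

```python
# Linux errno values; hard-coded because other platforms number them differently.
LINUX_ERRNO = {
    52: ("EBADE", "Invalid exchange"),
    116: ("ESTALE", "Stale file handle"),
}


def describe_rc(rc):
    """Translate a negative kernel return code such as -52 into a name and text."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return "rc = %d -> %s (%s)" % (rc, name, text)


for rc in (-52, -116):
    print(describe_rc(rc))
```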
Can you provide:
- HA node list and location of MGS, MDT and OST between nodes
As well as the output of:
- lfs df
- lfs df -i
- lctl get_param osp.*scratch0-OST0029*.prealloc*
- lctl get_param obdfilter.*scratch0-OST0029*.last_id
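If it helps anyone else hitting this on a different target, the list above can be generated for any filesystem name and OST index. This is only a sketch that prints the command strings and does not run anything; diagnostic_commands is a hypothetical helper. As far as I know the osp parameters live on the MDS and the obdfilter ones on the OSS, so the commands need to be run on the matching nodes.

```python
def diagnostic_commands(fsname, ost_index):
    """Build the commands requested above for one OST (index given as an int)."""
    target = "%s-OST%04x" % (fsname, ost_index)
    return [
        "lfs df",
        "lfs df -i",
        "lctl get_param osp.*%s*.prealloc*" % target,
        "lctl get_param obdfilter.*%s*.last_id" % target,
    ]


# Reproduces the list above for scratch0-OST0029.
for cmd in diagnostic_commands("scratch0", 0x29):
    print(cmd)
```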
On 3/9/20 5:23 PM, Kumar, Amit wrote:
> Hi Marco,
> Thank you for the response on this issue.
> We have an HA setup, I tried to fail over MDT to the secondary pair and then fail it back. This did not help.
> I also tried restart of the MDS servers, that did not help.
> I have rebooted OSS servers as well, that did not help
> I also tried completely stopping MDS and unmounting MDS for a little while and that did not help either.
> This error routinely comes back right after the MDT is mounted. Additionally, I am not able to manually create any files on that particular OST. Any other thoughts?
> Thank you,
> -----Original Message-----
> From: Marco Grossi <marco.grossi at ichec.ie>
> Sent: Monday, March 9, 2020 11:23 AM
> To: Kumar, Amit <ahkumar at mail.smu.edu>
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [lustre-discuss] unable to precreate -52/-116
> Hi Amit,
> We had a similar issue after a set_param of "max_create_count=0"
> In our case re-mounting the MDT (not the OST) fixed the issue.
> Hope it helps.
> On 3/3/20 8:25 PM, Kumar, Amit wrote:
>> Dear Lustre,
>> Recently we had a degraded (not failed) RAID and had to wait longer to
>> get a compatible disk, as we had received an incompatible one and it took
>> over a week to get the correct one in place.
>> During this wait I ended up disabling the OST first and then noticed
>> continuous IO to the OST and thought of disabling object creation on
>> it as well. Everything looked normal after that and once the disk was
>> replaced I reenabled object creation and enabled OST. Since then I
>> started seeing these messages on OST
>> .(ofd_dev.c:1784:ofd_create_hdl()) scratch0-OST0029: unable to
>> precreate: rc = -52
>> And following messages on MDS
>> scratch0-OST0029-osc-MDT0000: cannot precreate objects: rc = -116
>> scratch0-OST0029-osc-MDT0000: precreate fid
>> [0x100290000:0x101b39a:0x0] < local used fid
>> [0x100290000:0x101b39a:0x0]: rc = -116
>> These messages don't seem to stop. I am wondering what impact
>> these errors could have in the long run? I have noticed I am not able to
>> create files on this particular OST using lfs setstripe; when I try, it
>> gives me an object on another OST by default. I just want to make sure this
>> is not causing any data loss for files currently on it or for new requests.
>> We plan to upgrade to 2.12 in the summer downtime and are assuming that
>> it has a fix based on LU-9442 & LU-11186. Currently running servers on
>> 10.4.1 over ZFS-0.7.9-1
>> Any help is greatly appreciated.
>> Thank you,
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
> Marco Grossi
> ICHEC Systems Team
ICHEC Systems Team