[lustre-discuss] Issues with 2.10 upgrade and files missing LMAC_FID_ON_OST flag

Fri Aug 4 12:22:42 PDT 2017

Hello,

Last weekend we upgraded our Lustre filesystem from 2.7 to 2.10. After the upgrade, about 36M objects were missing. After a fair amount of troubleshooting, we ended up running e2fsck (which recovered the objects to lost+found) and ll_recover_lost_found_objs (which moved them back to their proper place in the ldiskfs filesystem). It's worth noting that lfsck could not recover the objects from lost+found, because of some kind of incompatibility between the objects' EA and lfsck (details follow).

A couple of remarks:
1. It looks like the functionality of ll_recover_lost_found_objs has been moved into the initial OI scrub, but that did not work for us. We noticed two things:
   a. In the lab, we recreated the issue (by manually moving objects to lost+found) and osd_initial_OI_scrub() recovered only the first 255 objects. We couldn't figure out why it stopped at 255, and restarting the OST did not recover any more than the initial 255.
   b. In prod, osd_initial_OI_scrub() ran but did not fix anything. The trace showed osd_ios_lf_fill() returning -EINVAL. After more troubleshooting, it turns out that the objects in lost+found have no compat flag (in particular no LMAC_FID_ON_OST) in the LMA extended attribute, and we eventually end up with osd_get_idif() returning -EINVAL (because __osd_xattr_get() returned 24). We believe all those files were created with Lustre 2.7.
   -> This is as far as we got troubleshooting those two issues. They sound like bugs; we are happy to give more details and/or file a bug report if that helps.
2. Our Lustre filesystem has 96 OSTs (id 0 to 97). All of the bad objects were located on 24 of them (id 48 to 71), with about 1.5M bad inodes out of 3M per OST. What's special about ids 48 to 71 is that those OSTs were reformatted about 6 months ago (with the same ids, but at creation we forgot to add --replace to mkfs or do a writeconf). At the time, we saw some "precreate FID 0x0:3164581 is over 100000 larger than the LAST_ID 0x0:0, only precreating the last 10000 objects." messages in the logs. This sounds like a potential root cause of our issue last week, but we really can't figure out how it would have caused half of the inodes to not get LMAC_FID_ON_OST and get lost in ldiskfs.
3. After fixing everything, when we run lfsck -t scrub, all the bad objects are checked and reported as failed in oi_scrub (example below). After some digging, this comes down to the same osd_get_idif() function returning -EINVAL. We can fix this by copying the files.
checked: 3383278
updated: 0
failed: 1469776
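
To put those counters in perspective, here is a small sketch that parses the three lines above and computes the failed fraction. The helper name parse_counters is ours, purely for illustration, and the parsing assumes the plain "name: value" layout shown above rather than any documented output format.

```python
# Counters copied verbatim from the oi_scrub example above.
SCRUB_OUTPUT = """\
checked: 3383278
updated: 0
failed: 1469776
"""


def parse_counters(text):
    """Parse 'name: value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[key.strip()] = int(value)
    return counters


counters = parse_counters(SCRUB_OUTPUT)
failed_fraction = counters["failed"] / counters["checked"]
print(f"failed: {counters['failed']} of {counters['checked']} "
      f"({failed_fraction:.1%})")
# prints: failed: 1469776 of 3383278 (43.4%)
```

That is roughly 43% of the checked objects on this OST, which is in line with the "about 1.5M bad inodes out of 3M" figure in item 2.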

Overall, we just wanted to report this on the mailing list in case someone else runs into this issue, and to see whether we should open bugs for 1.a and 1.b. We are also curious whether anybody has an explanation for how we got here, and whether 2 could explain it.
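
For anyone who wants to check their own objects for the condition described in 1.b, here is a minimal sketch that decodes a raw LMA value and tests for LMAC_FID_ON_OST. Several things in it are assumptions rather than facts from our troubleshooting: the flag value 0x8 and the little-endian layout (compat, incompat, then the self FID as seq/oid/ver) are recalled from the Lustre headers and should be verified against your own source tree, the EA name trusted.lma is assumed, and the helper names are hypothetical.

```python
import os
import struct

# ASSUMPTION: value recalled from the lma_compat enum in the Lustre headers.
LMAC_FID_ON_OST = 0x00000008

# ASSUMPTION: the LMA EA starts with two little-endian u32 flag words
# (compat, incompat) followed by the self FID (u64 seq, u32 oid, u32 ver).
LMA_HEADER = struct.Struct("<IIQII")  # 24 bytes


def parse_lma(raw):
    """Decode the first 24 bytes of a raw LMA extended attribute."""
    if len(raw) < LMA_HEADER.size:
        raise ValueError(f"LMA too short: {len(raw)} bytes")
    compat, incompat, seq, oid, ver = LMA_HEADER.unpack_from(raw)
    return {"compat": compat, "incompat": incompat,
            "seq": seq, "oid": oid, "ver": ver}


def has_fid_on_ost(raw):
    """Return True if the compat word carries LMAC_FID_ON_OST."""
    return bool(parse_lma(raw)["compat"] & LMAC_FID_ON_OST)


def read_lma(path):
    """Read the LMA EA of an object on an ldiskfs-mounted OST.

    Linux only, and needs root; the EA name is an assumption.
    """
    return os.getxattr(path, "trusted.lma")


# Synthetic examples (made-up FID values, not taken from our system).
with_flag = LMA_HEADER.pack(LMAC_FID_ON_OST, 0, 0x100000000, 42, 0)
without_flag = LMA_HEADER.pack(0, 0, 0x100000000, 42, 0)
print(has_fid_on_ost(with_flag), has_fid_on_ost(without_flag))
```

On an OST mounted as ldiskfs, something like has_fid_on_ost(read_lma(path)) over the object files would give a count of objects missing the flag, under the assumptions above.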

Regards,
Julien
