[Lustre-discuss] non-consecutive OST ordering
Wang Yibin
wang.yibin at oracle.com
Fri Nov 12 00:17:35 PST 2010
This is a bug in llapi_lov_get_uuids(), which assigns UUIDs to the wrong OST index when the OST indices are sparse (non-consecutive).
Please file a bug for this.
Until this bug is fixed, you can apply the following patch to lfsck.c in e2fsprogs (version 1.41.12.2.ora1) as a workaround. Note that the patch has not been verified.
--- e2fsprogs/e2fsck/lfsck.c 2010-11-12 11:43:42.000000000 +0800
+++ lfsck.c 2010-11-12 12:14:38.000000000 +0800
@@ -1226,6 +1226,12 @@
__u64 last_id;
int i, rc;
+ /* skip empty UUID OST */
+ if(!strlen(lfsck_uuid[ost_idx].uuid)) {
+ log_write("index %d UUID is empty(sparse OST index?). Skipping.\n", ost_idx);
+ return(0);
+ }
+
sprintf(dbname, "%s.%d", MDS_OSTDB, ost_idx);
VERBOSE(2, "testing ost_idx %d\n", ost_idx);
@@ -1279,11 +1284,20 @@
ost_hdr->ost_uuid.uuid);
if (obd_uuid_equals(&lfsck_uuid[ost_idx], &ost_hdr->ost_uuid)) {
+ /* must be sparse ost index */
if (ost_hdr->ost_index != ost_idx) {
log_write("Requested ost_idx %u doesn't match "
"index %u found in %s\n", ost_idx,
ost_hdr->ost_index, ost_files[i]);
- continue;
+
+ log_write("Moving the index/uuid to the right place...\n");
+ /* zero the original uuid entry */
+ memset(&lfsck_uuid[ost_idx], 0, sizeof(struct obd_uuid));
+ /* copy it to the right place */
+ ost_idx = ost_hdr->ost_index;
+ strcpy(lfsck_uuid[ost_hdr->ost_index].uuid,ost_hdr->ost_uuid.uuid);
+ /* skip this round */
+ goto out;
}
break;
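
To illustrate the sparse-index problem described above, here is a minimal, self-contained sketch. It is not the llapi or lfsck code: the names uuid_entry, ost_index_from_uuid() and place_by_index() are hypothetical, and it assumes the OST index is the 4-digit hex number in the target name (as in aegalfs-OST002a_UUID, which e2fsck reports as ost idx 42). It stores each UUID in the slot that matches its own index and leaves the unused slots empty, rather than packing the UUIDs into consecutive slots.

```c
#include <stdio.h>
#include <string.h>

#define MAX_OSTS 64   /* hypothetical table size for this sketch */
#define UUID_LEN 40   /* hypothetical; large enough for the names used here */

struct uuid_entry {
	char uuid[UUID_LEN];
};

/* Parse the hex index out of "<fsname>-OSTxxxx_UUID".
 * Returns the index, or -1 if the name does not match. */
static int ost_index_from_uuid(const char *uuid)
{
	const char *p = strstr(uuid, "-OST");
	unsigned int idx;

	if (p == NULL || sscanf(p + 4, "%4x", &idx) != 1)
		return -1;
	return (int)idx;
}

/* Store each UUID in the slot equal to its own OST index, so that
 * gaps in the index space stay as empty strings.
 * Returns the highest index seen plus one, or -1 on error. */
static int place_by_index(const char *const *names, int n,
			  struct uuid_entry *table, int table_len)
{
	int i, max = -1;

	memset(table, 0, sizeof(*table) * (size_t)table_len);
	for (i = 0; i < n; i++) {
		int idx = ost_index_from_uuid(names[i]);

		if (idx < 0 || idx >= table_len)
			return -1;
		snprintf(table[idx].uuid, UUID_LEN, "%s", names[i]);
		if (idx > max)
			max = idx;
	}
	return max + 1;
}
```

With a table laid out this way, a consumer that walks the indices in order can skip the slots whose UUID is empty, which is the same check the first hunk of the patch adds.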
On 2010-11-12, at 10:53 AM, Christopher Walker wrote:
> Thanks very much for your reply. I've tried remaking the mdsdb and all
> of the ostdb's, but I still get the same error -- it checks the first 34
> osts without a problem, but can't find the ostdb file for the 35th
> (which has ost_idx 42):
>
> ...
> lfsck: ost_idx 34: pass3 OK (676803 files total)
> lfsck: can't find file for ost_idx 35
> Files affected by missing ost info are : -
> lfsck: can't find file for ost_idx 36
> Files affected by missing ost info are : -
> lfsck: can't find file for ost_idx 37
> Files affected by missing ost info are : -
> lfsck: can't find file for ost_idx 38
> Files affected by missing ost info are : -
> lfsck: can't find file for ost_idx 39
> Files affected by missing ost info are : -
> lfsck: can't find file for ost_idx 40
> Files affected by missing ost info are : -
> lfsck: can't find file for ost_idx 41
> Files affected by missing ost info are : -
> lfsck: can't find file for ost_idx 42
> Files affected by missing ost info are : -
> ...
>
> e2fsck claims to be making the ostdb without a problem:
>
> Pass 6: Acquiring information for lfsck
> OST: 'aegalfs-OST002a_UUID' ost idx 42: compat 0x2 rocomp 0 incomp 0x2
> OST: num files = 676803
> OST: last_id = 858163
>
> and with the filesystem up I can see files on this OST:
>
> [cwalker@iliadaccess04 P-Gadget3.3.1]$ lfs getstripe predict.c
> OBDS:
> 0: aegalfs-OST0000_UUID ACTIVE
> ...
> 33: aegalfs-OST0021_UUID ACTIVE
> 42: aegalfs-OST002a_UUID ACTIVE
> predict.c
> obdidx objid objid group
> 42 10 0xa 0
>
>
> lfsck identifies several hundred GB of orphan data that we'd like to
> recover, so we'd really like to run lfsck on this array. We're willing
> to forgo the recovery on the 35th ost, but I want to make sure that
> running lfsck -l with the current configuration won't make things worse.
>
> Thanks again for your reply; any further advice is very much appreciated!
>
> Best,
> Chris
>
> On 11/10/10 12:10 AM, Wang Yibin wrote:
>> The error message indicates that the UUID of OST #35 does not match between the live filesystem and the ostdb file.
>> Is this ostdb obsolete?
>>
>> On 2010-11-9, at 11:45 PM, Christopher Walker wrote:
>>
>>>
>>> For reasons that I can't recall, our OSTs are not in consecutive order
>>> -- we have 35 OSTs, which are numbered consecutively from
>>> 0000-0021
>>> and then there's one last OST at
>>> 002a
>>>
>>> When I try to run lfsck on this array, it works fine for the first 34
>>> OSTs, but it can't seem to find the last OST db file:
>>>
>>> lfsck: ost_idx 34: pass3 OK (680045 files total)
>>> lfsck: can't find file for ost_idx 35
>>> Files affected by missing ost info are : -
>>> lfsck: can't find file for ost_idx 36
>>> Files affected by missing ost info are : -
>>> lfsck: can't find file for ost_idx 37
>>> Files affected by missing ost info are : -
>>> lfsck: can't find file for ost_idx 38
>>> Files affected by missing ost info are : -
>>> lfsck: can't find file for ost_idx 39
>>> Files affected by missing ost info are : -
>>> lfsck: can't find file for ost_idx 40
>>> Files affected by missing ost info are : -
>>> lfsck: can't find file for ost_idx 41
>>> Files affected by missing ost info are : -
>>> lfsck: can't find file for ost_idx 42
>>> Files affected by missing ost info are : -
>>> /n/scratch/hernquist_lab/tcox/tests/SbSbhs_e_8/P-Gadget3.3.1/IdlSubfind/.svn/text-base/ReadSubhaloFromReshuffledSnapshot.pro.svn-base
>>>
>>> and then lists all of the files that live on OST 002a. This db file
>>> definitely does exist -- it lives in the same directory as all of the
>>> other db files, and e2fsck for this OST ran without problems.
>>>
>>> Is there some way of forcing lfsck to recognize this OST db? Or,
>>> failing that, is it dangerous to run lfsck on the first 34 OSTs only?
>>>
>>> We're using e2fsck 1.41.6.sun1 (30-May-2009)
>>>
>>> Thanks very much!
>>>
>>> Chris
>>>
>>>
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>