[Lustre-discuss] Problem replacing an OST in 1.6.7

Andreas Dilger adilger at sun.com
Tue Mar 3 16:29:26 PST 2009


On Mar 03, 2009  17:15 -0600, Nirmal Seenu wrote:
> mkfs.lustre --fsname=lqcdproj --ost --mgsnode=iblustre1@tcp1 
> --mkfsoptions="-m 0" --index=0000 --reformat /dev/md2
> 
> I received these error messages when I tried to mount it for the first time:
> 
> Mar  3 16:19:53 lustre1 kernel: Lustre: OST lqcdproj-OST0000 now serving 
> dev (lqcdproj-OST0000/a968f0cc-a66b-bbf7-458f-9b8759c60ef5) with 
> recovery enabled

So, the new OST has started up after being reformatted.

> Mar  3 16:19:56 lustre1 kernel: Lustre: MDS lqcdproj-MDT0000: 
> lqcdproj-OST0000_UUID now active, resetting orphans

Here, the MDS (which doesn't know that the OST was reformatted)
is trying to recreate the objects that are missing from the OST.
This might be several million objects, because as far as the MDS
knows they were all created previously and should still exist.

> Mar  3 16:19:58 lustre1 kernel: LustreError: 
> 6359:0:(filter.c:3138:filter_precreate()) create failed rc = -28

Here, the OST has run out of inodes (rc = -28 is -ENOSPC), because
it was trying to create some millions of objects.
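
For anyone reading similar logs, here is a minimal sketch of decoding the
return code from the filter_precreate() line quoted above. The small errno
table is hand-written for illustration (standard Linux values), and the
final "lfs df -i" check assumes a client with the filesystem mounted at
/mnt/lustre, which is a hypothetical mount point and not something from
this thread.

```shell
# Log line copied from the report above.
line='Mar  3 16:19:58 lustre1 kernel: LustreError: 6359:0:(filter.c:3138:filter_precreate()) create failed rc = -28'

# Pull out the numeric return code that follows "rc = ".
rc=$(printf '%s\n' "$line" | sed -n 's/.*rc = \(-*[0-9][0-9]*\).*/\1/p')

# Map the absolute value to a symbolic errno name (small illustrative table).
case "${rc#-}" in
    2)  name=ENOENT ;;   # No such file or directory
    5)  name=EIO ;;      # I/O error
    28) name=ENOSPC ;;   # No space left on device (also returned when out of inodes)
    30) name=EROFS ;;    # Read-only file system
    *)  name=UNKNOWN ;;
esac
echo "rc=$rc ($name)"

# Assumption: on a client, per-OST inode usage can be checked with "lfs df -i".
# /mnt/lustre is a hypothetical mount point; the check is skipped if lfs is absent.
if command -v lfs >/dev/null 2>&1; then
    lfs df -i /mnt/lustre || true
fi
```

ENOSPC covers both block and inode exhaustion, so the inode column is the
one to look at in this case.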


This is probably a situation that Lustre could handle more gracefully,
by just refusing to recreate those missing objects if the count is too
high and accepting the MDS's word that those objects were previously
used.  It isn't ideal, since the number of times an OST is reformatted
like this is very small.

Can you please file a bug at bugzilla.lustre.org with the detailed
procedure you followed?

In the meantime I suggest you just format your new OST and add it
without specifying an OST index, and permanently mark OST0000 inactive
(steps to do so were recently discussed on the list).
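
A rough sketch of that workaround follows, written as a dry run that only
prints the commands. The mkfs.lustre line is the one from the original
report with --index dropped. The "lctl conf_param ... osc.active=0" step
(run on the MGS) is one commonly documented way to permanently deactivate
an OST; whether it matches the steps discussed on the list is an
assumption, so please check it against the manual for your release before
running anything.

```shell
# Values taken from the original report.
FSNAME=lqcdproj
MGSNODE=iblustre1@tcp1
OST_DEV=/dev/md2
OLD_OST="${FSNAME}-OST0000"

# Dry-run helper: prints the command instead of executing it.
# Replace the body with "$@" once the commands have been reviewed.
run() { echo "+ $*"; }

# 1. On the OSS: format the replacement OST without --index, so it is
#    assigned a new index instead of reusing 0000.
run mkfs.lustre --fsname="$FSNAME" --ost --mgsnode="$MGSNODE" \
    --mkfsoptions="-m 0" --reformat "$OST_DEV"

# 2. On the MGS: permanently mark the old OST inactive (assumed procedure).
run lctl conf_param "${OLD_OST}.osc.active=0"
```

With the old index left inactive, the MDS should no longer try to recreate
the missing objects on it.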

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



