[Lustre-discuss] recovering formatted OST

Tue Oct 26 12:00:50 PDT 2010

On 26 October 2010 19:55, Wojciech Turek <wjt27 at cam.ac.uk> wrote:

> In that case LAST_ID seems to be fine, as the OST shows 2490599 and the MDT
> shows 2490688, so the difference is 89. I don't understand why you said that
> the difference is over 100000.
>
>
>  [root at oss09 ~]# od -Ax -td8 /tmp/LAST_ID
>   000000              2490599
>   000008
>
>  [root at mds03 ~]# od -Ax -td8 /tmp/lov_objid
>   000000              2073842              2100049
>   000010              2115247              2038471
>   000020              2119821              2190996
>   000030              2029234              2354424
>   000040              2160856              2167105
>   000050              1970351              2059045
>   000060              2706486              2571655
>   000070              2662262              2628346
>   000080              2490688              2668926
>   000090              2631587              2643791
>   0000a0
>
> What I don't understand is why lctl reports last_id=1 for that OST
>
> lctl get_param osc.*.prealloc_last_id | grep OST0010
> osc.scratch2-OST0010-osc.prealloc_last_id=1
>

Unless this is because that OST is deactivated on the MDT?
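
For reference, the comparison in the quoted od output can be sketched in
Python. This is only an illustration under a few assumptions: both files are
packed arrays of 8-byte little-endian integers (which is how od -td8 displays
them on x86), the lov_objid slot is selected by the OST index (0x10 for
OST0010, i.e. offset 0x80), and the 20000 limit is the figure Bernd mentioned
in this thread. The function names are made up for this example.

```python
import struct

# Limit quoted by Bernd in this thread; not checked against the Lustre source.
MAX_GAP = 20000


def read_ids(raw):
    """Decode packed 8-byte integers, assuming little-endian layout (x86)."""
    count = len(raw) // 8
    return struct.unpack("<%dq" % count, raw[:count * 8])


def last_id_gap(lov_objid_raw, last_id_raw, ost_index):
    """Return the MDT's recorded last object id minus the OST's LAST_ID."""
    mdt_value = read_ids(lov_objid_raw)[ost_index]
    ost_value = read_ids(last_id_raw)[0]
    return mdt_value - ost_value


# Values copied from the od output quoted above, packed back into bytes so
# the example runs without the real /tmp/lov_objid and /tmp/LAST_ID files.
lov_values = [
    2073842, 2100049, 2115247, 2038471, 2119821, 2190996, 2029234, 2354424,
    2160856, 2167105, 1970351, 2059045, 2706486, 2571655, 2662262, 2628346,
    2490688, 2668926, 2631587, 2643791,
]
lov_objid_raw = struct.pack("<%dq" % len(lov_values), *lov_values)
last_id_raw = struct.pack("<q", 2490599)

gap = last_id_gap(lov_objid_raw, last_id_raw, 0x10)  # OST0010 -> slot 16
# The thread treats "OST value smaller than MDT value" as the good case.
print(gap, "within limit" if 0 <= gap <= MAX_GAP else "outside limit")
```

With the real files, the two byte strings would come from
open("/tmp/lov_objid", "rb").read() and open("/tmp/LAST_ID", "rb").read()
instead of the packed sample values.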

>
> On 26 October 2010 19:49, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
>
>> That is the value in the lov_objid.
>>
>> Cheers,
>> Bernd
>>
>> On Tuesday, October 26, 2010, Wojciech Turek wrote:
>> > I cannot find where the MDT stores that LAST_ID value for the OST.
>> >
>> > On 26 October 2010 19:10, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
>> > > I think the difference is quite huge (over 100000 files). But the MDS has
>> > > a sanity check and will refuse to activate this OST if the difference
>> > > is larger than 20000 files.
>> > >
>> > > So one way or the other you need to correct it (either increase the
>> > > LAST_ID value on the OST or on the MDS).
>> > >
>> > >
>> > > Cheers,
>> > > Bernd
>> > >
>> > > On Tuesday, October 26, 2010, Wojciech Turek wrote:
>> > > > Ok, I have created a filesystem on a loopback device. I mounted it as
>> > > > ldiskfs and copied the CONFIGS directory back to my old OST. Now
>> > > > tunefs.lustre returns correct info.
>> > > >
>> > > > last_id on the OST is smaller than the number in the MDT lov_objid,
>> > > > which is good.
>> > > >
>> > > > Can I ignore this output?
>> > > > lctl get_param osc.*.prealloc_last_id | grep OST0010
>> > > > osc.scratch2-OST0010-osc.prealloc_last_id=1
>> > > >
>> > > > I guess when I restart the whole filesystem after writeconf, the MDT
>> > > > should correct that?
>> > > >
>> > > > best regards,
>> > > >
>> > > > Wojciech
>> > > >
>> > > > On 26 October 2010 18:05, Bernd Schubert <bs_lists at aakef.fastmail.fm> wrote:
>> > > > > Hello Wojciech,
>> > > > >
>> > > > > tunefs.lustre has to complain as the files are missing. If you copy
>> > > > > over the files from the loopback device (yes, same index and label),
>> > > > > tunefs.lustre should work.
>> > > > >
>> > > > > Cheers,
>> > > > > Bernd
>> > > > >
>> > > > > On Tuesday, October 26, 2010, Wojciech Turek wrote:
>> > > > > > Hi Bernd,
>> > > > > >
>> > > > > > I am not quite clear how creating a new OST on a loopback device
>> > > > > > would help:
>> > > > > > Shall I create a new OST on a loopback device, formatting it with
>> > > > > > the old index and label, and then copy the recovered objects to
>> > > > > > that OST and mount it into the filesystem?
>> > > > > >
>> > > > > > I think I need to reformat the old OST before mounting it as a
>> > > > > > lustre type filesystem: although fsck recovered some objects (and
>> > > > > > I can access them by mounting the OST as ldiskfs), if you run
>> > > > > > tunefs.lustre on that OST device, tunefs.lustre complains that it
>> > > > > > doesn't find any lustre filesystem.
>> > > > > >
>> > > > > > As for the EAs, I have created a backup of the recovered objects
>> > > > > > preserving EAs.
>> > > > > >
>> > > > > > Best regards,
>> > > > > >
>> > > > > > Wojciech
>> > > > > >
>> > > > > > On 26 October 2010 16:35, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
>> > > > > > > Hello Wojciech,
>> > > > > > >
>> > > > > > > I think both would work, but why not just create a small OST
>> > > > > > > with mkfs.lustre on a loopback device? And then copy over those
>> > > > > > > files to your recovered filesystem.
>> > > > > > > Hmm, well, e2fsck might not have fixed all issues and then a
>> > > > > > > reformat indeed might be helpful.
>> > > > > > >
>> > > > > > > Also note: EAs on OST objects are nice to have, but not
>> > > > > > > absolutely required.
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Bernd
>> > > > > > >
>> > > > > > > On Tuesday, October 26, 2010, Wojciech Turek wrote:
>> > > > > > > > Bernd, I would like to clarify whether I understood your
>> > > > > > > > suggestion correctly:
>> > > > > > > >
>> > > > > > > > 1) create a new OST but using old index and old label
>> > > > > > > > 2) mount it as ldiskfs and copy recovered objects (using tar or
>> > > > > > > >    rsync with xattrs support) from the old OST to the new OST
>> > > > > > > > 3) run --writeconf on MDT and OST of that filesystem
>> > > > > > > > 4) mount MDT and all OSTs
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > I guess I could also do it this way:
>> > > > > > > >
>> > > > > > > > 1) back up restored objects using tar or rsync with xattrs
>> > > > > > > >    support
>> > > > > > > > 2) format old OST with old index and old label
>> > > > > > > > 3) restore objects from the backup
>> > > > > > > >
>> > > > > > > > Do you think that would work?
>> > > > > > > >
>> > > > > > > > Best regards,
>> > > > > > > >
>> > > > > > > > Wojciech
>> > > > > > > >
>> > > > > > > > On 22 October 2010 18:52, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
>> > > > > > > > > Hmm, I would probably format a small fake device on a ramdisk
>> > > > > > > > > and copy files over, run tunefs --writeconf /mdt and then
>> > > > > > > > > start everything (including all OSTs) again.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Cheers,
>> > > > > > > > >
>> > > > > > > > > On Friday, October 22, 2010, Wojciech Turek wrote:
>> > > > > > > > > > I have tried Bernd's suggestion and it seems to have worked:
>> > > > > > > > > > after running e2fsck -D, ll_recover_lost_found_objs didn't
>> > > > > > > > > > cause a kernel panic but moved a number of objects to the O
>> > > > > > > > > > directory. The problem is that I do not have the last_rcvd
>> > > > > > > > > > file, so the OST has no index at the moment. What would be
>> > > > > > > > > > the next step to enable access to those files in the
>> > > > > > > > > > filesystem?
>> > > > > > > > > >
>> > > > > > > > > > Best regards,
>> > > > > > > > > >
>> > > > > > > > > > Wojciech
>> > > > > > > > > >
>> > > > > > > > > > On 22 October 2010 17:15, Andreas Dilger <andreas.dilger at oracle.com> wrote:
>> > > > > > > > > > > On 2010-10-22, at 5:42, Bernd Schubert <bernd.schubert at fastmail.fm> wrote:
>> > > > > > > > > > > > Hmm, e2fsck didn't catch that? rec_len is the length of
>> > > > > > > > > > > > a directory entry, so after how many bytes the next
>> > > > > > > > > > > > entry follows.
>> > > > > > > > > > >
>> > > > > > > > > > > I agree that e2fsck should have caught that.
>> > > > > > > > > > >
>> > > > > > > > > > > > You can try to force e2fsck to do
>> > > > > > > > > > > > something about that: e2fsck -D
>> > > > > > > > > > >
>> > > > > > > > > > > No, I would recommend against using -D at this point.
>> > > > > > > > > > > That will cause it to re-write the directory contents, and
>> > > > > > > > > > > given that the filesystem was previously corrupted I would
>> > > > > > > > > > > prefer making as few changes as possible before the data
>> > > > > > > > > > > is estranged.
>> > > > > > > > > > >
>> > > > > > > > > > > Wojciech,
>> > > > > > > > > > > note that if you are able to mount the filesystem you
>> > > > > > > > > > > could just copy all of the objects (with xattrs!) from
>> > > > > > > > > > > lost+found on the bad filesystem, along with the last_rcvd
>> > > > > > > > > > > file (if you can find it), into a new ldiskfs filesystem
>> > > > > > > > > > > and then run ll_recover_lost_found_objs on that.
>> > > > > > > > > > >
>> > > > > > > > > > > > On Friday, October 22, 2010, Wojciech Turek wrote:
>> > > > > > > > > > > >> Ok, removing and recreating the journal fixed that
>> > > > > > > > > > > >> problem and I am able to mount the device as an ldiskfs
>> > > > > > > > > > > >> filesystem. Now I hit another wall when trying to run
>> > > > > > > > > > > >> ll_recover_lost_found_objs.
>> > > > > > > > > > > >> When I first run
>> > > > > > > > > > > >> ll_recover_lost_found_objs -d /mnt/ost/lost+found
>> > > > > > > > > > > >> it only creates the O dir and exits. When I repeat this
>> > > > > > > > > > > >> command the kernel panics. Any idea what could be the
>> > > > > > > > > > > >> problem here?
>> > > > > > > > > > > >>
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> LDISKFS-fs error (device dm-4): ldiskfs_readdir:
>> bad
>> > >
>> > > entry
>> > >
>> > > > > in
>> > > > >
>> > > > > > > > > > > >> directory #6831: rec_len is smaller than minimal -
>> > > > > > > > > > > >> offset=0,
>> > > > > > > > >
>> > > > > > > > > inode=0,
>> > > > > > > > >
>> > > > > > > > > > > >> rec_len=0, name_len=0
>> > > > > > > > > > > >> Aborting journal on device dm-4.
>> > > > > > > > > > > >> Unable to handle kernel NULL pointer dereference at
>> > > > > > > > > > > >> 0000000000000000
>> > > > > > > > > > >
>> > > > > > > > > > > RIP:
>> > > > > > > > > > > >> [<ffffffff88033448>]
>> > > > > > > :
>> > > > > > > :jbd:journal_commit_transaction+0xc5b/0x12db
>> > > > > > > :
>> > > > > > > > > > > >> PGD 1a118d067 PUD 1ce7e7067 PMD 0
>> > > > > > > > > > > >> Oops: 0002 [1] SMP
>> > > > > > > > > > > >> last sysfs file: /class/infiniband_mad/umad0/port
>> > > > > > > > > > > >> CPU 3
>> > > > > > > > > > > >> Modules linked in: ldiskfs(U) crc16(U) autofs4(U)
>> > >
>> > > hidp(U)
>> > >
>> > > > > > > l2cap(U)
>> > > > > > >
>> > > > > > > > > > > >> bluetooth(U) rdma_ucm(U) rdma_cm(U) iw_cm(U)
>> > > > > > > > > > > >> ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U)
>> > > > > > > > > > > >> ipv6(U)
>> > >
>> > > xfrm_nalgo(U)
>> > >
>> > > > > > > > > > > >> crypto_api(U)
>> > > > > > > > > > >
>> > > > > > > > > > > ib_uverbs(U)
>> > > > > > > > > > >
>> > > > > > > > > > > >> ib_umad(U) mlx4_vnic(U) mlx4_vnic_helper(U)
>> ib_sa(U)
>> > > > > > > > > > > >> ib_mthca(U)
>> > > > > > > > > > >
>> > > > > > > > > > > mptctl(U)
>> > > > > > > > > > >
>> > > > > > > > > > > >> dm_mirror(U) video(U) backlight(U) sbs(U)
>> > > > > > > > > > > >> power_meter(U)
>> > > > > > >
>> > > > > > > hwmon(U)
>> > > > > > >
>> > > > > > > > > > > i2c_ec(U)
>> > > > > > > > > > >
>> > > > > > > > > > > >> i2c_core(U) dell_wmi(U) wmi(U) button(U) battery(U)
>> > > > > > > > > > > >> asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U)
>> > >
>> > > lp(U)
>> > >
>> > > > > > > > > > > >> parport(U)
>> > > > > > >
>> > > > > > > sr_mod(U)
>> > > > > > >
>> > > > > > > > > > > cdrom(U)
>> > > > > > > > > > >
>> > > > > > > > > > > >> mlx4_ib(U) ib_mad(U) ib_core(U) joydev(U)
>> mlx4_core(U)
>> > > > > > > > >
>> > > > > > > > > usb_storage(U)
>> > > > > > > > >
>> > > > > > > > > > > >> pcspkr(U) shpchp(U) serio_raw(U) i5000_edac(U)
>> > >
>> > > edac_mc(U)
>> > >
>> > > > > > > > > dm_raid45(U)
>> > > > > > > > >
>> > > > > > > > > > > >> dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U)
>> > > > > > > > > > > >> dm_mem_cache(U)
>> > > > > > > > > > >
>> > > > > > > > > > > nfs(U)
>> > > > > > > > > > >
>> > > > > > > > > > > >> lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U)
>> > > > >
>> > > > > mptscsih(U)
>> > > > >
>> > > > > > > > > > > mptbase(U)
>> > > > > > > > > > >
>> > > > > > > > > > > >> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U)
>> > > > > > > > > > > >> mppUpper(U)
>> > > > > > >
>> > > > > > > sg(U)
>> > > > > > >
>> > > > > > > > > > > >> sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U)
>> > > > > > > > > > > >> uhci_hcd(U)
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> ohci_hcd(U) ehci_hcd(U) Pid: 11360, comm: kjournald
>> > >
>> > > Tainted:
>> > > > > G
>> > > > >
>> > > > > > > > > > > >> 2.6.18-194.3.1.el5_lustre.1.8.4 #1
>> > > > > > > > > > > >> RIP: 0010:[<ffffffff88033448>]
>>  [<ffffffff88033448>]
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> :jbd:journal_commit_transaction+0xc5b/0x12db
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> RSP: 0018:ffff8101c6481d90  EFLAGS: 00010246
>> > > > > > >
>> > > > > > > > > > > >> RAX: 0000000000000000 RBX: 0000000000000000 RCX:
>> > > > > > > 00000000ffffffff
>> > > > > > >
>> > > > > > > > > > > >> RDX: 0000000000000000 RSI: ffff8101e9dab0c0 RDI:
>> > > > > > > ffff81022fa46000
>> > > > > > >
>> > > > > > > > > > > >> RBP: ffff81022fa46000 R08: ffff81022fa46068 R09:
>> > > > > > > 0000000000000000
>> > > > > > >
>> > > > > > > > > > > >> R10: ffff810105925b20 R11: 00000000fffffffa R12:
>> > > > > > > 0000000000000000
>> > > > > > >
>> > > > > > > > > > > >> R13: 0000000000000000 R14: ffff8101e9dab0c0 R15:
>> > > > > > > 0000000000000000
>> > > > > > >
>> > > > > > > > > > > >> FS:  0000000000000000(0000)
>> GS:ffff810107b9a4c0(0000)
>> > > > > > > > > > > >> knlGS:0000000000000000 CS:  0010 DS: 0018 ES: 0018
>> > > > > > > > > > > >> CR0: 000000008005003b CR2: 0000000000000000 CR3:
>> > > > > > > > > > > >> 00000001eaffb000
>> > > > > > >
>> > > > > > > CR4:
>> > > > > > > > > > > >> 00000000000006e0 Process kjournald (pid: 11360,
>> > >
>> > > threadinfo
>> > >
>> > > > > > > > > > > >> ffff8101c6480000, task ffff81021c14c0c0)
>> > > > > > > > > > > >> Stack:  ffff8101a61b9000 000000002b8263c0
>> > >
>> > > ffffffff00000000
>> > >
>> > > > > > > > > > > 0000000000000000
>> > > > > > > > > > >
>> > > > > > > > > > > >> 0000113b00000001 0000000000000013 0000000000000000
>> > > > > > > > > > > >> 0000000000000111 0000000000000000 0000000000000000
>> > > > > > > > > > > >> 0000000001282dd7 00000000000020dd Call Trace:
>> > > > > > > > > > > >> [<ffffffff8003da91>] lock_timer_base+0x1b/0x3c
>> > > > > > > > > > > >> [<ffffffff8004b347>]
>> try_to_del_timer_sync+0x7f/0x88
>> > > > > > > > > > > >> [<ffffffff88037386>] :jbd:kjournald+0xc1/0x213
>> > > > > > > > > > > >> [<ffffffff800a0ab2>]
>> autoremove_wake_function+0x0/0x2e
>> > > > > > > > > > > >> [<ffffffff800a089a>]
>> keventd_create_kthread+0x0/0xc4
>> > > > > > > > > > > >> [<ffffffff880372c5>] :jbd:kjournald+0x0/0x213
>> > > > > > > > > > > >> [<ffffffff800a089a>]
>> keventd_create_kthread+0x0/0xc4
>> > > > > > > > > > > >> [<ffffffff80032890>] kthread+0xfe/0x132
>> > > > > > > > > > > >> [<ffffffff8005dfb1>] child_rip+0xa/0x11
>> > > > > > > > > > > >> [<ffffffff800a089a>]
>> keventd_create_kthread+0x0/0xc4
>> > > > > > > > > > > >> [<ffffffff8014bcf4>] deadline_queue_empty+0x0/0x23
>> > > > > > > > > > > >> [<ffffffff80032792>] kthread+0x0/0x132
>> > > > > > > > > > > >> [<ffffffff8005dfa7>] child_rip+0x0/0x11
>> > > > > > > > > > > >>
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> Code: f0 0f ba 33 01 e8 42 fc 02 f8 8b 03 a8 04 75
>> 07
>> > > > > > > > > > > >> 8b 43
>> > > > >
>> > > > > 58
>> > > > >
>> > > > > > > 85
>> > > > > > >
>> > > > > > > > > > > >> RIP  [<ffffffff88033448>]
>> > > > > > > > > :
>> > > > > > > > > :jbd:journal_commit_transaction+0xc5b/0x12db
>> > > > > > > > > :
>> > > > > > > > > > > >> RSP <ffff8101c6481d90>
>> > > > > > > > > > > >> CR2: 0000000000000000
>> > > > > > > > > > > >> <0>Kernel panic - not syncing: Fatal exception
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> On 22 October 2010 03:09, Andreas Dilger
>> > > > > > > > > > > >> <andreas.dilger at oracle.com
>> > > > > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > >>> On 2010-10-21, at 18:44, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
>> > > > > > > > > > > >>> fsck has finished and does not find any more errors
>> > > > > > > > > > > >>> to correct. However when I try to mount the device
>> > > > > > > > > > > >>> as ldiskfs the kernel panics with the following message:
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> Assertion failure in cleanup_journal_tail() at
>> > > > > > > > > > > >>> fs/jbd/checkpoint.c:459: "blocknr != 0"
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> Hmm, not sure, maybe your journal is broken?  You can
>> > > > > > > > > > > >>> delete it with "tune2fs -O ^has_journal" (maybe after
>> > > > > > > > > > > >>> running e2fsck again to clear the journal), then
>> > > > > > > > > > > >>> re-create it with "tune2fs -j".
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> ----------- [cut here ] --------- [please bite
>> here ]
>> > > > > > > > > > > >>> --------- Kernel BUG at fs/jbd/checkpoint.c:459
>> > > > > > > > > > > >>> invalid opcode: 0000 [1] SMP
>> > > > > > > > > > > >>> last sysfs file: /class/infiniband_mad/umad0/
>> > > > > > > > > > > >>> port
>> > > > > > > > > > > >>> CPU 2
>> > > > > > > > > > > >>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U)
>> > >
>> > > ost(U)
>> > >
>> > > > > > > > > > > >>> mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U)
>> > > > >
>> > > > > lquota(U)
>> > > > >
>> > > > > > > > > > > >>> osc(U)
>> > > > > > > > > > >
>> > > > > > > > > > > ksocklnd(U)
>> > > > > > > > > > >
>> > > > > > > > > > > >>> ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U)
>> > > > > > > > > > > >>> libcfs(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U)
>> > > > > > > > > > > >>> rdma_ucm(U) rdma_cm(U) iw_cm(U)
>> > > > > > > > > > >
>> > > > > > > > > > > ib_addr(U)
>> > > > > > > > > > >
>> > > > > > > > > > > >>> ib_ipoib(U) ipoib_helper(U) ib_cm(U) ipv6(U)
>> > > > > > > > > > > >>> xfrm_nalgo(U)
>> > > > > > > > > > >
>> > > > > > > > > > > crypto_api(U)
>> > > > > > > > > > >
>> > > > > > > > > > > >>> ib_uverbs(U) ib_umad(U) mlx4_vnic(U)
>> > >
>> > > mlx4_vnic_helper(U)
>> > >
>> > > > > > > ib_sa(U)
>> > > > > > >
>> > > > > > > > > > > >>> ib_mthca(U) mptctl(U) dm_mirror(U) video(U)
>> > >
>> > > backlight(U)
>> > >
>> > > > > > > > > > > >>> sbs(U) power_meter(U) hwmon(U) i2c_ec(U)
>> i2c_core(U)
>> > > > > > > > > > > >>> dell_wmi(U)
>> > > > > > >
>> > > > > > > wmi(U)
>> > > > > > >
>> > > > > > > > > > > >>> button(U) battery(U) asus_acpi(U)
>> acpi_memhotplug(U)
>> > > > > > > > > > > >>> ac(U)
>> > > > > > > > > > >
>> > > > > > > > > > > parport_pc(U)
>> > > > > > > > > > >
>> > > > > > > > > > > >>> lp(U) parport(U) sr_mod(U) cdrom(U) mlx4_ib(U)
>> > >
>> > > ib_mad(U)
>> > >
>> > > > > > > > > > > >>> ib_core(U) joydev(U) mlx4_core(U) usb_storage(U)
>> > > > > > > > > > > >>> shpchp(U) i5000_edac(U)
>> > > > > > > > > > >
>> > > > > > > > > > > edac_mc(U)
>> > > > > > > > > > >
>> > > > > > > > > > > >>> serio_raw(U) pcspkr(U) dm_raid45(U) dm_message(U)
>> > > > > > > > > > > >>> dm_region_hash(U) dm_log(U) dm_mod(U)
>> dm_mem_cache(U)
>> > > > >
>> > > > > nfs(U)
>> > > > >
>> > > > > > > > > > > >>> lockd(U) fscache(U) nfs_acl(U) sunrpc(U) mptsas(U)
>> > > > > > > > > > > >>> mptscsih(U) mptbase(U)
>> > > > > > > > > > > >>> scsi_transport_sas(U) mppVhba(U) megaraid_sas(U)
>> > > > >
>> > > > > mppUpper(U)
>> > > > >
>> > > > > > > > > > > >>> sg(U) sd_mod(U) scsi_mod(U) bnx2(U) ext3(U) jbd(U)
>> > > > > > > > > > > >>> uhci_hcd(U)
>> > > > > > > > >
>> > > > > > > > > ohci_hcd(U)
>> > > > > > > > >
>> > > > > > > > > > > >>> ehci_hcd(U) Pid: 13891, comm: mount Tainted: G
>> > > > > > >
>> > > > > > > > > > > >>> 2.6.18-194.3.1.el5_lustre.1.8.4 #1 RIP:
>> > > > > > > 0010:[<ffffffff88034a95>]
>> > > > > > >
>> > > > > > > > > > > >>> [<ffffffff88034a95>]
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> :jbd:cleanup_journal_tail+0x9d/0x118
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> RSP: 0018:ffff81016f00da68  EFLAGS: 00010286
>> > > > > > >
>> > > > > > > > > > > >>> RAX: 000000000000005a RBX: ffff81012ca12c00 RCX:
>> > > > > > > ffffffff80311da8
>> > > > > > >
>> > > > > > > > > > > >>> RDX: ffffffff80311da8 RSI: 0000000000000000 RDI:
>> > > > > > > ffffffff80311da0
>> > > > > > >
>> > > > > > > > > > > >>> RBP: 0000000000000000 R08: ffffffff80311da8 R09:
>> > > > > > > 0000000000000001
>> > > > > > >
>> > > > > > > > > > > >>> R10: 0000000000000000 R11: 0000000000000080 R12:
>> > > > > > > 0000000000000002
>> > > > > > >
>> > > > > > > > > > > >>> R13: ffff81012ca12d4c R14: ffff81012ca12c24 R15:
>> > > > > > > ffff81017a8d7400
>> > > > > > >
>> > > > > > > > > > > >>> FS:  00002abd7cef1f70(0000)
>> GS:ffff810107b9acc0(0000)
>> > > > > > > > > > > >>> knlGS:0000000000000000
>> > > > > > > > > > > >>> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> > > > > > >
>> > > > > > > > > > > >>> CR2: 000000000042b000 CR3: 000000012813f000 CR4:
>> > > > > > > 00000000000006e0
>> > > > > > >
>> > > > > > > > > > > >>> Process mount (pid: 13891, threadinfo
>> > > > > > > > > > > >>> ffff81016f00c000,
>> > > > >
>> > > > > task
>> > > > >
>> > > > > > > > > > > >>> ffff81022e1b7820)
>> > > > > > > > > > > >>> Stack:  0000000000000000 ffff81012ca12c00
>> > > > > > > > > > > >>> ffff81017a8d7400 ffffffff88037690
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> ffff81012ca12c00 ffff8102034ff000 ffff81017a8d7400
>> > > > > > > > > > > >>> 0000000000000000 ffff8102034ff000 ffffffff88a9be56
>> > > > > > > > > > > >>> 0000000001000000 ffff8101bf788000
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> Call Trace:
>> > > > > > > > > > > >>> [<ffffffff88037690>] :jbd:journal_flush+0xbe/0x248
>> > > > > > > > > > > >>> [<ffffffff88a9be56>]
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> :ldiskfs:ldiskfs_mark_recovery_complete+0x36/0x90
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> [<ffffffff88aa02e0>]
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> :ldiskfs:ldiskfs_fill_super+0x1790/0x1950
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> [<ffffffff800eccd2>] get_filesystem+0x12/0x3b
>> > > > > > > > > > > >>> [<ffffffff800e343e>] test_bdev_super+0x0/0xd
>> > > > > > > > > > > >>> [<ffffffff88a9eb50>]
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> :ldiskfs:ldiskfs_fill_super+0x0/0x1950
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> [<ffffffff800e43fd>] get_sb_bdev+0x10a/0x16c
>> > > > > > > > > > > >>> [<ffffffff800e3d9a>] vfs_kern_mount+0x93/0x11a
>> > > > > > > > > > > >>> [<ffffffff800e3e63>] do_kern_mount+0x36/0x4d
>> > > > > > > > > > > >>> [<ffffffff800ee601>] do_mount+0x6a9/0x719
>> > > > > > > > > > > >>> [<ffffffff800090d2>] __handle_mm_fault+0x96f/0xfaa
>> > > > > > > > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
>> > > > > > > > > > > >>> [<ffffffff8000a72a>] __link_path_walk+0xf1e/0xf42
>> > > > > > > > > > > >>> [<ffffffff800220ce>] __up_read+0x19/0x7f
>> > > > > > > > > > > >>> [<ffffffff80066b88>] do_page_fault+0x4fe/0x874
>> > > > > > > > > > > >>> [<ffffffff8002c9e0>] mntput_no_expire+0x19/0x89
>> > > > > > > > > > > >>> [<ffffffff8000ea45>] link_path_walk+0xa6/0xb2
>> > > > > > > > > > > >>> [<ffffffff800cc329>] zone_statistics+0x3e/0x6d
>> > > > > > > > > > > >>> [<ffffffff8000f2cf>] __alloc_pages+0x78/0x308
>> > > > > > > > > > > >>> [<ffffffff8004c68e>] sys_mount+0x8a/0xcd
>> > > > > > > > > > > >>> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> Code: 0f 0b 68 3a 94 03 88 c2 cb 01 44 39 a3 58 01
>> 00
>> > >
>> > > 00
>> > >
>> > > > > > > > > > > >>> 75 0e
>> > > > > > >
>> > > > > > > c7
>> > > > > > >
>> > > > > > > > > > > >>> RIP  [<ffffffff88034a95>]
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> :jbd:cleanup_journal_tail+0x9d/0x118
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> RSP <ffff81016f00da68>
>> > > > > > > > > > > >>> <0>Kernel panic - not syncing: Fatal exception
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> Any idea how to fix this?
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> Many thanks
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> Wojciech
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>>
>> > > > > > > > > > > >>> On 21 October 2010 17:54, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
>> > > > > > > > > > > >>>> Thanks Ken, that worked.
>> > > > > > > > > > > >>>>
>> > > > > > > > > > > >>>>
>> > > > > > > > > > > >>>> On 21 October 2010 17:39, Ken Hornstein <kenh at cmf.nrl.navy.mil> wrote:
>> > > > > > > > > > > >>>>>> Now I have another problem. After the last segfault I
>> > > > > > > > > > > >>>>>> cannot restart the fsck due to MMP.
>> > > > > > > > > > > >>>>>> [...]
>> > > > > > > > > > > >>>>>> Also when I try to access the filesystem via debugfs
>> > > > > > > > > > > >>>>>> it fails:
>> > > > > > > > > > > >>>>>> debugfs -c -R 'ls' /dev/scratch2_ost16vg/ost16lv
>> > > > > > > > > > > >>>>>> debugfs 1.41.10.sun2 (24-Feb-2010)
>> > > > > > > > > > > >>>>>> /dev/scratch2_ost16vg/ost16lv: MMP: fsck being run while opening filesystem
>> > > > > > > > > > > >>>>>> ls: Filesystem not open
>> > > > > > > > > > > >>>>>>
>> > > > > > > > > > > >>>>>> Is there a way to clear the MMP flag so it allows fsck
>> > > > > > > > > > > >>>>>> to run?
>> > > > > > > > > > > >>>>>
>> > > > > > > > > > > >>>>> You want tune2fs -f -E clear-mmp
>> > > > > > > > > > > >>>>>
>> > > > > > > > > > > >>>>> --Ken
>> > > > >
>> > > > > --
>> > > > > Bernd Schubert
>> > > > > DataDirect Networks
>>
>