[Lustre-discuss] Lustre Mount Crashing
Andreas Dilger
adilger at sun.com
Mon Jun 2 12:36:46 PDT 2008
On Jun 02, 2008 12:58 -0400, Charles Taylor wrote:
> No, it is going down hard in a kernel panic. All of the stack trace I
> can see at the moment looks like this (scribbled by hand, so forgive me
> for leaving off the addresses and offsets):
>
>
> :libcfs:cfs_alloc
> :obdclass:lustre_init_lsi
> :obdclass:lustre_fill_super
> :obdclass:lustre_fill_super
> set_anon_super
> set_anon_super
> :obdclass:lustre_fill_super
> get_sb_nodev
> vfs_kern_mount
> do_kern_mount
> do_mount
> __handle_mm_fault
> __up_read
> do_page_fault
> zone_statistics
> __alloc_pages
> sys_mount
> system_call
>
> RIP < ..... > resched_task
Hmm, this doesn't seem very useful. The callpath shown:
lustre_fill_super->lustre_init_lsi->cfs_alloc()
is _really_ early in the mount. Either memory was already corrupted
before this point (causing cfs_alloc() to crash), or part of the stack
at the top is missing.
> I wish I could get the whole trace to you. We might try to get kdump on
> there but my luck with kdump has been mixed. It seems to work with some
> chipsets and not with others.
> Anyway, we may just be out of luck. I just hate to give up too easily
> because it seems like everything is solid yet we crash on or just after the
> mount. This is on an MDS that has been running without a problem for 5
> months (Lustre 1.6.4.2).
>
> uname -a
> Linux hpcmds 2.6.18-8.1.14.el5.L-1642 #2 SMP Thu Feb 21 15:42:14 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
If mounting with "-o abort_recovery" doesn't solve the problem,
are you able to mount the MDT filesystem as "-t ldiskfs" instead of
"-t lustre"? Try that, then copy and truncate the last_rcvd file:
mount -t ldiskfs /dev/MDSDEV /mnt/mds
cp /mnt/mds/last_rcvd /mnt/mds/last_rcvd.sav
cp /mnt/mds/last_rcvd /tmp/last_rcvd.sav
dd if=/mnt/mds/last_rcvd.sav of=/mnt/mds/last_rcvd bs=8k count=1
umount /mnt/mds
mount -t lustre /dev/MDSDEV /mnt/mds
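The copy-and-truncate step can be wrapped in a small shell function so the backups are always taken before last_rcvd is rewritten. This is a minimal sketch, not part of the advice above: the function name `truncate_last_rcvd`, its backup-directory parameter, and the refuse-if-a-backup-already-exists guard are hypothetical additions. It assumes the MDT is already mounted as ldiskfs (as in the first command) and that nothing else is using it. The reading that the first 8 KiB holds the server header, with per-client recovery records after it, is an assumption about the 1.6 on-disk layout.

```shell
#!/bin/sh
# Hypothetical helper wrapping the cp/cp/dd steps from the mail above.
# Assumption: the MDT is already mounted with -t ldiskfs at MOUNTPOINT.
set -eu

# truncate_last_rcvd MOUNTPOINT BACKUPDIR
truncate_last_rcvd() {
    mnt=$1          # ldiskfs mount point of the MDT, e.g. /mnt/mds
    backup_dir=$2   # second copy kept off the MDT, e.g. /tmp

    src="$mnt/last_rcvd"
    if [ ! -f "$src" ]; then
        echo "no $src; is the MDT mounted as ldiskfs?" >&2
        return 1
    fi

    # Refuse to run twice: a second run would overwrite the good backup
    # with the already-truncated file.
    if [ -e "$mnt/last_rcvd.sav" ] || [ -e "$backup_dir/last_rcvd.sav" ]; then
        echo "last_rcvd.sav already exists; not overwriting it" >&2
        return 1
    fi

    # Keep two backups: one on the MDT itself, one outside it.
    cp "$src" "$mnt/last_rcvd.sav"
    cp "$src" "$backup_dir/last_rcvd.sav"

    # Rewrite last_rcvd from the saved copy, keeping only the first 8 KiB.
    # Without conv=notrunc, dd truncates the output file before writing.
    dd if="$mnt/last_rcvd.sav" of="$src" bs=8k count=1
}

# Run directly only when given both arguments, e.g.:
#   sh truncate_last_rcvd.sh /mnt/mds /tmp
if [ $# -eq 2 ]; then
    truncate_last_rcvd "$1" "$2"
fi
```

After it succeeds, unmount and remount with `-t lustre` as shown above. If anything goes wrong, the untouched original is still available as last_rcvd.sav in both locations.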
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.