[Lustre-discuss] Lustre Mount Crashing

Andreas Dilger adilger at sun.com
Mon Jun 2 12:36:46 PDT 2008


On Jun 02, 2008  12:58 -0400, Charles Taylor wrote:
> No, it is going down hard in a kernel panic.     All of the stack trace I 
> can see at the moment looks like (scribbled by hand... so forgive me for 
> leaving off the addresses and offsets).
>
>
> :libcfs:cfs_alloc
> :obdclass:lustre_init_lsi
> :obdclass:lustre_fill_super
> :obdclass:lustre_fill_super
> set_anon_super
> set_anon_super
> :obdclass:lustre_fill_super
> get_sb_nodev
> vfs_kern_mount
> do_kern_mount
> do_mount
> __handle_mm_fault
> __up_read
> do_page_fault
> zone_statistics
> __alloc_pages
> sys_mount
> system_call
>
> RIP <  .....  > resched_task

Hmm, this doesn't seem very useful.  The callpath shown:

	lustre_fill_super->lustre_init_lsi->cfs_alloc()

is _really_ early in the mount, so either memory was corrupted before
this point (causing cfs_alloc() to crash) or you are missing some part
of the top of the stack.

> I wish I could get the whole trace to you.   We might try to get kdump on 
> there but my luck with kdump has been mixed.   It seems to work with some 
> chipsets and not with others.

> Anyway, we may just be out of luck.   I just hate to give up too easily 
> because it seems like everything is solid yet we crash on or just after the 
> mount.   This is on a MDS that has been running without a problem for 5 
> months (lustre 1.6.4.2 ).
>
> uname -a
> Linux hpcmds 2.6.18-8.1.14.el5.L-1642 #2 SMP Thu Feb 21 15:42:14 EST 2008 
> x86_64 x86_64 x86_64 GNU/Linux

If mounting with "-o abort_recovery" doesn't solve the problem,
are you able to mount the MDT filesystem as "-t ldiskfs" instead of
"-t lustre"?  Try that, then copy and truncate the last_rcvd file:

	mount -t ldiskfs /dev/MDSDEV /mnt/mds
	cp /mnt/mds/last_rcvd /mnt/mds/last_rcvd.sav
	cp /mnt/mds/last_rcvd /tmp/last_rcvd.sav
	dd if=/mnt/mds/last_rcvd.sav of=/mnt/mds/last_rcvd bs=8k count=1
	umount /mnt/mds

	mount -t lustre /dev/MDSDEV /mnt/mds
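
If you would rather rehearse the backup-and-truncate step on a scratch file
before touching the real last_rcvd, something like the following sketch may
help. The function name truncate_keep_head and its two arguments are made up
for illustration; it is not a Lustre tool. It only wraps the cp and dd
commands above with a couple of sanity checks.

```shell
# Hypothetical helper, not a Lustre tool: save a full copy of a file, then
# cut the original down to its first 8192 bytes (the bs=8k count=1 step).
truncate_keep_head() {
    file=$1      # file to truncate, e.g. /mnt/mds/last_rcvd
    backup=$2    # where to keep the untouched copy

    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    [ ! -e "$backup" ] || { echo "refusing to overwrite: $backup" >&2; return 1; }

    cp -p "$file" "$backup" || return 1

    # dd truncates its output file before writing, so the original ends up
    # holding at most the first 8192 bytes of the saved copy.
    dd if="$backup" of="$file" bs=8192 count=1 2>/dev/null || return 1

    # Report the resulting sizes so the result can be checked by eye.
    echo "$file: $(wc -c < "$file" | tr -d ' ') bytes"
    echo "$backup: $(wc -c < "$backup" | tr -d ' ') bytes"
}
```

With the MDT mounted as ldiskfs, this would be called as
"truncate_keep_head /mnt/mds/last_rcvd /mnt/mds/last_rcvd.sav". It refuses to
run if the backup name already exists, so an earlier saved copy is never
overwritten by an already truncated file.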

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



