[Lustre-discuss] incorrect num_osts; LBUG

Michael Sternberg sternberg at anl.gov
Fri Jan 2 17:24:31 PST 2009


While performing an LFSCK after the upgrade to lustre-1.6.6, I found
that the MDS claims to have 512 OSTs:

	MDS: num_osts = 512

I am in the middle of the upgrade, in a temporarily asymmetric
heartbeat config: the passive MDS and OSS are on lustre-1.6.5.1, and
the currently active MDS and OSS (where num_osts comes out wrong) are
on lustre-1.6.6.

The rest of the e2fsck output looks perfectly normal and in line with
previous runs.  I run two Lustre file systems, /home and /sandbox, off
the same MGS, and e2fsck for BOTH file systems reports 512 OSTs.
Running e2fsck on the 1.6.5.1 servers gives the same blown-up OST list.
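
For what it's worth, the count can be cross-checked against the objid
file on the MDT itself: it holds one 8-byte entry per OST index, so
4096 bytes works out to exactly 512 entries, i.e. one full 4 KiB
block.  A minimal sketch for dumping it read-only with debugfs
(assuming the file is named lov_objid at the root of the MDT
filesystem, as in 1.6; the output path is arbitrary):

	# dump the objid file from the MDT without mounting it read-write
	debugfs -c -R "dump /lov_objid /tmp/lov_objid.sandbox" /dev/dm-2

	# size in bytes; each OST index accounts for 8 bytes
	ls -l /tmp/lov_objid.sandbox

	# decode the entries as 64-bit decimals; only idx 0 should be nonzero here
	od -A d -t d8 /tmp/lov_objid.sandbox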

Does this mean my MGS is trashed?


In context:

========================================================================
# e2fsck -n -v --mdsdb /tmp/mdsdb-sandbox /dev/dm-2; date
e2fsck 1.40.11.sun1 (17-June-2008)
device /dev/dm-2 mounted by lustre per /proc/fs/lustre/mds/sandbox-MDT0000/mntdev
Warning!  /dev/dm-2 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
sandbox-MDT0000 has been mounted 47 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
MDS: ost_idx 0 max_id 1192499
MDS: ost_idx 1 max_id 0
MDS: ost_idx 2 max_id 0
MDS: ost_idx 3 max_id 0
MDS: ost_idx 4 max_id 0
...
MDS: ost_idx 509 max_id 0
MDS: ost_idx 510 max_id 0
MDS: ost_idx 511 max_id 0
MDS: got 4096 bytes = 512 entries in lov_objids
MDS: max_files = 36420
MDS: num_osts = 512
mds info db file written
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Pass 6: Acquiring information for lfsck
MDS: max_files = 36420
MDS: num_osts = 512
MDS: 'sandbox-MDT0000_UUID' mdt idx 0: compat 0x4 rocomp 0x1 incomp 0x4

    36420 inodes used (0.05%)
        3 non-contiguous inodes (0.0%)
          # of inodes with ind/dind/tind blocks: 1/0/0
  9283672 blocks used (12.67%)
        0 bad blocks
        1 large file

    32592 regular files
     3818 directories
        0 character device files
        0 block device files
        0 fifos
        1 link
        1 symbolic link (1 fast symbolic link)
        0 sockets
--------
    36412 files
========================================================================


I did an LFSCK just prior to the upgrade, which showed num_osts = 1.
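
For reference, the full sequence I use follows the documented 1.6
procedure, sketched below (device names and database paths are
examples from my setup; the OST device is a placeholder):

	# 1. build the MDS database on the MDS (read-only)
	e2fsck -n -v --mdsdb /tmp/mdsdb-sandbox /dev/dm-2

	# 2. build an OST database on each OSS, one per OST
	e2fsck -n -v --mdsdb /tmp/mdsdb-sandbox --ostdb /tmp/ostdb-0 /dev/<ostdev>

	# 3. run lfsck on a client against the collected databases
	lfsck -n -v --mdsdb /tmp/mdsdb-sandbox --ostdb /tmp/ostdb-0 /mnt/sandbox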


This is on RHEL-5.2 with the latest e2fsprogs:

	# e2fsck -V
	e2fsck 1.40.11.sun1 (17-June-2008)
		Using EXT2FS Library version 1.40.11.sun1, 17-June-2008


I run two MDS and two OSS nodes under heartbeat; I encountered an
LBUG during a heartbeat hiccup (output appended below).
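
The LBUG also dumped a binary debug log (path at the end of the
appended output); if it helps, that can be converted to text with
lctl, a sketch assuming the df alias for debug_file in 1.6:

	lctl df /tmp/lustre-log.1230932480.14500 /tmp/lustre-log.txt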

Kernels for the updated and active MDS and OSS, and some clients:
	2.6.18-92.1.10.el5_lustre.1.6.6smp

The passive MDS and OSS, and the remaining clients, run:
	2.6.18-53.1.14.el5_lustre.1.6.5.1smp



Regards, Michael


Jan  2 15:41:20 mds02 kernel: LustreError: 14500:0:(mgs_handler.c:194:mgs_setup()) ASSERTION(!lvfs_check_rdonly(lvfs_sbdev(mnt->mnt_sb))) failed
Jan  2 15:41:20 mds02 kernel: LustreError: 14500:0:(tracefile.c:450:libcfs_assertion_failed()) LBUG
Jan  2 15:41:20 mds02 kernel: Lustre: 14500:0:(linux-debug.c:185:libcfs_debug_dumpstack()) showing stack for process 14500
Jan  2 15:41:20 mds02 kernel: mount.lustre  R  running task       0 14500  14499                     (NOTLB)
Jan  2 15:41:20 mds02 kernel:  0000000000000040 0000000000000020 ffff8102105a5568 ffffffff88966410
Jan  2 15:41:20 mds02 kernel:  ffffffff888d3b80 ffff810200b242c0 0000000000000000 0000000000000000
Jan  2 15:41:20 mds02 kernel:  0000000000000000 0000000000000000 ffff8102105a5598 ffffffff80143a09
Jan  2 15:41:20 mds02 kernel: Call Trace:
Jan  2 15:41:20 mds02 kernel:  [<ffffffff8006b499>] dump_trace+0x211/0x23a
Jan  2 15:41:20 mds02 kernel:  [<ffffffff88966410>] :ptlrpc:lprocfs_rd_pool_state+0x0/0x200
Jan  2 15:41:20 mds02 kernel:  [<ffffffff888d3b80>] :obdclass:lprocfs_wr_atomic+0x0/0x60
Jan  2 15:41:20 mds02 kernel:  [<ffffffff8006b4f6>] show_trace+0x34/0x47
Jan  2 15:41:20 mds02 kernel:  [<ffffffff8006b5fb>] _show_stack+0xdb/0xea
Jan  2 15:41:20 mds02 kernel:  [<ffffffff8882ac2a>] :libcfs:lbug_with_loc+0x7a/0xc0
Jan  2 15:41:20 mds02 kernel:  [<ffffffff88832874>] :libcfs:libcfs_assertion_failed+0x54/0x60
Jan  2 15:41:20 mds02 kernel:  [<ffffffff88beab81>] :mgs:mgs_setup+0x301/0x800
Jan  2 15:41:20 mds02 kernel:  [<ffffffff888dff12>] :obdclass:class_setup+0x942/0xc70
Jan  2 15:41:20 mds02 kernel:  [<ffffffff888e263d>] :obdclass:class_process_config+0x14bd/0x19e0
Jan  2 15:41:20 mds02 kernel:  [<ffffffff888ec524>] :obdclass:do_lcfg+0x924/0xb20
Jan  2 15:41:20 mds02 kernel:  [<ffffffff888ee190>] :obdclass:lustre_start_simple+0x130/0x1d0
Jan  2 15:41:20 mds02 kernel:  [<ffffffff888f1bb3>] :obdclass:server_start_mgs+0x223/0x320
Jan  2 15:41:20 mds02 OpenSM[14448]: Entering MASTER state
Jan  2 15:41:20 mds02 kernel:  [<ffffffff888f3552>] :obdclass:server_fill_super+0x18a2/0x1fb0
Jan  2 15:41:21 mds02 kernel:  [<ffffffff886d5ac7>] :sunrpc:rpc_call_sync+0x9e/0xa8
Jan  2 15:41:21 mds02 kernel:  [<ffffffff800963e7>] recalc_sigpending+0xe/0x25
Jan  2 15:41:21 mds02 attrd: [14276]: info: main: Starting mainloop...
Jan  2 15:41:21 mds02 kernel:  [<ffffffff80009516>] __d_lookup+0xb0/0xff
Jan  2 15:41:21 mds02 OpenSM[14448]: SUBNET UP
Jan  2 15:41:21 mds02 crmd: [14277]: notice: populate_cib_nodes: Node: mds02 (uuid: f03a4080-8623-4049-a725-723ff0995fe2)
Jan  2 15:41:21 mds02 kernel:  [<ffffffff8882b178>] :libcfs:cfs_alloc+0x28/0x60
Jan  2 15:41:21 mds02 crmd: [14277]: notice: populate_cib_nodes: Node: mds01 (uuid: a25627aa-9f96-486f-8f17-4ef11b62dc69)
Jan  2 15:41:21 mds02 kernel:  [<ffffffff888e7693>] :obdclass:lustre_init_lsi+0x263/0x4c0
Jan  2 15:41:21 mds02 kernel:  [<ffffffff888f51b3>] :obdclass:lustre_fill_super+0x1553/0x16d0
Jan  2 15:41:21 mds02 kernel:  [<ffffffff800e3aab>] get_filesystem+0x12/0x3b
Jan  2 15:41:21 mds02 kernel:  [<ffffffff800dc008>] set_anon_super+0x0/0xab
Jan  2 15:41:21 mds02 kernel:  [<ffffffff888f3c60>] :obdclass:lustre_fill_super+0x0/0x16d0
Jan  2 15:41:21 mds02 kernel:  [<ffffffff800dc273>] get_sb_nodev+0x4f/0x97
Jan  2 15:41:21 mds02 kernel:  [<ffffffff800dbb5a>] vfs_kern_mount+0x93/0x11a
Jan  2 15:41:21 mds02 kernel:  [<ffffffff800dbc23>] do_kern_mount+0x36/0x4d
Jan  2 15:41:21 mds02 kernel:  [<ffffffff800e53ca>] do_mount+0x68c/0x6fc
Jan  2 15:41:21 mds02 kernel:  [<ffffffff80008b39>] __handle_mm_fault+0x4ea/0xe17
Jan  2 15:41:21 mds02 kernel:  [<ffffffff80021f02>] __up_read+0x19/0x7f
Jan  2 15:41:21 mds02 kernel:  [<ffffffff80066858>] do_page_fault+0x4fe/0x830
Jan  2 15:41:21 mds02 kernel:  [<ffffffff80008b39>] __handle_mm_fault+0x4ea/0xe17
Jan  2 15:41:21 mds02 kernel:  [<ffffffff800c6029>] zone_statistics+0x3e/0x6d
Jan  2 15:41:21 mds02 kernel:  [<ffffffff8000f04d>] __alloc_pages+0x5c/0x2c5
Jan  2 15:41:21 mds02 kernel:  [<ffffffff8004bb8b>] sys_mount+0x8a/0xcd
Jan  2 15:41:21 mds02 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Jan  2 15:41:21 mds02 kernel:
Jan  2 15:41:21 mds02 kernel: LustreError: dumping log to /tmp/lustre-log.1230932480.14500



