[Lustre-discuss] LBUG in lustre 1.8.1 when client mounts something with bind option

Daniel Basabe dbasabe@soporte.cti.csic.es
Thu Aug 20 03:43:36 PDT 2009


Hi,

I recently upgraded Lustre from 1.6.6 to 1.8.1. Before the upgrade I never
had any problems with Lustre.

My clients mount the Lustre filesystem under /clusterha, and at that point
everything works fine. But when I then try, for example, this:

# mount -o bind,rw /clusterha/home /home
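For context, the clients mount the filesystem itself with something like the
following (fsname and NIDs as in the target dumps below; the exact line is
approximate):

# mount -t lustre 10.0.0.201@o2ib0:10.0.0.200@o2ib0:/shared /clusterha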

This bind mount produces an LBUG on the MGS:


LustreError: 5164:0:(pack_generic.c:655:lustre_shrink_reply_v2()) ASSERTION(msg->lm_bufcount > segment) failed
LustreError: 5164:0:(pack_generic.c:655:lustre_shrink_reply_v2()) LBUG
Lustre: 5164:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 5164
ll_mdt_18     R  running task       0  5164      1          5165  5163 (L-TLB)                                                                               
 0000000000000000 ffffffff887f6d5a ffff8104173e0280 ffffffff887f646e                                                                                         
 ffffffff887f6462 0000000000000086 0000000000000002 ffffffff801616e5                                                                                         
 0000000000000001 0000000000000000 ffffffff802f6aa0 0000000000000000                                                                                         
Call Trace:                                                                                                                                                  
 [<ffffffff8009daf8>] autoremove_wake_function+0x9/0x2e                                                                                                      
 [<ffffffff80088819>] __wake_up_common+0x3e/0x68                                                                                                             
 [<ffffffff80088819>] __wake_up_common+0x3e/0x68                                                                                                             
 [<ffffffff8002e6ba>] __wake_up+0x38/0x4f                                                                                                                    
 [<ffffffff800a540a>] kallsyms_lookup+0xc2/0x17b                                                                                                             
 [<ffffffff800a540a>] kallsyms_lookup+0xc2/0x17b                                                                                                             
 [<ffffffff800a540a>] kallsyms_lookup+0xc2/0x17b                                                                                                             
 [<ffffffff800a540a>] kallsyms_lookup+0xc2/0x17b                                                                                                             
 [<ffffffff8006bb5d>] printk_address+0x9f/0xab                                                                                                               
 [<ffffffff8008f800>] printk+0x8/0xbd                                                                                                                        
 [<ffffffff8008f84a>] printk+0x52/0xbd                                                                                                                       
 [<ffffffff800a2e08>] module_text_address+0x33/0x3c                                                                                                          
 [<ffffffff8009c088>] kernel_text_address+0x1a/0x26                                                                                                          
 [<ffffffff8006b843>] dump_trace+0x211/0x23a                                                                                                                 
 [<ffffffff8006b8a0>] show_trace+0x34/0x47                                                                                                                   
 [<ffffffff8006b9a5>] _show_stack+0xdb/0xea                                                                                                                  
 [<ffffffff887ebada>] :libcfs:lbug_with_loc+0x7a/0xd0                                                                                                        
 [<ffffffff887f3c70>] :libcfs:tracefile_init+0x0/0x110                                                                                                       
 [<ffffffff8894c218>] :ptlrpc:lustre_shrink_reply_v2+0xa8/0x240                                                                                              
 [<ffffffff88c53529>] :mds:mds_getattr_lock+0xc59/0xce0                                                                                                      
 [<ffffffff8894aea4>] :ptlrpc:lustre_msg_add_version+0x34/0x110                                                                                              
 [<ffffffff8883c923>] :lnet:lnet_ni_send+0x93/0xd0                                                                                                           
 [<ffffffff8883ed23>] :lnet:lnet_send+0x973/0x9a0                                                                                                            
 [<ffffffff88c4dfca>] :mds:fixup_handle_for_resent_req+0x5a/0x2c0                                                                                            
 [<ffffffff88c59a76>] :mds:mds_intent_policy+0x636/0xc10                                                                                                     
 [<ffffffff8890d6f6>] :ptlrpc:ldlm_resource_putref+0x1b6/0x3a0                                                                                               
 [<ffffffff8890ad46>] :ptlrpc:ldlm_lock_enqueue+0x186/0xb30                                                                                                  
 [<ffffffff88926acf>] :ptlrpc:ldlm_export_lock_get+0x6f/0xe0                                                                                                 
 [<ffffffff88889e48>] :obdclass:lustre_hash_add+0x218/0x2e0                                                                                                  
 [<ffffffff8892f530>] :ptlrpc:ldlm_server_blocking_ast+0x0/0x83d                                                                                             
 [<ffffffff8892d669>] :ptlrpc:ldlm_handle_enqueue+0xc19/0x1210                                                                                               
 [<ffffffff88c57630>] :mds:mds_handle+0x4080/0x4cb0                                                                                                          
 [<ffffffff80148d4f>] __next_cpu+0x19/0x28                                                                                                                   
 [<ffffffff80148d4f>] __next_cpu+0x19/0x28                                                                                                                   
 [<ffffffff80088f32>] find_busiest_group+0x20d/0x621                                                                                                         
 [<ffffffff8894fa15>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0                                                                                              
 [<ffffffff80089d89>] enqueue_task+0x41/0x56                                                                                                                 
 [<ffffffff8895472d>] :ptlrpc:ptlrpc_check_req+0x1d/0x110                                                                                                    
 [<ffffffff88956e67>] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1160                                                                                      
 [<ffffffff8003dc3f>] lock_timer_base+0x1b/0x3c                                                                                                              
 [<ffffffff80088819>] __wake_up_common+0x3e/0x68                                                                                                             
 [<ffffffff8895a908>] :ptlrpc:ptlrpc_main+0x1218/0x13e0                                                                                                      
 [<ffffffff8008a3ef>] default_wake_function+0x0/0xe                                                                                                          
 [<ffffffff800b48dd>] audit_syscall_exit+0x327/0x342                                                                                                         
 [<ffffffff8005dfb1>] child_rip+0xa/0x11                                                                                                                     
 [<ffffffff889596f0>] :ptlrpc:ptlrpc_main+0x0/0x13e0                                                                                                         
 [<ffffffff8005dfa7>] child_rip+0x0/0x11                                                                                                                     

LustreError: dumping log to /tmp/lustre-log.1250760001.5164
Lustre: 0:0:(watchdog.c:181:lcw_cb()) Watchdog triggered for pid 5164: it was inactive for 200.00s
Lustre: 0:0:(linux-debug.c:264:libcfs_debug_dumpstack()) showing stack for process 5164
ll_mdt_18     D ffff81000102df80     0  5164      1          5165  5163 (L-TLB)
 ffff810411625810 0000000000000046 0000000000000000 0000000000000000
 ffff8104116257d0 0000000000000009 ffff810413360080 ffff81042fe9d100
 00008b3a197b862f 0000000000000ed5 ffff810413360268 000000050000028f
Call Trace:
 [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
 [<ffffffff887ebb26>] :libcfs:lbug_with_loc+0xc6/0xd0
 [<ffffffff887f3c70>] :libcfs:tracefile_init+0x0/0x110
 [<ffffffff8894c218>] :ptlrpc:lustre_shrink_reply_v2+0xa8/0x240
 [<ffffffff88c53529>] :mds:mds_getattr_lock+0xc59/0xce0
 [<ffffffff8894aea4>] :ptlrpc:lustre_msg_add_version+0x34/0x110
 [<ffffffff8883c923>] :lnet:lnet_ni_send+0x93/0xd0
 [<ffffffff8883ed23>] :lnet:lnet_send+0x973/0x9a0
 [<ffffffff88c4dfca>] :mds:fixup_handle_for_resent_req+0x5a/0x2c0
 [<ffffffff88c59a76>] :mds:mds_intent_policy+0x636/0xc10
 [<ffffffff8890d6f6>] :ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
 [<ffffffff8890ad46>] :ptlrpc:ldlm_lock_enqueue+0x186/0xb30
 [<ffffffff88926acf>] :ptlrpc:ldlm_export_lock_get+0x6f/0xe0
 [<ffffffff88889e48>] :obdclass:lustre_hash_add+0x218/0x2e0
 [<ffffffff8892f530>] :ptlrpc:ldlm_server_blocking_ast+0x0/0x83d
 [<ffffffff8892d669>] :ptlrpc:ldlm_handle_enqueue+0xc19/0x1210
 [<ffffffff88c57630>] :mds:mds_handle+0x4080/0x4cb0
 [<ffffffff80148d4f>] __next_cpu+0x19/0x28
 [<ffffffff80148d4f>] __next_cpu+0x19/0x28
 [<ffffffff80088f32>] find_busiest_group+0x20d/0x621
 [<ffffffff8894fa15>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
 [<ffffffff80089d89>] enqueue_task+0x41/0x56
 [<ffffffff8895472d>] :ptlrpc:ptlrpc_check_req+0x1d/0x110
 [<ffffffff88956e67>] :ptlrpc:ptlrpc_server_handle_request+0xa97/0x1160
 [<ffffffff8003dc3f>] lock_timer_base+0x1b/0x3c
 [<ffffffff80088819>] __wake_up_common+0x3e/0x68
 [<ffffffff8895a908>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
 [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
 [<ffffffff800b48dd>] audit_syscall_exit+0x327/0x342
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff889596f0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

LustreError: dumping log to /tmp/lustre-log.1250760201.5164
Lustre: 5162:0:(service.c:786:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
  req@ffff81040f602400 x1311420275108778/t0 o101->3f31386b-70e3-8c4f-6ecf-83adfc123156@NET_0x20000c0a80a03_UUID:0/0 lens 544/600 e 24 to 0 dl 1250760601 ref 2 fl Interpret:/0/0 rc 0/0

And the client hangs.

With 1.6.6 this action worked fine.

Another difference from the previous configuration is that in the new one
I've created a link aggregation for the tcp0 device, on both the MGS side
and the client side:

# cat /etc/modprobe.conf
alias eth0 bnx2
alias eth1 bnx2
alias scsi_hostadapter cciss
alias scsi_hostadapter1 ata_piix
alias scsi_hostadapter2 qla2xxx
alias bond0 bonding
options bond0 mode=4
alias ib0 ib_ipoib
alias ib1 ib_ipoib
options lnet accept=all networks=o2ib0(ib0),tcp0(bond0)
alias net-pf-27 ib_sdp
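
The bonding device itself is set up in the usual ifcfg files, roughly like
this (addresses illustrative, taken from the tcp0 NIDs below):

# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.10.200
NETMASK=255.255.255.0

# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes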


My current configuration has three OSTs connected to the MGS over InfiniBand
(o2ib) and TCP Ethernet (bond0).
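The target information below was read from each target's CONFIGS/mountdata
(e.g. with tunefs.lustre --dryrun <device>; device paths omitted):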

MGS:

Reading CONFIGS/mountdata

   Read previous values:
Target:     shared-MDT0000
Index:      0
Lustre FS:  shared
Mount type: ldiskfs
Flags:      0x5
              (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=10.0.0.200@o2ib0,192.168.10.200@tcp0
mdt.group_upcall=/usr/sbin/l_getgroups


OST 0:

Target:     shared-OST0000
Index:      0
Lustre FS:  shared
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: failover.node=10.0.0.7@o2ib0,192.168.10.7@tcp0
mgsnode=10.0.0.201@o2ib,192.168.10.201@tcp
mgsnode=10.0.0.200@o2ib,192.168.10.200@tcp


OST 1:

Target:     shared-OST0001
Index:      1
Lustre FS:  shared
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: failover.node=10.0.0.6@o2ib0,192.168.10.6@tcp0
mgsnode=10.0.0.201@o2ib,192.168.10.201@tcp
mgsnode=10.0.0.200@o2ib,192.168.10.200@tcp


OST 2:

Target:     shared-OST0002
Index:      2
Lustre FS:  shared
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: failover.node=10.0.0.201@o2ib0,192.168.10.201@tcp0
mgsnode=10.0.0.201@o2ib,192.168.10.201@tcp
mgsnode=10.0.0.200@o2ib,192.168.10.200@tcp
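
For reference, lctl list_nids on the MGS shows both interfaces (output
illustrative, matching the NIDs above):

# lctl list_nids
10.0.0.200@o2ib
192.168.10.200@tcp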

I'm attaching the dump log.

Does anyone know what is happening?

Thanks.

Regards.

-- 
Daniel Basabe del Pino
------------------------
HPC Systems Administrator
BULL / Secretaría General Adjunta de Informática CSIC
Phone: 915642963
Ext: 272
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre-log.1250760001.5164
Type: application/octet-stream
Size: 1052892 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20090820/229b3364/attachment.obj>

