[Lustre-discuss] mdt threads "hanging" (?) until writes no longer permitted

Hendelman, Rob Rob.Hendelman at magnetar.com
Fri Dec 11 09:51:45 PST 2009


Hello.

We are running Lustre 1.6.7.2 servers and have tried both the 1.6.7.2 client and the 1.8.1.1-56 (CVS) patchless client on Ubuntu with kernel 2.6.24-25.63 (which maps to mainline 2.6.24.6), and have seen some strange behavior.

The MGS/MDS is an HP DL385 G6 with 36G of RAM.  The MGS and MDT are two partitions on a RAID set of 73G 15K SAS drives:

Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p1      61G  2.0G   56G   4% /
tmpfs                  18G     0   18G   0% /dev/shm
/dev/lustre/mgs      1008M   34M  924M   4% /mnt/mgs
/dev/lustre/mdt
                      7.0G  437M  6.2G   7% /mnt/mdt

The OSS is a separate server with the same CPU/RAM configuration, but it has 4 OSTs, each a RAID10 of 4 x 146G 15K SAS drives on a FC SAN:

Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p1      59G  2.0G   54G   4% /
tmpfs                  18G     0   18G   0% /dev/shm
/dev/data/ost00
                      264G   80G  171G  32% /mnt/data-ost00
/dev/data/ost01
                      264G   76G  175G  31% /mnt/data-ost01
/dev/data/ost02
                      264G   74G  177G  30% /mnt/data-ost02
/dev/data/ost03
                      264G   73G  178G  30% /mnt/data-ost03

Our programmers here have written a short program that uses a third-party library (which we do not have source code for); the library performs the file I/O that triggers our problem.  The program does many, many reads of very small files and no writes.  Partway through the processing, the load on the client running the program climbs very high (300-700), but the system remains responsive except to a few commands ("w" or "ps" are good examples).  The access pattern is roughly the loop sketched below.
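
This is not the actual program (we cannot post the library code), just a minimal sketch of the I/O pattern as we understand it; the mount point and directory name below are placeholders:

    #!/bin/bash
    # Rough approximation of the workload: read a very large number of
    # small files from the Lustre mount, never write anything.
    # /mnt/lustre/smallfiles is a placeholder for the real data directory.
    for f in /mnt/lustre/smallfiles/*; do
        cat "$f" > /dev/null
    done

While this is running, on the MDT we see the following: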

BUG: soft lockup - CPU#1 stuck for 10s! [ll_evictor:3245]
CPU 1:
Modules linked in: sunrpc(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) osc(U) lov(U) mds(U) lquota(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) ptlrpc(U)
ldiskfs(U) crc16(U) ksocklnd(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) i2c_core(U) button(U) battery(U) asus_acpi(U)
acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) pcspkr(U) hpwdt(U) serio_raw(U) bnx2(U) shpchp(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) usb_storage(U) lpfc(U)
scsi_transport_fc(U) cciss(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 3245, comm: ll_evictor Tainted: G      2.6.18-92.1.26.el5_lustre.1.6.7.2smp #1
RIP: 0010:[<ffffffff800649e6>]  [<ffffffff800649e6>] _write_lock+0x7/0xf
RSP: 0018:ffff810493d89d68  EFLAGS: 00000246
RAX: ffff8103b4900c00 RBX: 000000000000c5e0 RCX: 00000000000120e6
RDX: 000000000000037e RSI: ffffffff802f1d80 RDI: ffffc20010861e0c
RBP: 0000000000000282 R08: ffff8104afcb3360 R09: 0000000000000000
R10: ffff8103b49ed840 R11: 0000000000000150 R12: 0000000000000282
R13: ffff810493d89d10 R14: ffff8103b49ed898 R15: ffff8103b49ed840
FS:  00002b4db7ec2230(0000) GS:ffff8104a007ac40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002ab671c5d0a0 CR3: 000000091d857000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff883593fe>] :obdclass:lustre_hash_for_each_empty+0x20e/0x290
 [<ffffffff8835fc38>] :obdclass:class_disconnect+0x378/0x400
 [<ffffffff884bd610>] :ptlrpc:ldlm_cancel_locks_for_export_cb+0x0/0xc0
 [<ffffffff885cc43b>] :mds:mds_disconnect+0x11b/0x310
 [<ffffffff8835bce4>] :obdclass:class_fail_export+0x384/0x4c0
 [<ffffffff8850daa8>] :ptlrpc:ping_evictor_main+0x4f8/0x7d5
 [<ffffffff8008abbc>] default_wake_function+0x0/0xe
 [<ffffffff800b4391>] audit_syscall_exit+0x31b/0x336
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8850d5b0>] :ptlrpc:ping_evictor_main+0x0/0x7d5
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

BUG: soft lockup - CPU#1 stuck for 10s! [ll_evictor:3245]
CPU 1:
Modules linked in: sunrpc(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) osc(U) lov(U) mds(U) lquota(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) ptlrpc(U)
ldiskfs(U) crc16(U) ksocklnd(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) i2c_core(U) button(U) battery(U) asus_acpi(U)
acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) pcspkr(U) hpwdt(U) serio_raw(U) bnx2(U) shpchp(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) usb_storage(U) lpfc(U)
scsi_transport_fc(U) cciss(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 3245, comm: ll_evictor Tainted: G      2.6.18-92.1.26.el5_lustre.1.6.7.2smp #1
RIP: 0010:[<ffffffff883593e0>]  [<ffffffff883593e0>] :obdclass:lustre_hash_for_each_empty+0x1f0/0x290
RSP: 0018:ffff810493d89d70  EFLAGS: 00000206
RAX: ffff8103a8880200 RBX: 000000000000c479 RCX: 0000000000014eaf
RDX: 000000000000025b RSI: ffffffff802f1d80 RDI: ffffc2001086079c
RBP: ffffffff884bb912 R08: ffff8104afcb3360 R09: ffff8103b4f14a00
R10: 0000000c00000000 R11: 0000000000000150 R12: ffff8103a8880200
R13: 0000000000000282 R14: 0000000000000282 R15: ffff810493d89d10
FS:  00002b4db7ec2230(0000) GS:ffff8104a007ac40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002ab671c5d0a0 CR3: 000000091d857000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff8835fc38>] :obdclass:class_disconnect+0x378/0x400
 [<ffffffff884bd610>] :ptlrpc:ldlm_cancel_locks_for_export_cb+0x0/0xc0
 [<ffffffff885cc43b>] :mds:mds_disconnect+0x11b/0x310
 [<ffffffff8835bce4>] :obdclass:class_fail_export+0x384/0x4c0
 [<ffffffff8850daa8>] :ptlrpc:ping_evictor_main+0x4f8/0x7d5
 [<ffffffff8008abbc>] default_wake_function+0x0/0xe
 [<ffffffff800b4391>] audit_syscall_exit+0x31b/0x336
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8850d5b0>] :ptlrpc:ping_evictor_main+0x0/0x7d5
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

As well as:

BUG: soft lockup - CPU#3 stuck for 10s! [ll_mdt_21:3313]
CPU 3:
Modules linked in: sunrpc(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) osc(U) lov(U) mds(U) lquota(U) fsfilt_ldiskfs(U) mgs(U) mgc(U) ptlrpc(U)
ldiskfs(U) crc16(U) ksocklnd(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) dm_multipath(U) video(U) sbs(U) backlight(U) i2c_ec(U) i2c_core(U) button(U) battery(U) asus_acpi(U)
acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) pcspkr(U) hpwdt(U) serio_raw(U) bnx2(U) shpchp(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) usb_storage(U) lpfc(U)
scsi_transport_fc(U) cciss(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 3313, comm: ll_mdt_21 Tainted: G      2.6.18-92.1.26.el5_lustre.1.6.7.2smp #1
RIP: 0010:[<ffffffff800649e6>]  [<ffffffff800649e6>] _write_lock+0x7/0xf
RSP: 0018:ffff81049346bb08  EFLAGS: 00000246
RAX: ffff8103fd4cd400 RBX: 0000000000007ff5 RCX: 000000000000c76e
RDX: fffffffffff9d458 RSI: ffffffff802f1d80 RDI: ffffc20010865f5c
RBP: 0000000000000282 R08: ffff8104afcbb8e0 R09: 0000000000000000
R10: ffff81043a9d6680 R11: 0000000000000150 R12: 0000000000000282
R13: ffff81049346bab0 R14: ffff81043a9d66d8 R15: ffff81043a9d6680
FS:  00002b000b9be230(0000) GS:ffff81091ff5bb40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000037ff495770 CR3: 0000000000201000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff883593fe>] :obdclass:lustre_hash_for_each_empty+0x20e/0x290
 [<ffffffff8835fc38>] :obdclass:class_disconnect+0x378/0x400
 [<ffffffff884bd610>] :ptlrpc:ldlm_cancel_locks_for_export_cb+0x0/0xc0
 [<ffffffff885cc43b>] :mds:mds_disconnect+0x11b/0x310
 [<ffffffff885010f9>] :ptlrpc:lustre_pack_reply+0x29/0xb0
 [<ffffffff884c7c61>] :ptlrpc:target_handle_disconnect+0x461/0x560
 [<ffffffff884fc005>] :ptlrpc:lustre_msg_get_opc+0x35/0xf0
 [<ffffffff885c7734>] :mds:mds_handle+0xda4/0x4d20
 [<ffffffff80143b75>] __next_cpu+0x19/0x28
 [<ffffffff800898e6>] find_busiest_group+0x20d/0x621
 [<ffffffff884fb795>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
 [<ffffffff88503eea>] :ptlrpc:ptlrpc_server_request_get+0x6a/0x150
 [<ffffffff88505d6d>] :ptlrpc:ptlrpc_check_req+0x1d/0x110
 [<ffffffff885082f3>] :ptlrpc:ptlrpc_server_handle_request+0xa93/0x1150
 [<ffffffff8006da18>] do_gettimeoffset_hpet+0x7/0x28
 [<ffffffff8006d8a2>] do_gettimeofday+0x40/0x8f
 [<ffffffff882c37c6>] :libcfs:lcw_update_time+0x16/0x100
 [<ffffffff8003d59a>] lock_timer_base+0x1b/0x3c
 [<ffffffff8001c5e4>] __mod_timer+0xb0/0xbe
 [<ffffffff8850b7e8>] :ptlrpc:ptlrpc_main+0x1218/0x13e0
 [<ffffffff8008abbc>] default_wake_function+0x0/0xe
 [<ffffffff800b4391>] audit_syscall_exit+0x31b/0x336
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8850a5d0>] :ptlrpc:ptlrpc_main+0x0/0x13e0
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Writing from ANY client to the Lustre mount (e.g. "touch ./filename") then returns "No space left on device".  Note from the df output above that none of the OSTs are more than about 32% full, and the MGS/MDT are at only 4% and 7%.
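
In case it matters, block and inode usage per target can also be checked with the standard lfs/lctl commands; /mnt/lustre below is just a placeholder for our client mount point:

    # Per-OST/MDT block usage as Lustre sees it
    lfs df -h /mnt/lustre
    # Per-OST/MDT inode usage, in case the ENOSPC is really inode exhaustion
    lfs df -i /mnt/lustre
    # On the servers: list configured devices and their state
    lctl dl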

Any suggestions?  We have another cluster, with an entirely different I/O load, running 1.6.7.2 servers and clients, and it shows no problems at all.

Thank you,

Robert




