[Lustre-discuss] freezing

Papp Tamas tompos at martos.bme.hu
Thu Dec 6 00:30:35 PST 2007


hi!

We have a cluster running Lustre 1.6.0.1, kernel 2.6.9-42.0.10, DRBD 0.7.22, on CentOS 4.4.

Last night it froze up completely.

First node3 (OST0003) froze. I could telnet to port 22, but that was
all (the port was open, but unusable).
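
To show what I mean by "open but unusable": a healthy sshd sends its
version banner right after the connection is accepted, while a wedged
one accepts the TCP connection and then goes silent. A minimal sketch
of that check in Python (host name and timeout are just examples):

import socket

def ssh_banner(host, port=22, timeout=10):
    # a live sshd replies with something like "SSH-2.0-OpenSSH_..."
    s = socket.create_connection((host, port), timeout=timeout)
    try:
        s.settimeout(timeout)
        return s.recv(256)
    finally:
        s.close()

print(ssh_banner("node3"))  # raises socket.timeout if sshd is wedged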

This is the messages log:

Dec  5 11:23:34 node3 heartbeat: [3166]: info: These are nothing to worry about.
Dec  5 22:33:24 node3 syslogd 1.4.1: restart.

As you can see, there is absolutely nothing in the logs for the whole
afternoon. That seems like far too little; could something have
happened to the machine (or to syslogd itself) during that time, and
if so, what?
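
A quick way to confirm the gap is real, and to look for other silent
stretches, is to scan the syslog timestamps. A small Python sketch
(syslog lines carry no year, so this only makes sense within a single
log file):

from datetime import datetime

def find_gaps(path="/var/log/messages", max_gap_s=3600):
    # syslog timestamps are the first 15 characters, e.g. "Dec  5 11:23:34"
    prev = None
    for line in open(path):
        try:
            ts = datetime.strptime(line[:15], "%b %d %H:%M:%S")
        except ValueError:
            continue  # not a standard syslog line
        if prev is not None and (ts - prev).total_seconds() > max_gap_s:
            print("gap:", prev.strftime("%b %d %H:%M:%S"),
                  "->", ts.strftime("%b %d %H:%M:%S"))
        prev = ts

find_gaps()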


So node3 got a reboot and everything was OK, but then our metadata
server (the meta1 host) froze as well.
At that time I saw this in the messages log:

Dec  5 20:58:31 meta1 kernel: LustreError: dumping log to /tmp/lustre-log.1196884711.5006
Dec  5 20:58:33 meta1 kernel: LustreError: dumping log to /tmp/lustre-log.1196884713.5000
Dec  5 20:58:33 meta1 kernel: LustreError: dumping log to /tmp/lustre-log.1196884713.5000
Dec  5 20:58:33 meta1 kernel: LustreError: can't open /tmp/lustre-log.1196884713.5000 file: err -17
Dec  5 20:58:33 meta1 kernel: LustreError: can't open /tmp/lustre-log.1196884713.5000 for dump: rc -17
Dec  5 20:58:33 meta1 kernel: LustreError: dumping log to /tmp/lustre-log.1196884713.5002
Dec  5 20:58:33 meta1 kernel: LustreError: dumping log to /tmp/lustre-log.1196884713.5003
Dec  5 20:58:35 meta1 kernel: LustreError: dumping log to /tmp/lustre-log.1196884715.5108
Dec  5 20:58:52 meta1 kernel: LustreError: dumping log to /tmp/lustre-log.1196884732.4998
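
By the way, the -17 above is -EEXIST, "File exists": two dumps were
triggered within the same second for the same pid (the file name is
/tmp/lustre-log.<epoch>.<pid>), so the second open of the identical
name failed. A quick check of the errno value:

import errno, os
print(errno.errorcode[17], "->", os.strerror(17))  # EEXIST -> File exists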

If you want, I can send these logs to you privately.

This morning I saw an oops message on meta1's console, so I rebooted
the whole cluster.

Now everything seems to be working, except for some oops-like watchdog
stack traces in meta1's messages log, like this one:

Dec  6 07:38:17 meta1 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 5160: it was inactive for 100s
Dec  6 07:38:17 meta1 kernel: Lustre: 0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 5160
Dec  6 07:38:17 meta1 kernel: ll_mdt_07     S F48AAF20  1764  5160      1          5161  5159 (L-TLB)
Dec  6 07:38:17 meta1 kernel: ea2ca8ec 00000046 f48aaf20 f48aaf20 f48aaf20 00000c56 00000000 f8b3525c
Dec  6 07:38:17 meta1 kernel:        0000002a f28063aa f48aaf20 c201dde0 00000000 00000000 7d821380 000f42cf
Dec  6 07:38:17 meta1 kernel:        c0328a80 f2f31630 f2f3179c 00000000 00000246 00096b46 000003e8 ffffffff
Dec  6 07:38:17 meta1 kernel: Call Trace:
Dec  6 07:38:17 meta1 kernel:  [<f8b3525c>] libcfs_debug_vmsg2+0x35d/0x51b [libcfs]
Dec  6 07:38:17 meta1 kernel:  [<c02d51f6>] schedule_timeout+0x137/0x154
Dec  6 07:38:17 meta1 kernel:  [<c012a7ba>] process_timeout+0x0/0x5
Dec  6 07:38:17 meta1 kernel:  [<f8d1157a>] ptlrpc_set_wait+0x3b4/0x5f2 [ptlrpc]
Dec  6 07:38:17 meta1 kernel:  [<c011e7f5>] default_wake_function+0x0/0xc
Dec  6 07:38:17 meta1 kernel:  [<f9221ca9>] lov_statfs_async+0x1a9/0x348 [lov]
Dec  6 07:38:18 meta1 kernel:  [<f8d10db1>] ptlrpc_expired_set+0x0/0x19f [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<f8d10f6b>] ptlrpc_interrupted_set+0x0/0xd0 [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<f9215c52>] obd_statfs_async+0x3bb/0x566 [lov]
Dec  6 07:38:18 meta1 kernel:  [<f8d10db1>] ptlrpc_expired_set+0x0/0x19f [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<f8d10f6b>] ptlrpc_interrupted_set+0x0/0xd0 [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<f9215641>] lov_create+0x51a/0x770 [lov]
Dec  6 07:38:18 meta1 kernel:  [<c02d4fd6>] __cond_resched+0x14/0x39
Dec  6 07:38:18 meta1 kernel:  [<c015dcb2>] __getblk+0x2b/0x49
Dec  6 07:38:18 meta1 kernel:  [<f91b66ca>] ldiskfs_get_inode_loc+0x4f/0x226 [ldiskfs]
Dec  6 07:38:18 meta1 kernel:  [<f8b0ca66>] drbd_make_request_common+0x6ca/0x6d4 [drbd]
Dec  6 07:38:18 meta1 kernel:  [<f91bfcd1>] ldiskfs_xattr_ibody_get+0x14a/0x1a0 [ldiskfs]
Dec  6 07:38:18 meta1 kernel:  [<f91bfd83>] ldiskfs_xattr_get+0x5c/0x76 [ldiskfs]
Dec  6 07:38:18 meta1 kernel:  [<f8ff14e7>] fsfilt_ldiskfs_get_md+0x49/0x11a [fsfilt_ldiskfs]
Dec  6 07:38:18 meta1 kernel:  [<f8fbc4a5>] obd_create+0x3f2/0x485 [mds]
Dec  6 07:38:18 meta1 kernel:  [<f8fba3e0>] mds_create_objects+0x16b6/0x2544 [mds]
Dec  6 07:38:18 meta1 kernel:  [<f8fbda0d>] mds_finish_open+0x3db/0x99f [mds]
Dec  6 07:38:18 meta1 kernel:  [<f8fc145f>] mds_open+0x2a3f/0x3000 [mds]
Dec  6 07:38:18 meta1 kernel:  [<c0107ac4>] do_IRQ+0x1a2/0x1ae
Dec  6 07:38:18 meta1 kernel:  [<f8b4c7d0>] entry_set_group_info+0x13b/0x375 [lvfs]
Dec  6 07:38:18 meta1 kernel:  [<f8b492f0>] push_ctxt+0x214/0x23c [lvfs]
Dec  6 07:38:18 meta1 kernel:  [<f8faf37a>] mds_reint_rec+0x1b7/0x26c [mds]
Dec  6 07:38:18 meta1 kernel:  [<f8d1eda2>] lustre_msg_string+0x7e/0x353 [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<f8fc567f>] mds_open_unpack+0x39b/0x432 [mds]
Dec  6 07:38:18 meta1 kernel:  [<f8f91a52>] mds_reint+0x3ed/0x4c9 [mds]
Dec  6 07:38:18 meta1 kernel:  [<f8f9c52f>] mds_intent_policy+0x504/0xcb7 [mds]
Dec  6 07:38:18 meta1 kernel:  [<f8f9c02b>] mds_intent_policy+0x0/0xcb7 [mds]
Dec  6 07:38:18 meta1 kernel:  [<f8ce48b5>] ldlm_lock_enqueue+0x109/0x691 [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<f8d02eb5>] ldlm_handle_enqueue+0x10e0/0x1868 [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<f8d01492>] ldlm_server_completion_ast+0x0/0x527 [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<f8d00c44>] ldlm_server_blocking_ast+0x0/0x84e [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<f8f95f9c>] mds_handle+0x2a1d/0x3ad6 [mds]
Dec  6 07:38:18 meta1 kernel:  [<f8d27389>] ptlrpc_server_handle_request+0xb76/0x136f [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<f8d28acc>] ptlrpc_main+0x7ee/0x9b5 [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<c011e7f5>] default_wake_function+0x0/0xc
Dec  6 07:38:18 meta1 kernel:  [<f8d282d1>] ptlrpc_retry_rqbds+0x0/0xd [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<c02d693e>] ret_from_fork+0x6/0x14
Dec  6 07:38:18 meta1 kernel:  [<f8d282d1>] ptlrpc_retry_rqbds+0x0/0xd [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<f8d282de>] ptlrpc_main+0x0/0x9b5 [ptlrpc]
Dec  6 07:38:18 meta1 kernel:  [<c01041f5>] kernel_thread_helper+0x5/0xb
Dec  6 07:38:18 meta1 kernel: LustreError: dumping log to /tmp/lustre-log.1196923097.5160
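
In case it helps to correlate these dumps with syslog: in the file name
/tmp/lustre-log.<epoch>.<pid>, the first number is a Unix timestamp and
the second is the pid (5160 here, matching the watchdog line above).
For example:

from datetime import datetime
# convert the epoch suffix to wall-clock time
print(datetime.fromtimestamp(1196923097))
# -> 2007-12-06 07:38:17 in our local time (CET), matching the trace above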


What can I do about this?

Thank you very much,

tamas



