[Lustre-discuss] MDS Crash: Urgent Help Needed

Mahmoud Hanafi trek5200trek at yahoo.com
Fri Feb 6 18:12:49 PST 2009


We had an MDS crash, and the subsequent reboot results in a panic. Any help would be greatly appreciated.

This error, logged during recovery after the reboot, appears to be the key event:

Feb  6 13:51:58 service100 kernel: LustreError: 6976:0:(llog_obd.c:211:llog_add()) No ctxt

Thanks,
Mahmoud Hanafi

Feb  6 13:39:14 service100 kernel: Lustre: m45_nb1-MDT0000: recovery complete: rc 0
Feb  6 13:39:15 service100 kernel: LustreError: 6597:0:(llog_obd.c:211:llog_add()) No ctxt
Feb  6 13:39:15 service100 kernel: LustreError: 6597:0:(llog_obd.c:211:llog_add()) Skipped 909 previous similar messages
Feb  6 13:39:15 service100 kernel: Lustre: MDS m45_nb1-MDT0000: m45_nb1-OST0000_UUID now active, resetting orphans
Feb  6 13:39:15 service100 kernel: Lustre: MDS m45_nb1-MDT0000: m45_nb1-OST0001_UUID now active, resetting orphans
Feb  6 13:39:15 service100 kernel: LustreError: 6496:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile 0x11b80054:0x10703925: rc -2
Feb  6 13:39:15 service100 kernel: LustreError: 6496:0:(llog_cat.c:176:llog_cat_id2handle()) error opening log id 0x11b80054:10703925: rc -2
Feb  6 13:39:15 service100 kernel: LustreError: 6496:0:(llog_cat.c:330:llog_cat_cancel_records()) Cannot find log 0x11b80054
Feb  6 13:39:15 service100 kernel: LustreError: 6497:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile 0x11b8004f:0x10703922: rc -2
Feb  6 13:39:15 service100 kernel: LustreError: 6497:0:(llog_cat.c:176:llog_cat_id2handle()) error opening log id 0x11b8004f:10703922: rc -2
Feb  6 13:39:15 service100 kernel: LustreError: 6497:0:(llog_cat.c:330:llog_cat_cancel_records()) Cannot find log 0x11b8004f
Feb  6 13:39:15 service100 kernel: LustreError: 6497:0:(llog_server.c:447:llog_origin_handle_cancel()) cancel 124 llog-records failed: -22
Feb  6 13:39:15 service100 kernel: LustreError: 6496:0:(llog_server.c:447:llog_origin_handle_cancel()) cancel 124 llog-records failed: -22
Feb  6 13:39:15 service100 kernel: Lustre: MDS m45_nb1-MDT0000: m45_nb1-OST0007_UUID now active, resetting orphans
Feb  6 13:39:15 service100 kernel: Lustre: Skipped 5 previous similar messages
Feb  6 13:39:16 service100 kernel: LustreError: 6497:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile 0x11b80052:0x1070392a: rc -2
Feb  6 13:39:16 service100 kernel: LustreError: 6497:0:(llog_lvfs.c:612:llog_lvfs_create()) Skipped 6 previous similar messages
Feb  6 13:39:16 service100 kernel: LustreError: 6497:0:(llog_cat.c:176:llog_cat_id2handle()) error opening log id 0x11b80052:1070392a: rc -2
Feb  6 13:39:16 service100 kernel: LustreError: 6497:0:(llog_cat.c:176:llog_cat_id2handle()) Skipped 6 previous similar messages
Feb  6 13:39:16 service100 kernel: LustreError: 6497:0:(llog_cat.c:330:llog_cat_cancel_records()) Cannot find log 0x11b80052
Feb  6 13:39:16 service100 kernel: LustreError: 6497:0:(llog_cat.c:330:llog_cat_cancel_records()) Skipped 6 previous similar messages
Feb  6 13:39:16 service100 kernel: LustreError: 6499:0:(llog_server.c:447:llog_origin_handle_cancel()) cancel 124 llog-records failed: -22


Feb  6 13:51:51 service100 kernel: LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
Feb  6 13:51:51 service100 kernel: LDISKFS FS on sde1, internal journal
Feb  6 13:51:51 service100 kernel: LDISKFS-fs: recovery complete.
Feb  6 13:51:51 service100 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb  6 13:51:51 service100 kernel: kjournald starting.  Commit interval 5 seconds
Feb  6 13:51:51 service100 kernel: LDISKFS-fs warning: maximal mount count reached, running e2fsck is recommended
Feb  6 13:51:51 service100 kernel: LDISKFS FS on sde1, internal journal
Feb  6 13:51:51 service100 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb  6 13:51:51 service100 kernel: Lustre: Added LNI 10.151.25.163 at o2ib [8/64] 
Feb  6 13:51:51 service100 kernel: LustreError: 137-5: UUID 'MGS' is not available  for connect (not set up)
Feb  6 13:51:51 service100 kernel: LustreError: 6798:0:(mgs_handler.c:647:mgs_handle()) MGS handle cmd=250 rc=-19
Feb  6 13:51:51 service100 kernel: LustreError: 6798:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19)  req at ffff8107fb3c3050 x4876961/t0 o250-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl 1233957211 ref 1 fl Interpret:/0/0 rc -19/0
Feb  6 13:51:51 service100 kernel: Lustre: MGS MGS started
Feb  6 13:51:51 service100 kernel: Lustre: Server MGS on device /dev/sde1 has started
Feb  6 13:51:56 service100 kernel: (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, recovered transactions 2765219 to 2765244
Feb  6 13:51:56 service100 kernel: (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 17611 and revoked 0/15 blocks
Feb  6 13:51:56 service100 kernel: kjournald starting.  Commit interval 5 seconds
Feb  6 13:51:57 service100 kernel: LDISKFS FS on sde2, internal journal 
Feb  6 13:51:57 service100 kernel: LDISKFS-fs: recovery complete.
Feb  6 13:51:57 service100 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb  6 13:51:57 service100 kernel: kjournald starting.  Commit interval 5 seconds
Feb  6 13:51:57 service100 kernel: LDISKFS FS on sde2, internal journal 
Feb  6 13:51:57 service100 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Feb  6 13:51:57 service100 kernel: LustreError: 137-5: UUID 'm45_nb1-MDT0000_UUID' is not available  for connect (no target)
Feb  6 13:51:57 service100 kernel: LustreError: 6854:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-19)  req at ffff8107db2d5400 x6027981/t0 o38-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl 1233957217 ref 1 fl Interpret:/0/0 rc -19/0
Feb  6 13:51:58 service100 kernel: Lustre: Enabling user_xattr
Feb  6 13:51:58 service100 kernel: Lustre: Enabling ACL
Feb  6 13:51:58 service100 kernel: Lustre: 6923:0:(mds_fs.c:493:mds_init_server_data()) RECOVERY: service m45_nb1-MDT0000, 5893 recoverable clients, last_transno 5429096891
Feb  6 13:51:58 service100 kernel: Lustre: 6923:0:(mds_lov.c:1070:mds_notify()) MDS m45_nb1-MDT0000: in recovery, not resetting orphans on m45_nb1-OST0000_UUID
Feb  6 13:51:58 service100 kernel: LustreError: 6923:0:(obd_class.h:339:obd_get_info()) obd_get_info: NULL export
Feb  6 13:51:58 service100 kernel: LustreError: 6923:0:(lov_obd.c:455:lov_connect()) m45_nb1-mdtlov error sending notify -19
Feb  6 13:51:58 service100 kernel: Lustre: 6923:0:(mds_lov.c:1070:mds_notify()) MDS m45_nb1-MDT0000: in recovery, not resetting orphans on m45_nb1-OST0003_UUID
Feb  6 13:51:58 service100 kernel: Lustre: 6923:0:(mds_lov.c:1070:mds_notify()) Skipped 2 previous similar messages
Feb  6 13:51:58 service100 kernel: LustreError: 6923:0:(obd_class.h:339:obd_get_info()) obd_get_info: NULL export
Feb  6 13:51:58 service100 kernel: LustreError: 6923:0:(obd_class.h:339:obd_get_info()) Skipped 2 previous similar messages
Feb  6 13:51:58 service100 kernel: LustreError: 6923:0:(lov_obd.c:455:lov_connect()) m45_nb1-mdtlov error sending notify -19
Feb  6 13:51:58 service100 kernel: LustreError: 6923:0:(lov_obd.c:455:lov_connect()) Skipped 2 previous similar messages
Feb  6 13:51:58 service100 kernel: Lustre: MDT m45_nb1-MDT0000 now serving dev (m45_nb1-MDT0000/c528a9db-4b84-a59c-41b6-ad3a6ec11fbf), but will be in recovery for at least 5:00, or until 5893 clients reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/m45_nb1-MDT0000/recovery_status.
Feb  6 13:51:58 service100 kernel: Lustre: 6923:0:(lproc_mds.c:273:lprocfs_wr_group_upcall()) m45_nb1-MDT0000: group upcall set to NONE
Feb  6 13:51:58 service100 kernel: Lustre: m45_nb1-MDT0000.mdt: set parameter group_upcall=NONE
Feb  6 13:51:58 service100 kernel: Lustre: m45_nb1-MDT0000: temporarily refusing client connection from 10.151.9.169 at o2ib
Feb  6 13:51:58 service100 kernel: Lustre: m45_nb1-MDT0000: temporarily refusing client connection from 10.151.6.241 at o2ib
Feb  6 13:51:58 service100 kernel: Lustre: m45_nb1-MDT0000.mdt: set parameter quota_type=u2
Feb  6 13:51:58 service100 kernel: Lustre: 6861:0:(ldlm_lib.c:1226:check_and_start_recovery_timer()) m45_nb1-MDT0000: starting recovery timer
Feb  6 13:51:58 service100 kernel: Lustre: 6882:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000: 5892 recoverable clients remain
Feb  6 13:51:58 service100 kernel: Lustre: 6868:0:(mds_open.c:835:mds_open_by_fid()) Orphan 53f26ea:0f8c9a49 found and opened in PENDING directory
Feb  6 13:51:58 service100 kernel: Lustre: 6870:0:(mds_open.c:835:mds_open_by_fid()) Orphan 5482886:0fa65d97 found and opened in PENDING directory
Feb  6 13:51:58 service100 kernel: Lustre: 6869:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000: 5891 recoverable clients remain
Feb  6 13:51:58 service100 kernel: Lustre: 7002:0:(mds_open.c:835:mds_open_by_fid()) Orphan 54820e2:0fa5e637 found and opened in PENDING directory
Feb  6 13:51:58 service100 kernel: Lustre: 7002:0:(mds_open.c:835:mds_open_by_fid()) Skipped 137 previous similar messages
Feb  6 13:51:58 service100 kernel: LustreError: 6976:0:(llog_obd.c:211:llog_add()) No ctxt
Feb  6 13:51:58 service100 kernel: Lustre: 6885:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000: 5861 recoverable clients remain
Feb  6 13:51:58 service100 kernel: Lustre: 6885:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 29 previous similar messages
Feb  6 13:51:59 service100 kernel: Lustre: 6875:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000: 5565 recoverable clients remain
Feb  6 13:51:59 service100 kernel: Lustre: 6875:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 295 previous similar messages
Feb  6 13:51:59 service100 kernel: Lustre: 6974:0:(mds_open.c:835:mds_open_by_fid()) Orphan 530890c:0fad2c72 found and opened in PENDING directory
Feb  6 13:51:59 service100 kernel: Lustre: 6974:0:(mds_open.c:835:mds_open_by_fid()) Skipped 713 previous similar messages
Feb  6 13:52:01 service100 kernel: Lustre: 6881:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000: 4755 recoverable clients remain
Feb  6 13:52:01 service100 kernel: Lustre: 6881:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 809 previous similar messages
Feb  6 13:52:01 service100 kernel: Lustre: 6866:0:(mds_open.c:835:mds_open_by_fid()) Orphan 54c028f:0fad585d found and opened in PENDING directory
Feb  6 13:52:01 service100 kernel: Lustre: 6866:0:(mds_open.c:835:mds_open_by_fid()) Skipped 1691 previous similar messages
Feb  6 13:52:05 service100 kernel: Lustre: 6865:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000: 3930 recoverable clients remain
Feb  6 13:52:05 service100 kernel: Lustre: 6865:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 824 previous similar messages
Feb  6 13:52:05 service100 kernel: Lustre: 6968:0:(mds_open.c:835:mds_open_by_fid()) Orphan 54cd214:0fabedc6 found and opened in PENDING directory
Feb  6 13:52:05 service100 kernel: Lustre: 6968:0:(mds_open.c:835:mds_open_by_fid()) Skipped 2113 previous similar messages
Feb  6 13:52:13 service100 kernel: Lustre: 6879:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000: 2153 recoverable clients remain
Feb  6 13:52:13 service100 kernel: Lustre: 6879:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 1775 previous similar messages
Feb  6 13:52:13 service100 kernel: Lustre: 6872:0:(mds_open.c:835:mds_open_by_fid()) Orphan 52f9aea:0f799bf6 found and opened in PENDING directory
Feb  6 13:52:13 service100 kernel: Lustre: 6872:0:(mds_open.c:835:mds_open_by_fid()) Skipped 3299 previous similar messages
Feb  6 13:53:31 service100 kernel: Lustre: 7002:0:(mds_open.c:835:mds_open_by_fid()) Orphan 52f9af7:0f7b104d found and opened in PENDING directory
Feb  6 13:53:31 service100 kernel: Lustre: 7002:0:(mds_open.c:835:mds_open_by_fid()) Skipped 1232 previous similar messages
Feb  6 13:53:31 service100 kernel: Lustre: 6983:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000: 1295 recoverable clients remain
Feb  6 13:53:31 service100 kernel: Lustre: 6983:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 856 previous similar messages
Feb  6 13:53:38 service100 kernel: Lustre: 7006:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000: 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 reconnecting
Feb  6 13:53:38 service100 kernel: Lustre: 7006:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse reconnection from 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at 10.151.81.216@o2ib to 0xffff8107cd932000; still busy with 2 active RPCs
Feb  6 13:53:38 service100 kernel: LustreError: 7006:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)  req at ffff810725625800 x5375830/t0 o38->9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at NET_0x500000a9751d8_UUID:0/0 lens 304/200 e 0 to 0 dl 1233957318 ref 1 fl Interpret:/0/0 rc -16/0
Feb  6 13:53:38 service100 kernel: LustreError: 7006:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 38 previous similar messages
Feb  6 13:53:38 service100 kernel: Lustre: 6971:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000: a432bfb8-6afb-cc67-d49e-8e1ba23de270 reconnecting
Feb  6 13:53:38 service100 kernel: LustreError: 6982:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent queued req  req at ffff81072508f400 x5066410/t0 o101->a432bfb8-6afb-cc67-d49e-8e1ba23de270 at NET_0x500000a97482f_UUID:0/0 lens 512/0 e 0 to 0 dl 1233957318 ref 1 fl Interpret:/6/0 rc 0/0
Feb  6 13:53:39 service100 kernel: Lustre: 6986:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000: bacd25ef-2f62-e88e-b080-d129171a0666 reconnecting
Feb  6 13:53:39 service100 kernel: LustreError: 7008:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent queued req  req at ffff8107257f1000 x32603191/t0 o36->bacd25ef-2f62-e88e-b080-d129171a0666 at NET_0x500000a97055a_UUID:0/0 lens 336/0 e 0 to 0 dl 1233957319 ref 1 fl Interpret:/6/0 rc 0/0
Feb  6 13:53:41 service100 kernel: Lustre: 6861:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000: 9e70be4b-f534-5f36-39ca-cbd3f398981f reconnecting
Feb  6 13:53:41 service100 kernel: LustreError: 6919:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent queued req  req at ffff81072562ca00 x5621783/t0 o35->9e70be4b-f534-5f36-39ca-cbd3f398981f at NET_0x500000a97131b_UUID:0/0 lens 296/0 e 0 to 0 dl 1233957321 ref 1 fl Interpret:/6/0 rc 0/0
Feb  6 13:53:43 service100 kernel: Lustre: 6869:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000: 668fb888-f573-8a5d-656d-f0f6943b261d reconnecting
Feb  6 13:53:43 service100 kernel: LustreError: 6994:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent queued req  req at ffff810725028600 x5087532/t0 o36->668fb888-f573-8a5d-656d-f0f6943b261d at NET_0x500000a970477_UUID:0/0 lens 360/0 e 0 to 0 dl 1233957323 ref 1 fl Interpret:/6/0 rc 0/0
Feb  6 13:53:48 service100 kernel: Lustre: 6881:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000: 37479c6c-952d-1e5b-f28b-08a886b21994 reconnecting
Feb  6 13:53:48 service100 kernel: LustreError: 6918:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent queued req  req at ffff81072504ca00 x2467781/t0 o35->37479c6c-952d-1e5b-f28b-08a886b21994 at NET_0x500000a970bba_UUID:0/0 lens 296/0 e 0 to 0 dl 1233957328 ref 1 fl Interpret:/6/0 rc 0/0
Feb  6 13:53:53 service100 kernel: LustreError: 6859:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent queued req  req at ffff81072504ca00 x5025647/t0 o101->d5ff86c7-b54a-57cf-1948-928fac
Feb  6 13:54:02 service100 kernel: LustreError: 6965:0:(llog_obd.c:211:llog_add()) No ctxt
Feb  6 13:54:28 service100 kernel: Lustre: 6870:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000: 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 reconnecting
Feb  6 13:54:28 service100 kernel: Lustre: 6870:0:(ldlm_lib.c:538:target_handle_reconnect()) Skipped 2 previous similar messages
Feb  6 13:54:28 service100 kernel: Lustre: 6870:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse reconnection from 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at 10.151.81.216@o2ib to 0xffff8107cd932000; still busy with 2 active RPCs
Feb  6 13:54:28 service100 kernel: LustreError: 6870:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)  req at ffff810724675a00 x5375927/t0 o38->9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at NET_0x500000a9751d8_UUID:0/0 lens 304/200 e 0 to 0 dl 1233957368 ref 1 fl Interpret:/0/0 rc -16/0
Feb  6 13:54:53 service100 kernel: Lustre: 6994:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000: 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 reconnecting
Feb  6 13:54:53 service100 kernel: Lustre: 6994:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse reconnection from 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at 10.151.81.216@o2ib to 0xffff8107cd932000; still busy with 2 active RPCs
Feb  6 13:54:53 service100 kernel: LustreError: 6994:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)  req at ffff81072458fe00 x5376006/t0 o38->9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at NET_0x500000a9751d8_UUID:0/0 lens 304/200 e 0 to 0 dl 1233957393 ref 1 fl Interpret:/0/0 rc -16/0
Feb  6 13:55:18 service100 kernel: Lustre: 6968:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse reconnection from 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at 10.151.81.216@o2ib to 0xffff8107cd932000; still busy with 2 active RPCs
Feb  6 13:55:18 service100 kernel: LustreError: 6968:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)  req at ffff8107247fee00 x5376085/t0 o38->9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at NET_0x500000a9751d8_UUID:0/0 lens 304/200 e 0 to 0 dl 1233957418 ref 1 fl Interpret:/0/0 rc -16/0
Feb  6 13:55:18 service100 kernel: Lustre: 0:0:(watchdog.c:148:lcw_cb()) Watchdog triggered for pid 6965: it was inactive for 200s
Feb  6 13:55:18 service100 kernel: Lustre: 0:0:(linux-debug.c:185:libcfs_debug_dumpstack()) showing stack for process 6965
Feb  6 13:55:18 service100 kernel: ll_mdt_33     S ffffffffffffffff     0  6965      1          6966  6964 (L-TLB)
Feb  6 13:55:18 service100 kernel: ffff8107cabf3b28 0000000000000046 0000000000001705 000000000000000a
Feb  6 13:55:18 service100 kernel:        ffff8108134f8a48 ffff8108134f87f0 ffff810009059800 0000005b41b8f6c4
Feb  6 13:55:18 service100 kernel:        0000000000001735 0000000300000000
Feb  6 13:55:18 service100 kernel: Call Trace: <ffffffff885fe428>{:ptlrpc:target_queue_recovery_request+2792}
Feb  6 13:55:18 service100 kernel:        <ffffffff8012c8a9>{default_wake_function+0} <ffffffff8873ad91>{:mds:mds_handle+2273}
Feb  6 13:55:18 service100 kernel:        <ffffffff8833aa71>{:lnet:lnet_match_blocked_msg+961}
Feb  6 13:55:18 service100 kernel:        <ffffffff80305642>{thread_return+0} <ffffffff88393995>{:obdclass:class_handle2object+213}
Feb  6 13:55:18 service100 kernel:        <ffffffff8862e765>{:ptlrpc:lustre_msg_get_conn_cnt+53}
Feb  6 13:55:18 service100 kernel:        <ffffffff8012bac9>{find_busiest_group+360} <ffffffff8863860a>{:ptlrpc:ptlrpc_check_req+26}
Feb  6 13:55:18 service100 kernel:        <ffffffff8863a867>{:ptlrpc:ptlrpc_server_handle_request+2503}
Feb  6 13:55:18 service100 kernel:        <ffffffff8010f239>{do_gettimeofday+92} <ffffffff882fa3d6>{:libcfs:lcw_update_time+38}
Feb  6 13:55:19 service100 kernel:        <ffffffff8013d49d>{__mod_timer+173} <ffffffff8863d9d1>{:ptlrpc:ptlrpc_main+3745}
Feb  6 13:55:19 service100 kernel:        <ffffffff8012c8a9>{default_wake_function+0} <ffffffff8010bfc2>{child_rip+8}
Feb  6 13:55:19 service100 kernel:        <ffffffff8863cb30>{:ptlrpc:ptlrpc_main+0} <ffffffff8010bfba>{child_rip+0}
Feb  6 13:55:19 service100 kernel: LustreError: dumping log to /tmp/lustre-log.1233957318.6965
Feb  6 13:55:21 service100 kernel: LustreError: 6919:0:(ldlm_lib.c:1434:target_queue_recovery_request()) @@@ dropping resent queued req  req at ffff810724035200 x5621783/t0 o35->9e70be4b-f534-5f36-39ca-cbd3f398981f at NET_0x500000a97131b_UUID:0/0 lens 296/0 e 0 to 0 dl 1233957421 ref 1 fl Interpret:/6/0 rc 0/0
Feb  6 13:55:21 service100 kernel: LustreError: 6919:0:(ldlm_lib.c:1434:target_queue_recovery_request()) Skipped 1 previous similar message
Feb  6 13:55:43 service100 kernel: Lustre: 6861:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000: 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 reconnecting
Feb  6 13:55:43 service100 kernel: Lustre: 6861:0:(ldlm_lib.c:538:target_handle_reconnect()) Skipped 2 previous similar messages
Feb  6 13:55:43 service100 kernel: Lustre: 6861:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse reconnection from 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at 10.151.81.216@o2ib to 0xffff8107cd932000; still busy with 2 active RPCs
Feb  6 13:55:43 service100 kernel: LustreError: 6861:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)  req at ffff8107246d9600 x5376164/t0 o38->9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at NET_0x500000a9751d8_UUID:0/0 lens 304/200 e 0 to 0 dl 1233957443 ref 1 fl Interpret:/0/0 rc -16/0
Feb  6 13:56:08 service100 kernel: Lustre: 7009:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse reconnection from 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at 10.151.81.216@o2ib to 0xffff8107cd932000; still busy with 2 active RPCs
Feb  6 13:56:08 service100 kernel: LustreError: 7009:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-16)  req at ffff8107247a0400 x5376243/t0 o38->9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at NET_0x500000a9751d8_UUID:0/0 lens 304/200 e 0 to 0 dl 1233957468 ref 1 fl Interpret:/0/0 rc -16/0
Feb  6 13:56:33 service100 kernel: Lustre: 6855:0:(ldlm_lib.c:773:target_handle_connect()) m45_nb1-MDT0000: refuse reconnection from 9236b2bf-92ee-fc8b-c7f2-e3563a377de0 at 10.151.81.216@o2ib to 0xffff8107cd932000; still busy with 2 active RPCs
Feb  6 13:56:58 service100 kernel: Lustre: 7008:0:(ldlm_lib.c:538:target_handle_reconnect()) m45_nb1-MDT0000: 6c3e80bd-92fb-8a7c-5bd9-72bc744956fc reconnecting
Feb  6 13:56:58 service100 kernel: Lustre: 7008:0:(ldlm_lib.c:538:target_handle_reconnect()) Skipped 2 previous similar messages
Feb  6 13:56:58 service100 kernel: Lustre: 6973:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) m45_nb1-MDT0000: 3 recoverable clients remain
Feb  6 13:56:58 service100 kernel: Lustre: 6973:0:(ldlm_lib.c:1567:target_queue_last_replay_reply()) Skipped 1292 previous similar messages
Feb  6 13:56:58 service100 kernel: Lustre: Parent 87005581/3805388507 lookup error -2. Evicting client 7a197206-3055-fbec-480a-93bdd6753834 with export 10.151.77.211 at o2ib.
Feb  6 13:56:58 service100 kernel: LustreError: 6983:0:(handler.c:1590:mds_handle()) operation 101 on unconnected MDS from 12345-10.151.77.211 at o2ib
Feb  6 13:56:58 service100 kernel: LustreError: 6983:0:(ldlm_lib.c:1619:target_send_reply_msg()) @@@ processing error (-107)  req at ffff8107243ece00 x5257230/t0 o101-><?>@<?>:0/0 lens 232/0 e 0 to 0 dl 1233957518 ref 1 fl Interpret:/0/0 rc -107/0
Feb  6 13:56:58 service100 kernel: LustreError: 6983:0:(ldlm_lib.c:1619:target_send_reply_msg()) Skipped 1 previous similar message
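For anyone reading the excerpt above, the negative return codes in the messages (rc -2, -16, -19, -22, -107) look like standard negated Linux errno values. Below is a minimal sketch of a helper that pulls them out of a log line and names them. The function name decode_rc, the lookup table, and the regular expression are my own illustration and not part of Lustre; the table only covers the codes that appear in this log, and the mapping assumes Linux errno numbering.

```python
import re

# Linux errno numbers for the return codes seen in the log above.
# Assumption: Lustre reports these as negated Linux errno values.
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    16: ("EBUSY", "Device or resource busy"),
    19: ("ENODEV", "No such device"),
    22: ("EINVAL", "Invalid argument"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}

# Matches forms such as "rc -2", "failed: -22", "processing error (-19)",
# "notify -19" and "lookup error -2". Only negative codes are captured.
RC_RE = re.compile(r"(?:\brc|failed:|error|notify)\s*\(?(-\d+)")


def decode_rc(line):
    """Return a sorted list of (code, name, text) for negative codes in line."""
    found = set()
    for match in RC_RE.finditer(line):
        code = int(match.group(1))
        name, text = LINUX_ERRNO.get(-code, ("UNKNOWN", "not in table"))
        found.add((code, name, text))
    return sorted(found)


if __name__ == "__main__":
    sample = "llog_origin_handle_cancel()) cancel 124 llog-records failed: -22"
    for code, name, text in decode_rc(sample):
        print(code, name, text)
```

Read this way, the llog lookup failures are "file not found" (-2), the refused reconnections are "busy" (-16), and the early MGS/MDT connect errors are "no such device" (-19) while the targets were still starting.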