[Lustre-discuss] Lustre Issue

Nauman Yousuf nauman.yousuf at gmail.com
Mon Jan 31 23:30:27 PST 2011


here is logs with time stamp

Primary OSS

Jan 31 16:03:08 typhoon kernel: Lustre:
1689:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 32s
Jan 31 16:03:08 typhoon kernel: Lustre:
1689:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw
commit 32s
Jan 31 16:03:12 typhoon kernel: Lustre:
1712:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 36s
Jan 31 16:03:12 typhoon kernel: Lustre:
1712:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw
commit 36s
Jan 31 16:03:22 typhoon kernel: Lustre:
1695:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 46s
Jan 31 16:03:22 typhoon kernel: Lustre:
1695:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw
commit 46s
Jan 31 16:19:19 typhoon kernel: Lustre:
1677:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 30s
Jan 31 16:19:19 typhoon kernel: Lustre:
1677:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw
commit 30s
Jan 31 16:28:54 typhoon kernel: Lustre:
1712:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 46s
Jan 31 16:28:54 typhoon kernel: Lustre:
1712:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw
commit 46s
Jan 31 16:32:51 typhoon kernel: Lustre:
1470:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.37 at tcp portal 16 match 10971425 offset 0 length 64: 2
Jan 31 16:32:56 typhoon kernel: Lustre:
1470:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.37 at tcp portal 16 match 10971427 offset 0 length 64: 2
Jan 31 16:34:23 typhoon kernel: Lustre:
1707:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw
commit 31s
Jan 31 16:34:29 typhoon kernel: Lustre:
1692:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw
commit 31s
Jan 31 16:34:29 typhoon kernel: Lustre:
1692:0:(filter_io_26.c:727:filter_commitrw_write()) Skipped 1 previous
similar message
Jan 31 16:50:19 typhoon kernel: Lustre:
1674:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 34s
Jan 31 16:50:19 typhoon kernel: Lustre:
1674:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw
commit 34s
Jan 31 16:50:19 typhoon kernel: Lustre:
1674:0:(filter_io_26.c:727:filter_commitrw_write()) Skipped 5 previous
similar messages
Jan 31 16:58:38 typhoon kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb())
Watchdog triggered for pid 1731: it was inactive for 100000ms
Jan 31 16:58:38 typhoon kernel: Lustre:
0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process
1731
Jan 31 16:58:38 typhoon kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb())
Watchdog triggered for pid 1683: it was inactive for 100000ms
Jan 31 16:58:38 typhoon kernel: Lustre:
0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process
1683
Jan 31 16:59:04 typhoon kernel: Lustre:
1731:0:(watchdog.c:311:lcw_update_time()) Expired watchdog for pid 1731
disabled after 125597293.0001s
Jan 31 16:59:04 typhoon kernel: Lustre:
1683:0:(watchdog.c:311:lcw_update_time()) Expired watchdog for pid 1683
disabled after 125576812.0001s
Jan 31 17:04:39 typhoon kernel: Lustre:
1701:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 36s
Jan 31 17:04:39 typhoon kernel: Lustre:
1701:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw
commit 36s





secondat OSS

Jan 31 16:19:08 cyclops kernel: Lustre:
1703:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 34s
Jan 31 16:19:08 cyclops kernel: Lustre:
1703:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw
commit 34s
Jan 31 16:19:43 cyclops kernel: Lustre:
1700:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 30s
Jan 31 16:19:43 cyclops kernel: Lustre:
1700:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw
commit 30s
Jan 31 16:24:51 cyclops kernel: Lustre:
1706:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 30s
Jan 31 16:24:51 cyclops kernel: Lustre:
1706:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw
commit 30s
Jan 31 16:27:11 cyclops kernel: Lustre:
1691:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 30s
Jan 31 16:27:11 cyclops kernel: Lustre:
1691:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw
commit 30s
Jan 31 16:29:35 cyclops kernel: Lustre:
1728:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 40s
Jan 31 16:29:35 cyclops kernel: Lustre:
1728:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw
commit 40s
Jan 31 16:32:40 cyclops kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb())
Watchdog triggered for pid 1667: it was inactive for 100000ms
Jan 31 16:32:40 cyclops kernel: Lustre:
0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process
1667
Jan 31 16:32:44 cyclops kernel: Lustre:
1667:0:(watchdog.c:311:lcw_update_time()) Expired watchdog for pid 1667
disabled after 103346449.0001s
Jan 31 16:32:44 cyclops kernel: Lustre:
1667:0:(watchdog.c:311:lcw_update_time()) Skipped 1 previous similar message
Jan 31 16:33:05 cyclops kernel: Lustre:
1615:0:(ldlm_lib.c:486:target_handle_reconnect()) ost2:
7c2f0_lov1_34edb7f8f7 reconnecting
Jan 31 16:33:42 cyclops kernel: Lustre:
1718:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 36s
Jan 31 16:33:42 cyclops kernel: Lustre:
1718:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw
commit 36s
Jan 31 16:58:42 cyclops kernel: Lustre:
1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.37 at tcp portal 16 match 10726907 offset 0 length 112: 2
Jan 31 16:58:47 cyclops kernel: Lustre:
1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.37 at tcp portal 16 match 10726911 offset 0 length 64: 2
Jan 31 16:58:47 cyclops kernel: Lustre:
1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.37 at tcp portal 16 match 10726921 offset 0 length 64: 2
Jan 31 16:58:47 cyclops kernel: Lustre:
1465:0:(lib-move.c:1644:lnet_parse_put()) Skipped 1 previous similar message
Jan 31 16:58:47 cyclops kernel: Lustre:
1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.37 at tcp portal 16 match 10726923 offset 0 length 64: 2
Jan 31 16:58:47 cyclops kernel: Lustre:
1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.37 at tcp portal 16 match 10726926 offset 0 length 64: 2
Jan 31 16:58:47 cyclops kernel: Lustre:
1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.37 at tcp portal 16 match 10726929 offset 0 length 64: 2
Jan 31 16:58:57 cyclops kernel: Lustre:
1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.37 at tcp portal 16 match 10726969 offset 0 length 64: 2
Jan 31 16:59:02 cyclops kernel: Lustre:
1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.37 at tcp portal 16 match 10726992 offset 0 length 64: 2
Jan 31 16:59:02 cyclops kernel: Lustre:
1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.37 at tcp portal 16 match 10726994 offset 0 length 64: 2


Client

Jan 31 16:32:40 hades kernel: Lustre: Changing connection for
OSC_hades.example.com_ost2_MNT_client to typhoon_UUID/10.65.200.21 at tcp
Jan 31 16:32:44 hades kernel: Lustre:
2118:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.30 at tcp portal 4 match 57154751 offset 0 length 288: 2
Jan 31 16:32:44 hades kernel: Lustre:
2118:0:(lib-move.c:1644:lnet_parse_put()) Skipped 1 previous similar message
Jan 31 16:33:05 hades kernel: Lustre: Changing connection for
OSC_hades.example.com_ost2_MNT_client to cyclops_UUID/10.65.200.30 at tcp
Jan 31 16:33:05 hades kernel: Lustre: OSC_hades.example.com_ost2_MNT_client:
Connection restored to service ost2 using nid 10.65.200.30 at tcp.
Jan 31 16:33:12 hades kernel: Lustre: Changing connection for
OSC_hades.example.com_ost2_MNT_client to typhoon_UUID/10.65.200.21 at tcp
Jan 31 16:33:55 hades kernel: Lustre: Changing connection for
OSC_hades.example.com_ost2_MNT_client to cyclops_UUID/10.65.200.30 at tcp
Jan 31 16:33:58 hades kernel: Lustre: OSC_hades.example.com_ost2_MNT_client:
Connection restored to service ost2 using nid 10.65.200.30 at tcp.
Jan 31 16:58:38 hades kernel: Lustre: Changing connection for
OSC_hades.example.com_ost1_MNT_client to cyclops_UUID/10.65.200.30 at tcp
Jan 31 16:59:03 hades kernel: Lustre: Changing connection for
OSC_hades.example.com_ost1_MNT_client to typhoon_UUID/10.65.200.21 at tcp
Jan 31 16:59:05 hades kernel: Lustre:
2121:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.21 at tcp portal 4 match 57400851 offset 0 length 280: 2
Jan 31 16:59:05 hades kernel: Lustre:
2121:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from
12345-10.65.200.21 at tcp portal 4 match 57396169 offset 0 length 280: 2
Jan 31 16:59:06 hades kernel: Lustre: OSC_hades.example.com_ost1_MNT_client:
Connection restored to service ost1 using nid 10.65.200.21 at tcp.
Jan 31 18:43:31 hades kernel: Lustre: Acceptor stopping
Jan 31 18:43:33 hades kernel: Lustre: Removed LNI 10.65.200.37 at tcp
Jan 31 18:44:37 hades kernel: Lustre:
1145:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
Jan 31 18:44:38 hades kernel: Lustre: OBD class driver Build Version:
1.4.9.1-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_9_1-2.6-rhel4-i686.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.8.EL_lustre.1.4.9.1smp,
info at clusterfs.com
Jan 31 18:44:38 hades kernel: Lustre: Added LNI 10.65.200.37 at tcp [8/256]
Jan 31 18:44:38 hades kernel: Lustre: Accept secure, port 988
Jan 31 18:44:38 hades kernel: Lustre: Lustre Lite Client File System;
info at clusterfs.com
Jan 31 18:54:58 hades kernel: Lustre: Acceptor stopping
Jan 31 18:55:00 hades kernel: Lustre: Removed LNI 10.65.200.37 at tcp
Jan 31 19:00:25 hades kernel: Lustre:
2680:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
Jan 31 19:00:25 hades kernel: Lustre: OBD class driver Build Version:
1.4.9.1-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_9_1-2.6-rhel4-i686.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.8.EL_lustre.1.4.9.1smp,
info at clusterfs.com
Jan 31 19:00:25 hades kernel: Lustre: Added LNI 10.65.200.37 at tcp [8/256]
Jan 31 19:00:25 hades kernel: Lustre: Accept secure, port 988
Jan 31 19:00:25 hades kernel: Lustre: Lustre Lite Client File System;
info at clusterfs.com


Plz suggest now.


On Mon, Jan 31, 2011 at 11:55 PM, Brian J. Murrell <brian at whamcloud.com>wrote:

> On Mon, 2011-01-31 at 16:22 +0500, Nauman Yousuf wrote:
> > hi
>
> Hi,
>
> > what thiserror means
>
> Same as I told you last time.  (a) it's difficult to gather context
> without timestamps and (b) lacking any further information it looks like
> your storage target is overloaded.
>
> Again, I would posit that this is likely because your storage target is
> rebuilding and you are expecting the same kind of performance in that
> degraded state that you get when the mirror is a non-degraded state.
>
> You need to tune your cluster to operate within the performance
> parameters of degraded storage targets.  The manual covers this kind of
> tuning.
>
> b.
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>


-- 
Regards

Nauman Yousuf
0321-2549206
E-Eager, N-Noble, G-Genuine, I-Intelligent, N-Natural, E-Enthusiastic,
E-Energetic, R-Resourcefull --- ENGINEER
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20110201/38036d3f/attachment.htm>


More information about the lustre-discuss mailing list