<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=utf-8">
<META content="MSHTML 6.00.2900.5626" name=GENERATOR></HEAD>
<BODY>
<DIV>Dear list, </DIV>
<DIV> </DIV>
<DIV>Our Lustre system has been crashing frequently lately, each time
under a high load average. </DIV>
<DIV> </DIV>
<DIV>1) top</DIV>
<DIV> top - 14:32:57 up 18:15, 1 user, load average: 25.05,
24.27, 24.47<BR>Mem: 8307364k total, 859724k used,
7447640k free, 234288k buffers<BR>Swap: 16386292k
total, 0k used, 16386292k
free, 37932k cached</DIV>
<DIV> </DIV>
<DIV> PID USER PR NI VIRT
RES SHR S %CPU %MEM TIME+
COMMAND
<BR>26695 root 15
0 0 0 0 S
7.6 0.0 51:57.40
socknal_sd04
<BR>26694 root 15
0 0 0 0 S
6.6 0.0 53:44.42
socknal_sd03
<BR>26691 root 15
0 0 0 0 S
5.6 0.0 51:11.76
socknal_sd00
<BR>26697 root 15
0 0 0 0 S
5.3 0.0 42:12.23
socknal_sd06
<BR>26696 root 15
0 0 0 0 S
3.3 0.0 52:47.42
socknal_sd05
<BR>26692 root 15
0 0 0 0 S
2.3 0.0 26:19.46
socknal_sd01
<BR>26693 root 15
0 0 0 0 S
2.3 0.0 32:38.21
socknal_sd02
<BR>26952 root 15
0 0 0 0 S
1.0 0.0 2:06.69
ll_ost_io_09
<BR>....</DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV>2) iostat -x 5 <BR>Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp
(boss01.ihep.ac.cn) 11/10/2008</DIV>
<DIV> </DIV>
<DIV>avg-cpu: %user %nice %sys
%iowait
%idle<BR>
0.00 0.00 11.33 4.56
84.10</DIV>
<DIV> </DIV>
<DIV>Device: rrqm/s wrqm/s r/s
w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm
%util<BR>cciss/c0d0 1.05 0.43 0.27
0.41 9.78 6.65
4.89 3.32
24.31 0.01 17.15
5.78
0.39<BR>sda
3.46 0.64 1297.05 0.60 2588.12 118.12
1294.06 59.06 2.09
22.81 15.69 0.77
99.57<BR>sdb
3.09 0.28 1274.46 0.18 1541.21 23.54
770.60 11.77 1.23
16.75 12.16 0.78 99.56</DIV>
<DIV> </DIV>
<DIV>avg-cpu: %user %nice %sys
%iowait
%idle<BR>
0.00 0.00 11.53 0.10
88.38</DIV>
<DIV> </DIV>
<DIV>Device: rrqm/s wrqm/s r/s
w/s rsec/s wsec/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm
%util<BR>cciss/c0d0 0.00 1.80 0.00
0.00 0.00 16.00
0.00 8.00
0.00 0.00 0.00
0.00
0.00<BR>sda
3.20 0.00 1436.60 0.00 130524.80 0.00
65262.40 0.00 90.86
16.29 10.73 0.70
100.00<BR>sdb
3.40 0.00 1142.20 0.00 124113.60 0.00
62056.80 0.00 108.66
10.44 8.24 0.87 99.80<BR></DIV>
<DIV> </DIV>
<DIV>Before each crash, LustreError messages like the following appear in the log:</DIV>
<DIV> </DIV>
<DIV>Nov 9 17:25:41 boss01 kernel: LustreError: 27327:0:(ost_handler.c:868:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@e3df8e00 x133017/t0 o3->73c15254-a884-578e-9634-859b44619a4f@NET_0x20000c0a83446_UUID:0/0 lens 400/336 e 0 to 0 dl 1226222741 ref 1 fl Interpret:/0/0 rc 0/0<BR>
Nov 9 17:25:41 boss01 kernel: Lustre: 27327:0:(ost_handler.c:925:ost_brw_read()) besfs-OST0005: ignoring bulk IO comm error with 73c15254-a884-578e-9634-859b44619a4f@NET_0x20000c0a83446_UUID id 12345-192.168.52.70@tcp - client will retry<BR>
Nov 9 17:27:47 boss01 kernel: Lustre: besfs-OST0006: haven't heard from client 73c15254-a884-578e-9634-859b44619a4f (at 192.168.52.70@tcp) in 227 seconds. I think it's dead, and I am evicting it.<BR>
Nov 9 17:27:48 boss01 kernel: Lustre: besfs-OST0007: haven't heard from client 73c15254-a884-578e-9634-859b44619a4f (at 192.168.52.70@tcp) in 227 seconds. I think it's dead, and I am evicting it.<BR>
Nov 9 09:28:05 boss01 sshd[29314]: Connection closed by 192.168.51.130<BR>
Nov 9 17:29:17 boss01 ntpd[27872]: kernel time sync enabled 0001<BR>
Nov 9 17:56:48 boss01 kernel: Lustre: besfs-OST0005: haven't heard from client c06ff22f-03a6-3897-ec32-1f26f6958e8b (at 202.122.33.83@tcp) in 227 seconds. I think it's dead, and I am evicting it.<BR>
Nov 9 17:56:48 boss01 kernel: Lustre: Skipped 2 previous similar messages<BR>
Nov 9 17:59:15 boss01 kernel: Lustre: besfs-OST0002: haven't heard from client c06ff22f-03a6-3897-ec32-1f26f6958e8b (at 202.122.33.83@tcp) in 374 seconds. I think it's dead, and I am evicting it.<BR>
Nov 9 17:59:18 boss01 kernel: LustreError: 27250:0:(ost_handler.c:868:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@e2ccee00 x36870/t0 o3->7df31bbf-54a5-ada8-abd7-f0920f648d0a@NET_0x20000c0a83446_UUID:0/0 lens 400/336 e 0 to 0 dl 1226224758 ref 1 fl Interpret:/0/0 rc 0/0<BR>
Nov 9 17:59:18 boss01 kernel: LustreError: 27250:0:(ost_handler.c:868:ost_brw_read()) Skipped 2 previous similar messages<BR>
Nov 9 17:59:18 boss01 kernel: Lustre: 27250:0:(ost_handler.c:925:ost_brw_read()) besfs-OST0007: ignoring bulk IO comm error with 7df31bbf-54a5-ada8-abd7-f0920f648d0a@NET_0x20000c0a83446_UUID id 12345-192.168.52.70@tcp - client will retry<BR>
Nov 9 17:59:18 boss01 kernel: Lustre: 27250:0:(ost_handler.c:925:ost_brw_read()) Skipped 2 previous similar messages<BR>
Nov 9 17:59:18 boss01 kernel: LustreError: 29507:0:(ost_handler.c:868:ost_brw_read()) @@@ timeout on bulk PUT after 100+0s req@e01bce00 x36866/t0 o3->7df31bbf-54a5-ada8-abd7-f0920f648d0a@NET_0x20000c0a83446_UUID:0/0 lens 432/336 e 0 to 0 dl 1226224758 ref 1 fl Interpret:/0/0 rc 0/0<BR>
Nov 9 17:59:18 boss01 kernel: LustreError: 29507:0:(ost_handler.c:868:ost_brw_read()) Skipped 4 previous similar messages<BR>
Nov 9 17:59:18 boss01 kernel: Lustre: 29507:0:(ost_handler.c:925:ost_brw_read()) besfs-OST0005: ignoring bulk IO comm error with 7df31bbf-54a5-ada8-abd7-f0920f648d0a@NET_0x20000c0a83446_UUID id 12345-192.168.52.70@tcp - client will retry<BR>
Nov 9 17:59:18 boss01 kernel: Lustre: 29507:0:(ost_handler.c:925:ost_brw_read()) Skipped 4 previous similar messages<BR>
Nov 9 18:01:33 boss01 kernel: Lustre: besfs-OST0007: haven't heard from client c06ff22f-03a6-3897-ec32-1f26f6958e8b (at 202.122.33.83@tcp) in 512 seconds. I think it's dead, and I am evicting it.<BR>
Nov 9 18:04:14 boss01 kernel: Lustre: besfs-OST0007: haven't heard from client 7df31bbf-54a5-ada8-abd7-f0920f648d0a (at 192.168.52.70@tcp) in 396 seconds. I think it's dead, and I am evicting it.<BR></DIV>
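<DIV>To see which clients are involved most often, the messages above could be tallied per client NID. The following is only a rough sketch: the two regular expressions are modelled on the exact wording of the log lines quoted above, and the function name <CODE>summarize</CODE> is made up for illustration, so other Lustre versions with different message text would need adjusted patterns.</DIV>

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted above (assumption: the message
# wording is stable for this Lustre version).
EVICT_RE = re.compile(
    r"haven't heard from client \S+ \(at (\S+?)\) in (\d+) seconds"
)
BULK_RE = re.compile(
    r"ignoring bulk IO comm error with \S+ id \d+-(\S+) - client will retry"
)


def summarize(lines):
    """Count evictions and bulk I/O errors per client NID."""
    evictions = Counter()
    bulk_errors = Counter()
    for line in lines:
        match = EVICT_RE.search(line)
        if match:
            evictions[match.group(1)] += 1
            continue
        match = BULK_RE.search(line)
        if match:
            bulk_errors[match.group(1)] += 1
    return evictions, bulk_errors


if __name__ == "__main__":
    # Two lines copied from the log excerpt above.
    sample = [
        "Nov 9 17:25:41 boss01 kernel: Lustre: "
        "27327:0:(ost_handler.c:925:ost_brw_read()) besfs-OST0005: "
        "ignoring bulk IO comm error with "
        "73c15254-a884-578e-9634-859b44619a4f@NET_0x20000c0a83446_UUID "
        "id 12345-192.168.52.70@tcp - client will retry",
        "Nov 9 17:27:47 boss01 kernel: Lustre: besfs-OST0006: haven't heard "
        "from client 73c15254-a884-578e-9634-859b44619a4f "
        "(at 192.168.52.70@tcp) in 227 seconds. I think it's dead, "
        "and I am evicting it.",
    ]
    evictions, bulk_errors = summarize(sample)
    print("evictions:", dict(evictions))
    print("bulk errors:", dict(bulk_errors))
```

<DIV>Lines such as "Skipped 2 previous similar messages" are not counted, so the totals are lower bounds.</DIV>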
<DIV>The configuration of our system:</DIV>
<DIV>OS: Linux 2.6.9-67.0.7.EL_lustre.1.6.5smp</DIV>
<DIV>MDS: 1</DIV>
<DIV>OSS: 2, each with a 10 Gbit/s NIC and 2 directly attached disk arrays. </DIV>
<DIV>Clients: 50 nodes (8-core servers), each with a 1 Gbit/s NIC</DIV>
<DIV> </DIV>
<DIV>and </DIV>
<DIV> </DIV>
<DIV>[root@boss02 ~]# sysctl -q lnet<BR>lnet.nis =
nid
refs peer max tx min<BR>lnet.nis = <A
href="mailto:0@lo">0@lo</A>
2 0 0
0 0<BR>lnet.nis = <A
href="mailto:192.168.50.34@tcp">192.168.50.34@tcp</A>
136 8 256 250
88<BR>lnet.buffers = pages count credits
min<BR>lnet.buffers = 0
0 0
0<BR>lnet.buffers = 1
0 0
0<BR>lnet.buffers = 256
0 0
0<BR>lnet.peers =
nid
refs state max rtr min
tx min queue<BR>lnet.peers = <A
href="mailto:192.168.50.14@tcp">192.168.50.14@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.11@tcp">192.168.52.11@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.13@tcp">192.168.52.13@tcp</A>
1 ~rtr 8
8 8 8 -71
0<BR>lnet.peers = <A
href="mailto:192.168.52.14@tcp">192.168.52.14@tcp</A>
1 ~rtr 8
8 8 8 -8
0<BR>lnet.peers = <A
href="mailto:192.168.52.15@tcp">192.168.52.15@tcp</A>
1 ~rtr 8
8 8 8 -14
0<BR>lnet.peers = <A
href="mailto:192.168.52.16@tcp">192.168.52.16@tcp</A>
1 ~rtr 8
8 8 8 -30
0<BR>lnet.peers = <A
href="mailto:192.168.52.17@tcp">192.168.52.17@tcp</A>
1 ~rtr 8
8 8 8 -38
0<BR>lnet.peers = <A
href="mailto:192.168.52.18@tcp">192.168.52.18@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.19@tcp">192.168.52.19@tcp</A>
1 ~rtr 8
8 8 8 -1
0<BR>lnet.peers = <A
href="mailto:192.168.52.20@tcp">192.168.52.20@tcp</A>
1 ~rtr 8
8 8 8 -19
0<BR>lnet.peers = <A
href="mailto:192.168.52.21@tcp">192.168.52.21@tcp</A>
1 ~rtr 8
8 8 8 3
0<BR>lnet.peers = <A
href="mailto:192.168.52.22@tcp">192.168.52.22@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.50.32@tcp">192.168.50.32@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.23@tcp">192.168.52.23@tcp</A>
1 ~rtr 8
8 8 8 -6
0<BR>lnet.peers = <A
href="mailto:192.168.52.24@tcp">192.168.52.24@tcp</A>
1 ~rtr 8
8 8 8 -50
0<BR>lnet.peers = <A
href="mailto:192.168.52.25@tcp">192.168.52.25@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.26@tcp">192.168.52.26@tcp</A>
1 ~rtr 8
8 8 8 -2
0<BR>lnet.peers = <A
href="mailto:192.168.52.27@tcp">192.168.52.27@tcp</A>
1 ~rtr 8
8 8 8 -31
0<BR>lnet.peers = <A
href="mailto:192.168.52.28@tcp">192.168.52.28@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.29@tcp">192.168.52.29@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.30@tcp">192.168.52.30@tcp</A>
1 ~rtr 8
8 8 8 -31
0<BR>lnet.peers = <A
href="mailto:192.168.52.31@tcp">192.168.52.31@tcp</A>
7 ~rtr 8
8 8 2 -10
3318192<BR>lnet.peers = <A
href="mailto:192.168.52.32@tcp">192.168.52.32@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.33@tcp">192.168.52.33@tcp</A>
1 ~rtr 8
8 8 8 -6
0<BR>lnet.peers = <A
href="mailto:192.168.52.34@tcp">192.168.52.34@tcp</A>
1 ~rtr 8
8 8 8 -4
0<BR>lnet.peers = <A
href="mailto:192.168.52.35@tcp">192.168.52.35@tcp</A>
1 ~rtr 8
8 8 8 -2
0<BR>lnet.peers = <A
href="mailto:192.168.52.36@tcp">192.168.52.36@tcp</A>
1 ~rtr 8
8 8 8 -1
0<BR>lnet.peers = <A
href="mailto:192.168.52.37@tcp">192.168.52.37@tcp</A>
1 ~rtr 8
8 8 8 -55
0<BR>lnet.peers = <A
href="mailto:192.168.52.38@tcp">192.168.52.38@tcp</A>
1 ~rtr 8
8 8 8 -62
0<BR>lnet.peers = <A
href="mailto:192.168.52.39@tcp">192.168.52.39@tcp</A>
1 ~rtr 8
8 8 8 -8
0<BR>lnet.peers = <A
href="mailto:192.168.52.40@tcp">192.168.52.40@tcp</A>
1 ~rtr 8
8 8 8 -5
0<BR>lnet.peers = <A
href="mailto:192.168.52.41@tcp">192.168.52.41@tcp</A>
1 ~rtr 8
8 8 8 2
0<BR>lnet.peers = <A
href="mailto:192.168.52.42@tcp">192.168.52.42@tcp</A>
1 ~rtr 8
8 8 8 -4
0<BR>lnet.peers = <A
href="mailto:192.168.52.43@tcp">192.168.52.43@tcp</A>
1 ~rtr 8
8 8 8 -31
0<BR>lnet.peers = <A
href="mailto:192.168.52.44@tcp">192.168.52.44@tcp</A>
1 ~rtr 8
8 8 8 -14
0<BR>lnet.peers = <A
href="mailto:192.168.52.45@tcp">192.168.52.45@tcp</A>
1 ~rtr 8
8 8 8 -1
0<BR>lnet.peers = <A
href="mailto:192.168.52.46@tcp">192.168.52.46@tcp</A>
1 ~rtr 8
8 8 8 -3
0<BR>lnet.peers = <A
href="mailto:192.168.52.47@tcp">192.168.52.47@tcp</A>
1 ~rtr 8
8 8 8 -10
0<BR>lnet.peers = <A
href="mailto:192.168.52.48@tcp">192.168.52.48@tcp</A>
1 ~rtr 8
8 8 8 -23
0<BR>lnet.peers = <A
href="mailto:192.168.52.49@tcp">192.168.52.49@tcp</A>
1 ~rtr 8
8 8 8 -1
0<BR>lnet.peers = <A
href="mailto:192.168.52.50@tcp">192.168.52.50@tcp</A>
1 ~rtr 8
8 8 8 -3
0<BR>lnet.peers = <A
href="mailto:192.168.52.51@tcp">192.168.52.51@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.52@tcp">192.168.52.52@tcp</A>
1 ~rtr 8
8 8 8 -23
0<BR>lnet.peers = <A
href="mailto:192.168.52.53@tcp">192.168.52.53@tcp</A>
1 ~rtr 8
8 8 8 -5
0<BR>lnet.peers = <A
href="mailto:192.168.52.54@tcp">192.168.52.54@tcp</A>
1 ~rtr 8
8 8 8 -20
0<BR>lnet.peers = <A
href="mailto:192.168.52.55@tcp">192.168.52.55@tcp</A>
1 ~rtr 8
8 8 8 -5
0<BR>lnet.peers = <A
href="mailto:192.168.52.56@tcp">192.168.52.56@tcp</A>
1 ~rtr 8
8 8 8 1
0<BR>lnet.peers = <A
href="mailto:192.168.52.57@tcp">192.168.52.57@tcp</A>
1 ~rtr 8
8 8 8 1
0<BR>lnet.peers = <A
href="mailto:192.168.52.58@tcp">192.168.52.58@tcp</A>
1 ~rtr 8
8 8 8 -11
0<BR>lnet.peers = <A
href="mailto:192.168.52.59@tcp">192.168.52.59@tcp</A>
1 ~rtr 8
8 8 8 -4
0<BR>lnet.peers = <A
href="mailto:192.168.52.60@tcp">192.168.52.60@tcp</A>
1 ~rtr 8
8 8 8 -1
0<BR>lnet.peers = <A
href="mailto:192.168.52.61@tcp">192.168.52.61@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.62@tcp">192.168.52.62@tcp</A>
1 ~rtr 8
8 8 8 -19
0<BR>lnet.peers = <A
href="mailto:192.168.52.63@tcp">192.168.52.63@tcp</A>
1 ~rtr 8
8 8 8 2
0<BR>lnet.peers = <A
href="mailto:192.168.52.64@tcp">192.168.52.64@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.65@tcp">192.168.52.65@tcp</A>
1 ~rtr 8
8 8 8 2
0<BR>lnet.peers = <A
href="mailto:192.168.52.66@tcp">192.168.52.66@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.67@tcp">192.168.52.67@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.68@tcp">192.168.52.68@tcp</A>
1 ~rtr 8
8 8 8 3
0<BR>lnet.peers = <A
href="mailto:192.168.52.69@tcp">192.168.52.69@tcp</A>
1 ~rtr 8
8 8 8 1
0<BR>lnet.peers = <A
href="mailto:192.168.52.70@tcp">192.168.52.70@tcp</A>
1 ~rtr 8
8 8 8 -8
0<BR>lnet.peers = <A
href="mailto:192.168.52.71@tcp">192.168.52.71@tcp</A>
1 ~rtr 8
8 8 8 -2
0<BR>lnet.peers = <A
href="mailto:192.168.52.72@tcp">192.168.52.72@tcp</A>
1 ~rtr 8
8 8 8 2
0<BR>lnet.peers = <A
href="mailto:192.168.52.73@tcp">192.168.52.73@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.74@tcp">192.168.52.74@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.75@tcp">192.168.52.75@tcp</A>
1 ~rtr 8
8 8 8 2
0<BR>lnet.peers = <A
href="mailto:202.122.33.56@tcp">202.122.33.56@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.76@tcp">192.168.52.76@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.77@tcp">192.168.52.77@tcp</A>
1 ~rtr 8
8 8 8 2
0<BR>lnet.peers = <A
href="mailto:192.168.52.78@tcp">192.168.52.78@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.79@tcp">192.168.52.79@tcp</A>
1 ~rtr 8
8 8 8 -3
0<BR>lnet.peers = <A
href="mailto:192.168.52.80@tcp">192.168.52.80@tcp</A>
1 ~rtr 8
8 8 8 2
0<BR>lnet.peers = <A
href="mailto:192.168.52.81@tcp">192.168.52.81@tcp</A>
1 ~rtr 8
8 8 8 -1
0<BR>lnet.peers = <A
href="mailto:192.168.52.82@tcp">192.168.52.82@tcp</A>
1 ~rtr 8
8 8 8 3
0<BR>lnet.peers = <A
href="mailto:192.168.52.83@tcp">192.168.52.83@tcp</A>
1 ~rtr 8
8 8 8 2
0<BR>lnet.peers = <A
href="mailto:192.168.52.84@tcp">192.168.52.84@tcp</A>
1 ~rtr 8
8 8 8 1
0<BR>lnet.peers = <A
href="mailto:192.168.52.86@tcp">192.168.52.86@tcp</A>
1 ~rtr 8
8 8 8 -12
0<BR>lnet.peers = <A
href="mailto:192.168.52.87@tcp">192.168.52.87@tcp</A>
1 ~rtr 8
8 8 8 3
0<BR>lnet.peers = <A
href="mailto:192.168.52.88@tcp">192.168.52.88@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.89@tcp">192.168.52.89@tcp</A>
1 ~rtr 8
8 8 8 1
0<BR>lnet.peers = <A
href="mailto:192.168.52.90@tcp">192.168.52.90@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.91@tcp">192.168.52.91@tcp</A>
1 ~rtr 8
8 8 8 3
0<BR>lnet.peers = <A
href="mailto:192.168.52.92@tcp">192.168.52.92@tcp</A>
1 ~rtr 8
8 8 8 2
0<BR>lnet.peers = <A
href="mailto:192.168.52.93@tcp">192.168.52.93@tcp</A>
1 ~rtr 8
8 8 8 -14
0<BR>lnet.peers = <A
href="mailto:192.168.52.94@tcp">192.168.52.94@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.95@tcp">192.168.52.95@tcp</A>
1 ~rtr 8
8 8 8 -19
0<BR>lnet.peers = <A
href="mailto:192.168.52.96@tcp">192.168.52.96@tcp</A>
1 ~rtr 8
8 8 8 -1
0<BR>lnet.peers = <A
href="mailto:192.168.52.97@tcp">192.168.52.97@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.98@tcp">192.168.52.98@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.52.99@tcp">192.168.52.99@tcp</A>
1 ~rtr 8
8 8 8 -3
0<BR>lnet.peers = <A
href="mailto:192.168.52.100@tcp">192.168.52.100@tcp</A>
1 ~rtr 8
8 8 8 -4
0<BR>lnet.peers = <A
href="mailto:192.168.52.101@tcp">192.168.52.101@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:202.122.33.82@tcp">202.122.33.82@tcp</A>
1 ~rtr 8
8 8 8 -6383 0<BR>lnet.peers = <A
href="mailto:192.168.52.102@tcp">192.168.52.102@tcp</A>
1 ~rtr 8
8 8 8 -1
0<BR>lnet.peers = <A
href="mailto:202.122.33.83@tcp">202.122.33.83@tcp</A>
1 ~rtr 8
8 8 8 -6
0<BR>lnet.peers = <A
href="mailto:192.168.52.103@tcp">192.168.52.103@tcp</A>
1 ~rtr 8
8 8 8 2
0<BR>lnet.peers = <A
href="mailto:202.122.33.84@tcp">202.122.33.84@tcp</A>
1 ~rtr 8
8 8 8 -649 0<BR>lnet.peers
= <A
href="mailto:192.168.52.104@tcp">192.168.52.104@tcp</A>
1 ~rtr 8
8 8 8 -6
0<BR>lnet.peers = <A
href="mailto:192.168.52.105@tcp">192.168.52.105@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.106@tcp">192.168.52.106@tcp</A>
1 ~rtr 8
8 8 8 -15
0<BR>lnet.peers = <A
href="mailto:192.168.52.107@tcp">192.168.52.107@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.108@tcp">192.168.52.108@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.109@tcp">192.168.52.109@tcp</A>
1 ~rtr 8
8 8 8 -79
0<BR>lnet.peers = <A
href="mailto:192.168.52.110@tcp">192.168.52.110@tcp</A>
1 ~rtr 8
8 8 8 -24
0<BR>lnet.peers = <A
href="mailto:192.168.52.111@tcp">192.168.52.111@tcp</A>
1 ~rtr 8
8 8 8 -102 0<BR>lnet.peers
= <A
href="mailto:192.168.52.112@tcp">192.168.52.112@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:202.122.33.92@tcp">202.122.33.92@tcp</A>
1 ~rtr 8
8 8 8 -1148 0<BR>lnet.peers = <A
href="mailto:202.122.33.93@tcp">202.122.33.93@tcp</A>
1 ~rtr 8
8 8 8 -5
0<BR>lnet.peers = <A
href="mailto:192.168.52.113@tcp">192.168.52.113@tcp</A>
1 ~rtr 8
8 8 8 -55
0<BR>lnet.peers = <A
href="mailto:192.168.52.114@tcp">192.168.52.114@tcp</A>
1 ~rtr 8
8 8 8 -73
0<BR>lnet.peers = <A
href="mailto:192.168.52.115@tcp">192.168.52.115@tcp</A>
1 ~rtr 8
8 8 8 -6
0<BR>lnet.peers = <A
href="mailto:202.122.33.95@tcp">202.122.33.95@tcp</A>
1 ~rtr 8
8 8 8 -1914 0<BR>lnet.peers = <A
href="mailto:192.168.52.116@tcp">192.168.52.116@tcp</A>
1 ~rtr 8
8 8 8 -4
0<BR>lnet.peers = <A
href="mailto:192.168.52.117@tcp">192.168.52.117@tcp</A>
1 ~rtr 8
8 8 8 -1
0<BR>lnet.peers = <A
href="mailto:192.168.52.118@tcp">192.168.52.118@tcp</A>
1 ~rtr 8
8 8 8 -55
0<BR>lnet.peers = <A
href="mailto:192.168.52.119@tcp">192.168.52.119@tcp</A>
1 ~rtr 8
8 8 8 -1
0<BR>lnet.peers = <A
href="mailto:192.168.52.120@tcp">192.168.52.120@tcp</A>
1 ~rtr 8
8 8 8 1
0<BR>lnet.peers = <A
href="mailto:192.168.52.121@tcp">192.168.52.121@tcp</A>
1 ~rtr 8
8 8 8 -54
0<BR>lnet.peers = <A
href="mailto:192.168.52.122@tcp">192.168.52.122@tcp</A>
1 ~rtr 8
8 8 8 -65
0<BR>lnet.peers = <A
href="mailto:192.168.52.123@tcp">192.168.52.123@tcp</A>
1 ~rtr 8
8 8 8 -16
0<BR>lnet.peers = <A
href="mailto:192.168.52.124@tcp">192.168.52.124@tcp</A>
1 ~rtr 8
8 8 8 -32
0<BR>lnet.peers = <A
href="mailto:192.168.52.125@tcp">192.168.52.125@tcp</A>
1 ~rtr 8
8 8 8 -158 0<BR>lnet.peers
= <A
href="mailto:192.168.52.126@tcp">192.168.52.126@tcp</A>
1 ~rtr 8
8 8 8 0
0<BR>lnet.peers = <A
href="mailto:192.168.52.127@tcp">192.168.52.127@tcp</A>
1 ~rtr 8
8 8 8 -2
0<BR>lnet.peers = <A
href="mailto:192.168.52.128@tcp">192.168.52.128@tcp</A>
1 ~rtr 8
8 8 8 -36
0<BR>lnet.peers = <A
href="mailto:192.168.52.129@tcp">192.168.52.129@tcp</A>
1 ~rtr 8
8 8 8 -120 0<BR>lnet.peers
= <A
href="mailto:192.168.52.130@tcp">192.168.52.130@tcp</A>
1 ~rtr 8
8 8 8 2
0<BR>lnet.peers = <A
href="mailto:192.168.52.131@tcp">192.168.52.131@tcp</A>
1 ~rtr 8
8 8 8 -82
0<BR>lnet.peers = <A
href="mailto:192.168.55.11@tcp">192.168.55.11@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.55.12@tcp">192.168.55.12@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.55.13@tcp">192.168.55.13@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.55.14@tcp">192.168.55.14@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.55.15@tcp">192.168.55.15@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.55.16@tcp">192.168.55.16@tcp</A>
1 ~rtr 8
8 8 8 4
0<BR>lnet.peers = <A
href="mailto:192.168.51.134@tcp">192.168.51.134@tcp</A>
1 ~rtr 8
8 8 8 -631
0<BR>lnet.routers = ref rtr_ref alive_cnt state
last_ping router<BR>lnet.routes = Routing disabled<BR>lnet.routes =
net hops state router<BR>lnet.stats =
7 6513 0 349123954 349123978 0 25 9871897726514 80688968391 0
7600<BR>lnet.debug_mb = 41<BR>lnet.panic_on_lbug = 0<BR>lnet.catastrophe =
0<BR>lnet.memused = 4166984<BR>lnet.upcall =
/usr/lib/lustre/lnet_upcall<BR>lnet.debug_path =
/tmp/lustre-log<BR>lnet.console_backoff = 2<BR>lnet.console_min_delay_centisecs
= 50<BR>lnet.console_max_delay_centisecs = 60000<BR>lnet.console_ratelimit =
1<BR>lnet.printk = warning error emerg console<BR>lnet.subsystem_debug =
undefined mdc mds osc ost class log llite rpc lnet lnd pinger filter echo ldlm
lov lmv sec gss mgc mgs fid fld<BR>lnet.debug = ioctl neterror warning error
emerg ha config console<BR></DIV>
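<DIV>Since the peer table above is long, the rows that stand out could be picked with a small script. This is a sketch under assumptions: it expects each row as one text line of the form "lnet.peers = nid refs state max rtr min tx min queue" (as printed by sysctl), and it relies on my understanding that the second "min" column is the lowest tx credit count seen for that peer, so a strongly negative value means sends to that peer had to queue. The threshold of -100 and the names <CODE>parse_peers</CODE> and <CODE>starved</CODE> are arbitrary choices for illustration.</DIV>

```python
def parse_peers(text):
    """Parse 'lnet.peers = ...' rows into dictionaries."""
    peers = []
    for line in text.splitlines():
        if not line.startswith("lnet.peers ="):
            continue
        fields = line.split("=", 1)[1].split()
        # Expected columns: nid refs state max rtr min tx min queue
        if len(fields) != 9 or fields[0] == "nid":
            continue  # header row or malformed row
        peers.append({
            "nid": fields[0],
            "tx": int(fields[6]),       # tx credits currently available
            "tx_min": int(fields[7]),   # assumed: lowest tx credits seen
            "queue": int(fields[8]),    # bytes currently queued
        })
    return peers


def starved(peers, threshold=-100):
    """Peers whose tx low-water mark is at or below threshold, worst first."""
    hits = [p for p in peers if threshold >= p["tx_min"]]
    return sorted(hits, key=lambda p: p["tx_min"])


if __name__ == "__main__":
    # Rows copied from the sysctl output above.
    sample = "\n".join([
        "lnet.peers = nid refs state max rtr min tx min queue",
        "lnet.peers = 192.168.52.11@tcp 1 ~rtr 8 8 8 8 0 0",
        "lnet.peers = 192.168.52.31@tcp 7 ~rtr 8 8 8 2 -10 3318192",
        "lnet.peers = 202.122.33.82@tcp 1 ~rtr 8 8 8 8 -6383 0",
        "lnet.peers = 202.122.33.95@tcp 1 ~rtr 8 8 8 8 -1914 0",
    ])
    for peer in starved(parse_peers(sample)):
        print(peer["nid"], peer["tx_min"], peer["queue"])
```

<DIV>A row with a non-zero queue column, such as 192.168.52.31@tcp above, may also deserve a look even when its "min" value is above the threshold.</DIV>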
<DIV> </DIV>
<DIV>My questions are:</DIV>
<DIV>1. What are the signs that Lustre is overloaded?</DIV>
<DIV>2. Can Lustre reject excess connections before the load makes it
crash? </DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV> </DIV></BODY></HTML>