<html>
  <head>
    <meta content="text/html; charset=GB2312" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Hi, Qiulan<br>
    <br>
    LU-952 is about a deadlock issue. Was quota enabled? You could
    try disabling quota and see whether the problem goes away.<br>
    <br>
    Thanks<br>
    - Niu<br>
    <br>
    <blockquote cite="mid:201205311449342959415@ihep.ac.cn" type="cite">
      <meta content="text/html; charset=GB2312"
        http-equiv="Content-Type">
      <div><font color="#000080" face="Verdana" size="2">Hi Zhen,</font></div>
      <div> </div>
      <div><font color="#000080">Many thanks for your prompt reply. I
          have disabled writethrough_cache and read_cache to see whether
          the problem goes away, but threads still hang under heavy IO.
        </font></div>
      <div> </div>
      <div> </div>
      <div>
        <div>May 31 08:38:43 boss33 kernel: LustreError: dumping log to /tmp/lustre-log.1338424722.5303</div>
        <div>May 31 08:38:43 boss33 kernel: Pid: 5262, comm: ll_ost_io_48</div>
        <div>May 31 08:38:43 boss33 kernel:</div>
        <div>May 31 08:38:43 boss33 kernel: Call Trace:</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff887bd451&gt;] ksocknal_queue_tx_locked+0x451/0x490 [ksocklnd]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff800646ac&gt;] __down_read+0x7a/0x92</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff889843df&gt;] ldiskfs_get_blocks+0x5f/0x2e0 [ldiskfs]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff889851a0&gt;] ldiskfs_get_block+0xc0/0x120 [ldiskfs]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff88981f60&gt;] ldiskfs_bmap+0x0/0xf0 [ldiskfs]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff80033615&gt;] generic_block_bmap+0x37/0x41</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff800341ad&gt;] mapping_tagged+0x3c/0x47</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff88981f88&gt;] ldiskfs_bmap+0x28/0xf0 [ldiskfs]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff88981f60&gt;] ldiskfs_bmap+0x0/0xf0 [ldiskfs]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff88a4a288&gt;] filter_commitrw_write+0x398/0x2be0 [obdfilter]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff889e6e5c&gt;] ost_checksum_bulk+0x30c/0x5b0 [ost]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff889e6c38&gt;] ost_checksum_bulk+0xe8/0x5b0 [ost]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff889edcf9&gt;] ost_brw_write+0x1c99/0x2480 [ost]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff8872e658&gt;] ptlrpc_send_reply+0x5c8/0x5e0 [ptlrpc]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff886f98b0&gt;] target_committed_to_req+0x40/0x120 [ptlrpc]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff8008cf93&gt;] default_wake_function+0x0/0xe</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff88732bc8&gt;] lustre_msg_check_version_v2+0x8/0x20 [ptlrpc]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff889f108e&gt;] ost_handle+0x2bae/0x55b0 [ost]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff80150d56&gt;] __next_cpu+0x19/0x28</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff800767ae&gt;] smp_send_reschedule+0x4e/0x53</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff8874215a&gt;] ptlrpc_server_handle_request+0x97a/0xdf0 [ptlrpc]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff887428a8&gt;] ptlrpc_wait_event+0x2d8/0x310 [ptlrpc]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff8008b3bd&gt;] __wake_up_common+0x3e/0x68</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff88743817&gt;] ptlrpc_main+0xf37/0x10f0 [ptlrpc]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff8005dfb1&gt;] child_rip+0xa/0x11</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff887428e0&gt;] ptlrpc_main+0x0/0x10f0 [ptlrpc]</div>
        <div>May 31 08:38:43 boss33 kernel:  [&lt;ffffffff8005dfa7&gt;] child_rip+0x0/0x11</div>
        <div>May 31 08:38:43 boss33 kernel:</div>
        <div>May 31 08:38:43 boss33 kernel: LustreError: dumping log to /tmp/lustre-log.1338424722.5262</div>
        <div>May 31 08:38:43 boss33 kernel: Lustre: bes3fs-OST0064: slow journal start 48s due to heavy IO load</div>
        <div>May 31 08:38:43 boss33 kernel: Lustre: Skipped 1 previous similar message</div>
        <div>May 31 08:38:43 boss33 kernel: Lustre: bes3fs-OST0064: slow brw_start 48s due to heavy IO load</div>
        <div>May 31 08:38:43 boss33 kernel: Lustre: Skipped 1 previous similar message</div>
        <div>May 31 08:38:43 boss33 kernel: Lustre: bes3fs-OST0064: slow journal start 187s due to heavy IO load</div>
        <div>May 31 08:38:43 boss33 kernel: Lustre: bes3fs-OST0064: slow brw_start 187s due to heavy IO load</div>
      </div>
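      <div>The "slow ... due to heavy IO load" lines above can be tallied per OST to see which targets and operations stall longest. A minimal Python sketch, assuming syslog lines in the format shown; SLOW_RE and worst_stalls are hypothetical names used for illustration, not part of Lustre:</div>
      <pre>
```python
import re
from collections import defaultdict

# Matches lines such as:
#   "... kernel: Lustre: bes3fs-OST0064: slow journal start 48s due to heavy IO load"
# Groups: target name, operation, stall duration in seconds.
SLOW_RE = re.compile(r"Lustre: (\S+): slow (.+?) (\d+)s due to heavy IO load")


def worst_stalls(lines):
    """Return {(target, operation): longest stall in seconds}."""
    worst = defaultdict(int)
    for line in lines:
        match = SLOW_RE.search(line)
        if match:
            target, operation, seconds = match.groups()
            key = (target, operation)
            worst[key] = max(worst[key], int(seconds))
    return dict(worst)


# Sample lines copied from the log excerpt above.
sample = [
    "May 31 08:38:43 boss33 kernel: Lustre: bes3fs-OST0064: slow journal start 48s due to heavy IO load",
    "May 31 08:38:43 boss33 kernel: Lustre: Skipped 1 previous similar message",
    "May 31 08:38:43 boss33 kernel: Lustre: bes3fs-OST0064: slow brw_start 187s due to heavy IO load",
    "May 31 08:38:43 boss33 kernel: Lustre: bes3fs-OST0064: slow journal start 187s due to heavy IO load",
]

for (target, operation), seconds in sorted(worst_stalls(sample).items()):
    print(target, operation, seconds)
```
</pre>
      <div>Running this over the full syslog from each OSS would show whether the stalls are confined to a few OSTs or spread across all of them.</div>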
      <div> </div>
      <div><font color="#000080">I have not applied the patch because
          all the servers are online. Do you know how to deal with it
          without affecting users?</font></div>
      <div> </div>
      <div><font color="#000080">Thank you very much.</font></div>
      <div> </div>
      <div> </div>
      <div><font color="#000080">Cheers,</font></div>
      <div><font color="#000080">Qiulan</font></div>
      <div><font color="#000080" face="Verdana" size="2"><font
            color="#000000">====================================================================<br>
            Computing center,the Institute of High Energy Physics, China<br>
            Huang, Qiulan Tel: (+86) 10 8823 6010-105<br>
            P.O. Box 918-7 Fax: (+86) 10 8823 6839<br>
            Beijing 100049 P.R. China Email: </font><a
            moz-do-not-send="true" href="mailto:huangql@ihep.ac.cn">huangql@ihep.ac.cn</a><br>
          <font color="#000000">===================================================================<span
              style="WHITE-SPACE: pre" class="Apple-tab-span"> </span></font><br>
        </font></div>
      <div> </div>
      <div><font color="#c0c0c0" face="Verdana" size="2">2012-05-31 </font></div>
      <font color="#000080" face="Verdana" size="2">
        <hr style="WIDTH: 100px" align="left" color="#b5c4df" size="1">
      </font>
      <div><font color="#c0c0c0" face="Verdana" size="2"><span>huangql</span>
        </font></div>
      <hr color="#b5c4df" size="1">
      <div><font face="Verdana" size="2"><strong>From:</strong> Liang
          Zhen </font></div>
      <div><font face="Verdana" size="2"><strong>Sent:</strong>
          2012-05-30  19:12:15 </font></div>
      <div><font face="Verdana" size="2"><strong>To:</strong> huangql </font></div>
      <div><font face="Verdana" size="2"><strong>Cc:</strong>
          lustre-discuss; wc-discuss </font></div>
      <div><font face="Verdana" size="2"><strong>Subject:</strong> [SPAM] Re:
          [wc-discuss] The ost_connect operation failed with -16 </font></div>
      <div> </div>
      <div><font face="Verdana" size="2">Hi, I think you might have hit
          this: <a moz-do-not-send="true"
            href="http://jira.whamcloud.com/browse/LU-952">http://jira.whamcloud.com/browse/LU-952</a>.
          You can find the patch in this ticket.
          <div><br>
          </div>
          <div>Regards</div>
          <div>Liang<br>
            <div>
              <div><br>
                <div>
                  <div>On May 30, 2012, at 11:21 AM, huangql wrote:</div>
                  <br class="Apple-interchange-newline">
                  <blockquote type="cite">
                    <div>Dear all,<br>
                      <br>
                      Recently we found a problem on our OSSes: some
                      threads may hang when the server is under heavy IO
                      load. When this happens, some clients are evicted
                      or refused by some OSTs and get the following
                      error messages:<br>
                      <br>
                      Server side:<br>
                      <br>
                      May 30 11:06:31 boss07 kernel: Lustre: Service thread pid 8011 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:<br>
                      May 30 11:06:31 boss07 kernel: Lustre: Skipped 1 previous similar message<br>
                      May 30 11:06:31 boss07 kernel: Pid: 8011, comm: ll_ost_71<br>
                      May 30 11:06:31 boss07 kernel:<br>
                      May 30 11:06:31 boss07 kernel: Call Trace:<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff886f5d0e&gt;] start_this_handle+0x301/0x3cb [jbd2]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff800a09ca&gt;] autoremove_wake_function+0x0/0x2e<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff886f5e83&gt;] jbd2_journal_start+0xab/0xdf [jbd2]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff888ce9b2&gt;] fsfilt_ldiskfs_start+0x4c2/0x590 [fsfilt_ldiskfs]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff88920551&gt;] filter_version_get_check+0x91/0x2a0 [obdfilter]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff80036cf4&gt;] __lookup_hash+0x61/0x12f<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff8893108d&gt;] filter_setattr_internal+0x90d/0x1de0 [obdfilter]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff800e859b&gt;] lookup_one_len+0x53/0x61<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff88925452&gt;] filter_fid2dentry+0x512/0x740 [obdfilter]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff88924e27&gt;] filter_fmd_get+0x2b7/0x320 [obdfilter]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff8003027b&gt;] __up_write+0x27/0xf2<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff88932721&gt;] filter_setattr+0x1c1/0x3b0 [obdfilter]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff8882677a&gt;] lustre_pack_reply_flags+0x86a/0x950 [ptlrpc]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff8881e658&gt;] ptlrpc_send_reply+0x5c8/0x5e0 [ptlrpc]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff88822b05&gt;] lustre_msg_get_version+0x35/0xf0 [ptlrpc]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff888b0abb&gt;] ost_handle+0x25db/0x55b0 [ost]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff80150d56&gt;] __next_cpu+0x19/0x28<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff800767ae&gt;] smp_send_reschedule+0x4e/0x53<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff8883215a&gt;] ptlrpc_server_handle_request+0x97a/0xdf0 [ptlrpc]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff888328a8&gt;] ptlrpc_wait_event+0x2d8/0x310 [ptlrpc]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff8008b3bd&gt;] __wake_up_common+0x3e/0x68<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff88833817&gt;] ptlrpc_main+0xf37/0x10f0 [ptlrpc]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff8005dfb1&gt;] child_rip+0xa/0x11<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff888328e0&gt;] ptlrpc_main+0x0/0x10f0 [ptlrpc]<br>
                      May 30 11:06:31 boss07 kernel:  [&lt;ffffffff8005dfa7&gt;] child_rip+0x0/0x11<br>
                      May 30 11:06:31 boss07 kernel:<br>
                      May 30 11:06:31 boss07 kernel: LustreError: dumping log to /tmp/lustre-log.1338347191.8011<br>
                      <br>
                      <br>
                      Client side:<br>
                      <br>
                      May 30 09:58:36 ccopt kernel: LustreError: 11-0:
                      an error occurred while communicating with
                      192.168.50.123@tcp. The ost_connect operation
                      failed with -16<br>
                      <br>
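                      For reference, the -16 in this message is a negated Linux errno value, and errno 16 is EBUSY. That would be consistent with the OST refusing the connection while its service threads are stuck, although that reading is an assumption rather than something confirmed in this thread. A small Python sketch for decoding such return codes; describe_rc is a hypothetical helper name:<br>
                      <pre>
```python
import errno
import os


def describe_rc(rc):
    """Map a negative kernel-style return code to (errno name, message)."""
    code = abs(rc)
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The client log above reports "failed with -16".
print(describe_rc(-16))
```
</pre>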
                      When this error occurs, commands such as "ls",
                      "df", "vi", and "touch" hang, which prevents us
                      from doing anything in the file system.<br>
                      I think the ost_connect failure should report an
                      error message to users instead of leaving
                      interactive commands stuck.<br>
                      <br>
                      Could someone give us advice or suggestions on how
                      to solve this problem?<br>
                      <br>
                      Thank you very much in advance.<br>
                      <br>
                      <br>
                      Best Regards<br>
                      Qiulan Huang<br>
                      2012-05-30<br>
====================================================================<br>
                      Computing center,the Institute of High Energy
                      Physics, China<br>
                      Huang, Qiulan                        Tel: (+86) 10
                      8823 6010-105<br>
                      P.O. Box 918-7                       Fax: (+86) 10
                      8823 6839<br>
                      Beijing 100049  P.R. China           Email: <a
                        moz-do-not-send="true"
                        href="mailto:huangql@ihep.ac.cn">huangql@ihep.ac.cn</a><br>
===================================================================<span
                        style="WHITE-SPACE: pre" class="Apple-tab-span">
                      </span><br>
                      <br>
                      <br>
                      <br>
                    </div>
                  </blockquote>
                </div>
                <br>
              </div>
            </div>
          </div>
        </font></div>
    </blockquote>
    <br>
  </body>
</html>