<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">
Hi,<div><br class="webkit-block-placeholder"></div><div>how many clients (compute nodes) you have in your cluster? What is crashing randomly: clients or OSS or MDS or maybe all of them?</div><div>Do you have screenshot of the kernel panic or crashdump log?</div><div><br class="webkit-block-placeholder"></div><div>cheers,</div><div><br class="webkit-block-placeholder"></div><div>Wojciech Turek <br><div><div>On 26 Nov 2007, at 13:25, Somsak Sriprayoonsakul wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">No. I use stock bnx2 driver from pre-built latest kernel-lustre-1.6.3.</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">I forgot to mention about oops. It's something about lustre<span class="Apple-converted-space"> </span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">(lustre_blah_blah_blah something).</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">All other nodes also use bnx2. There's no problem at all.</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Matt wrote:</div> <blockquote type="cite"><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Somsak,</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Did you build your own bnx2 driver? I was getting kernel panics when<span class="Apple-converted-space"> </span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">hitting a certain load with Dell 1950s that also use the bnx2 driver. <span class="Apple-converted-space"> </span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">My solution was to grab the bnx2 source code and build it under the<span class="Apple-converted-space"> </span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Lustre kernel.<span class="Apple-converted-space"> </span>If you search the mailing list you'll find the mails<span class="Apple-converted-space"> </span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">dealing with this.</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">If you see bnx2 mentioned in your kernel panic output, then it's<span class="Apple-converted-space"> </span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">probably the cause.</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Thanks,</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Matt</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">On 26/11/2007, *Somsak Sriprayoonsakul * <<a href="mailto:somsak_sr@thaigrid.or.th">somsak_sr@thaigrid.or.th</a><span class="Apple-converted-space"> </span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><<a href="mailto:somsak_sr@thaigrid.or.th">mailto:somsak_sr@thaigrid.or.th</a>>> wrote:</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>Hello,</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>We have a 4 nodes Lustre Cluster that provides parallel file</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>system</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>for our 192 nodes cluster. The Lustre Cluster are CentOS 4.5, x86_64</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>(Intel series 4000), on HP DL360-G5. The cluster that use it is ROCKS</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>4.2.1, on the same set of hardware. Our network is Gigabit Ethernet,</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>using bnx2 driver. Lustre setup is</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>storage-0-0: mgs+mdt, ost0, ost1 (backup)</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>storage-0-1: mgs+mdt (backup), ost0 (backup), ost1</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>storage-0-2: ost2, ost3 (backup)</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>storage-0-3: ost2 (backup), ost3</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>We're using heartbeat 2.0.8 base on pre-built RPM from CentOS. All</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>backup is configure in the way that it'll not run simultaneously with</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>primary. Note that, we enable flock and quota on Lustre.</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>The problem we have right now is, some of the nodes are randomly</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>panic. This happened about once a week or two week. We tolerate this</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>stupidly by setting kernel.panic=60 and hope that the backup node</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>will</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>not failed within the time frame, though this is working quite well</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>(base on user feedback, they do not know that the file system is</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>failed). The backup node take-over OST and do recovery for about 250</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>secs then everything back to normal.</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>Anyways we're trying to nail down the reason why the file</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>system is</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>panic. I believe that information above will not suffice to track down</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>the reason. Could someone give me a way to debug or dump some useful</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>information that I can send to the list for later analysis? Also, does</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>the "RECOVERING" suffice to make the file system stable? Do we need to</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>shutdown the whole system and do e2fsck+lfsck?</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>Also, every panic time, quota that was enabled will be</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>disabled (lfs</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>quota <user> /fs yield "No such process). I have to do quotaoff and</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>quotaon again. It seems that the quota is not being turn on when</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>OST is</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>boot up. Is there a way to always turn this on?</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>Thank you very much in advance</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>--</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>-----------------------------------------------------------------------------------</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>Somsak Sriprayoonsakul</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>Thai National Grid Center</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>Software Industry Promotion Agency</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>Ministry of ICT, Thailand</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>somsak_sr@thaigrid.or.th <<a href="mailto:somsak_sr@thaigrid.or.th">mailto:somsak_sr@thaigrid.or.th</a>></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>-----------------------------------------------------------------------------------</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>_______________________________________________</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>Lustre-discuss mailing list</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span>Lustre-discuss@clusterfs.com <<a href="mailto:Lustre-discuss@clusterfs.com">mailto:Lustre-discuss@clusterfs.com</a>></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span><a href="https://mail.clusterfs.com/mailman/listinfo/lustre-discuss">https://mail.clusterfs.com/mailman/listinfo/lustre-discuss</a></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><span class="Apple-converted-space"> </span><<a href="https://mail.clusterfs.com/mailman/listinfo/lustre-discuss">https://mail.clusterfs.com/mailman/listinfo/lustre-discuss</a>></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div> </blockquote><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">--<span class="Apple-converted-space"> </span></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">-----------------------------------------------------------------------------------</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Somsak Sriprayoonsakul</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Thai National Grid Center</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Software Industry Promotion Agency</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Ministry of ICT, Thailand</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><a href="mailto:somsak_sr@thaigrid.or.th">somsak_sr@thaigrid.or.th</a></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">-----------------------------------------------------------------------------------</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; min-height: 14px; "><br></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">_______________________________________________</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; ">Lustre-discuss mailing list</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><a href="mailto:Lustre-discuss@clusterfs.com">Lustre-discuss@clusterfs.com</a></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><a href="https://mail.clusterfs.com/mailman/listinfo/lustre-discuss">https://mail.clusterfs.com/mailman/listinfo/lustre-discuss</a></div> </blockquote></div><br><div> <span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0; "><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0; "><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0; "><div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><font face="Lucida Grande" size="3" style="font: 11.0px Lucida Grande">Mr Wojciech Turek</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; font: normal normal normal 11px/normal Lucida Grande; min-height: 13px; ">Assistant System Manager</div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><font face="Lucida Grande" size="3" style="font: 11.0px Lucida Grande">University of Cambridge</font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><font face="Lucida Grande" size="3" style="font: 11.0px Lucida Grande">High Performance Computing service </font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><font class="Apple-style-span" face="'Lucida Grande'" size="3"><span class="Apple-style-span" style="font-size: 11px;">email: <a href="mailto:wjt27@cam.ac.uk">wjt27@cam.ac.uk</a></span></font></div><div style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; "><font face="Lucida Grande" size="3" style="font: 11.0px Lucida Grande">tel. +441223763517</font></div></div><div><br class="webkit-block-placeholder"></div></span><br class="Apple-interchange-newline"></span></span> </div><br></div></body></html>