[Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues

Malcolm Cowe Malcolm.Cowe at Sun.COM
Mon Oct 6 07:47:34 PDT 2008


Hey Brian,

I'll have to reinstall the system from scratch in order to answer some 
of your questions; I'll make a start on that this evening. 
What I was hoping for in the first instance was a sanity check of our 
installation methods. As for the OFED stack, we are using the latest 
official software stack supplied by Voltaire, because there is more to 
OFED than just the kernel modules: it also includes many libraries and 
tools, plus the latest firmware for the cards. It's what the customer 
has asked for, and it is what the card vendor expects us to do.

We may be able to get away with OFED 1.3, but I would still like some 
guidance on how to install the rest of the OFED stack. Do we use the 
OFED source to rebuild everything, or can we take the Lustre-supplied 
kernel modules and layer the other components on separately? As I 
said, sanity-checking the install procedure is important.

Finally, when I said that one file system fails while another passes, I 
mean that the server locks up solid and crashes, usually with no debug 
output to speak of (nothing in the system logs). Even when the system 
does come up and is running the Lustre kernel, attempting a clean 
shutdown makes the kernel panic.

Since I need to rebuild the systems anyway, I will also try to install 
the packages in the order mentioned by Megan Larko, to see how that 
affects the installation. We have been following the instructions in the 
Lustre Operations Manual (v. 1.14).
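
Since the "unknown symbol" warnings were asked for, a minimal sketch of how 
they could be captured during the reinstall is below. It assumes the 
warnings appear somewhere in rpm's stdout or stderr (for example from 
depmod run by the package scripts); the function names and log file names 
are hypothetical and are not part of the Lustre packages.

```shell
#!/bin/sh
# Hypothetical helper: install one RPM (deliberately without --force),
# keep the complete rpm output in a log file, then print the distinct
# "unknown symbol" lines so they can be pasted into a reply.
install_and_collect() {
    pkg="$1"
    log="${2:-install.log}"
    # Merge stderr into the log: script warnings may go to either stream
    rpm -ivh "$pkg" >"$log" 2>&1
    status=$?
    extract_unknown_symbols "$log"
    # Report rpm's own exit status, not grep's
    return $status
}

# Print each distinct line mentioning "unknown symbol", case-insensitively
extract_unknown_symbols() {
    grep -i 'unknown symbol' "$1" | sort -u
}
```

For example, running install_and_collect on the lustre-modules RPM with a 
log name of lustre-modules.log installs the package without --force, prints 
each distinct warning once, and leaves the full output in the log for 
reference.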

Regards,

Malcolm.


Brian J. Murrell wrote:
> On Mon, 2008-10-06 at 10:58 +0100, Malcolm Cowe wrote:
>   
>> rpm -Uvh --force e2fsprogs-1.40.7.sun3-0redhat.x86_64.rpm
>>     
>
> You should not (have to) use --force.  If you do, there is either an
> operational error or a bug in our packages.  In the latter case, please
> file a bug in our bugzilla.
>
>   
>> rpm -ivh
>> lustre-modules-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm #
>> (many "unknown symbol" warnings)
>>     
>
> Can you paste them here?
>
>   
>> rpm -ivh
>> lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm #
>> (many "unknown symbol" warnings)
>>     
>
> Ditto.
>
>   
>> rpm -ivh --force
>> kernel-ib-1.3-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm 
>>     
>
> Again, you should not need to use --force.
>
>   
>> We then reboot the system and load RHEL using the Lustre kernel. Now
>> we install the Voltaire OFED software:
>>     
>
> Why?  The kernel-ib package you installed above should provide a working
> OFED stack.
>
>   
>>      1. Unpack the Voltaire OFED tar-ball:
>>         
>>         tar zxf VoltaireOFED-5.1.3.1_5.tgz
>>     
>
> Do you really need 1.3.1?  If so, then you should not install the 1.3
> kernel-ib package we provide above.  I really wonder why you need 1.3.1
> though.
>
>   
>>       * Lustre supplied kernel, Lustre software. No IB. MDS/MGS file
>>         system. FAILED.
>>     
>
> Failed in what way?
>
>   
>>       * Lustre supplied kernel, Lustre software, RDAC. No IB. MDS/MGS
>>         file system (Full Lustre FS over Ethernet). FAILED.
>>     
>
> Again, in what way?
>
>   
>>       * Lustre supplied kernel, Lustre software, RDAC, Voltaire OFED.
>>         EXT-3 file system. FAILED.
>>     
>
> Ditto.
>
>   
>>       * Lustre supplied kernel, Lustre software. RDAC, Voltaire OFED.
>>         MDS/MGS file system (Full Lustre FS over IB). FAILED.
>>     
>
> And Ditto again.
>
> You have to provide more details than just "FAILED" if we are to try to
> help diagnose a problem.
>
>   
>> Our findings indicate that there is a problem within the binary
>> distribution of Lustre.
>>     
>
> I think that many of our users use it as is, so it cannot be all that
> bad.
>
>   
>> This may be due to the fact that we are applying the 2.6.9-67 RHEL
>> kernel to a platform based upon 2.6.9-55,
>>     
>
> That shouldn't be a problem in and of itself.
>
> b.

-- 
*Malcolm Cowe*
/Solutions Integration Engineer/

*Sun Microsystems, Inc.*
Blackness Road
Linlithgow, West Lothian EH49 7LR UK
Phone: x73602 / +44 1506 673 602
Email: Malcolm.Cowe at Sun.COM
