[Lustre-discuss] Lustre 126.96.36.199 on X4200 and STK 6140 Issues
Brian J. Murrell
Brian.Murrell at Sun.COM
Mon Oct 6 07:59:33 PDT 2008
On Mon, 2008-10-06 at 15:47 +0100, Malcolm Cowe wrote:
> Hey Brian,
> I'll have to re-install the system from scratch in order to be able to
> answer some of your questions, which I'll get started on this evening.
> What I was hoping for in the first instance was a sanity check of our
> installation methods.
I think I commented on those. If you are going to build your own OFED
stack, you don't need to install the one we provide.
> With respect to the OFED stack used, we are using the latest official
> software stack supplied by Voltaire. The reason for this is that there
> is more to OFED than just the kernel modules, including many libraries
> and tools,
None of these should be necessary for Lustre to use I/B.
> plus the latest firmware for the cards.
Hrm. Can you not upgrade the firmware independently of upgrading the
whole OFED stack? That seems very limiting.
> It's what the customer has asked for, and it is what the card vendor
> expects us to do.
Fair enough. I was just pointing out that you don't need our OFED stack
if you are going to install your own.
> We may be able to get away with OFED 1.3, but I would still like some
> guidance on how to install the rest of the OFED stack
We don't supply the userspace tools because they are not really
necessary for Lustre.
> do we use the OFED source to rebuild everything, or can we pick the
> Lustre supplied kernel modules and just layer on the other stuff
Yes, you should be able to do that. I say that quite generally as I'm
not entirely clear on your operating environment.
> Finally, when I said that one file system fails versus another passes,
> I mean that the server locks solid, crashes, usually with no debug to
> speak of (nothing in the system logs).
Nothing on the console either?
> Even while the system is up and running the lustre kernel, if we
> attempt a clean shutdown, the kernel panics.
Hrm. A panic is quite different than locking solid with no messages at
all. A solid lock with no messages is indicative of hardware problems.
> Since I need to rebuild the systems anyway, I will also try to install
> the packages in the order mentioned by Megan Larko, to see how that
> affects the installation.
I'm not entirely convinced by her process. You should not need to use
--force and reinstall packages that are already installed. I'd be more
interested in knowing your exact installation steps and the errors you
get from them. Please try to avoid using --force so we can see what
makes it seem necessary. You will have to use "rpm -U" with e2fsprogs
though, as she mentions. Do all of your work with the "script(1)" tool
so you can easily log it.
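
For an interactive session, running "script install.log" before you
start and "exit" when you finish is enough. If the steps are scripted
instead, a minimal sketch of a logged run might look like the following.
The LOG and DRY_RUN variables, the run helper, and the e2fsprogs package
glob are all assumptions for illustration, not names from this thread.
By default it only prints the commands so the sequence can be reviewed
first.

```shell
#!/bin/sh
# Sketch only: LOG, DRY_RUN and the package glob are assumptions,
# not names taken from this thread.
LOG="${LOG:-install.log}"
DRY_RUN="${DRY_RUN:-1}"      # default: only print the commands

run() {
    # Record the command line, then run it (unless dry-run) with its
    # output appended to the same log.
    echo "+ $*" | tee -a "$LOG"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" 2>&1 | tee -a "$LOG"
    fi
}

# Upgrade e2fsprogs in place with "rpm -U"; note there is no --force.
run rpm -Uvh e2fsprogs-*.rpm

# List what is installed afterwards so the log shows the end state.
run rpm -qa
```

Saved under a hypothetical name such as install-steps.sh, it would be
run for real with "DRY_RUN=0 sh install-steps.sh", and install.log can
then be sent along with the exact errors.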