[Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues

Brian J. Murrell Brian.Murrell at Sun.COM
Mon Oct 6 07:59:33 PDT 2008


On Mon, 2008-10-06 at 15:47 +0100, Malcolm Cowe wrote:
> Hey Brian,

Hey Malcolm,

> I'll have to re-install the system from scratch in order to be able to
> answer some of your questions, which I'll get started on this evening.

OK.

> What I was hoping for in the first instance was a sanity check of our
> installation methods.

I think I commented on those.  If you are going to build your own OFED
stack, you don't need to install the one we provide.

> With respect to the OFED stack used, we are using the latest official
> software stack supplied by Voltaire. The reason for this is that there
> is more to OFED than just the kernel modules, including many libraries
> and tools,

None of these should be necessary for Lustre to use I/B.

> plus the latest firmware for the cards.

Hrm.  Can you not upgrade the firmware independently of upgrading the
whole OFED stack?  That seems very limiting.

> It's what the customer has asked for, and it is what the card vendor
> expects us to do.

Fair enough.  I was just pointing out that you don't need our OFED stack
if you are going to install your own.

> We may be able to get away with OFED 1.3, but I would still like some
> guidance on how to install the rest of the OFED stack

We don't supply the userspace tools because they are not really
necessary for Lustre.

> do we use the OFED source to rebuild everything, or can we pick the
> Lustre supplied kernel modules and just layer on the other stuff
> separately?

Yes, you should be able to do that.  I say that quite generally as I'm
not entirely clear on your operating environment.

> Finally, when I said that one file system fails versus another passes,
> I mean that the server locks solid, crashes, usually with no debug to
> speak of (nothing in the system logs).

Nothing on the console either?

> Even while the system is up and running the lustre kernel, if we
> attempt a clean shutdown, the kernel panics.

Hrm.  A panic is quite different than locking solid with no messages at
all.  A solid lock with no messages is indicative of hardware problems.

> Since I need to rebuild the systems anyway, I will also try to install
> the packages in the order mentioned by Megan Larko, to see how that
> affects the installation.

I'm not entirely convinced by her process.  You should not need to use
--force and reinstall packages that are already installed.  I'd be more
interested in knowing your exact installation steps and the errors you
get from them.  Please try to avoid using --force, so we can see the
errors that appear to make it necessary.  You will have to use "rpm -U"
with e2fsprogs though, as she mentions.  Do all of your work with the
"script(1)" tool so you can easily log it.
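
As a rough illustration of the logging advice above: script(1) records an
interactive terminal session, and a small wrapper can do much the same job
when the steps are run non-interactively. The sketch below is only one
possible approach. The log file name, the run_logged helper, and the package
file name in the commented usage are all hypothetical placeholders, not
anything taken from this thread.

```shell
#!/bin/sh
# Hypothetical log file name; override by setting LOG in the environment.
LOG="${LOG:-lustre-install.log}"

# Interactive alternative: "script -a $LOG" records the whole terminal
# session (commands and output) until you type "exit".

# run_logged (hypothetical helper): append the command line, its combined
# stdout/stderr, and its exit status to $LOG, then return that status.
run_logged() {
    printf '+ %s\n' "$*" >> "$LOG"
    status=0
    "$@" >> "$LOG" 2>&1 || status=$?
    printf '= exit %d\n' "$status" >> "$LOG"
    return "$status"
}

# Example usage (package file name is a placeholder; note: no --force):
#   run_logged rpm -Uvh e2fsprogs-<version>.rpm
#   run_logged rpm -q e2fsprogs
```

Because each entry records the exact command and its exit status, a failing
step shows up in the log together with its error text, which is the
information needed to work out why --force seemed necessary.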

b.
