[lustre-discuss] Lustre 2.7 deployment issues

jerome.becot at inserm.fr jerome.becot at inserm.fr
Fri Dec 4 02:43:57 PST 2015


Hello Ray,

One consideration first: you are trying version 2.7, which is not the 
production release (that is 2.5). From this perspective, whether you run 
2.7.0 or 2.7.x won't make much difference; it is the development release.

Then, if I understand correctly, the problem comes from the InfiniBand 
driver module, which is buggy in the 2.6.32-504.8.1 kernel, meaning that 
you have to update the kernel to fix it. After that update, the 2.7.0 
version on the site, compiled against an older kernel version, may 
refuse to load (kernel modules, i.e. the Lustre ones here, rely on 
kernel features that may change between kernel versions, which makes 
them incompatible).
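
As a rough way to spot the mismatch described above, here is a minimal 
Python sketch that compares the kernel a module was built for with the 
running kernel. It assumes Python 3.7+ and that the module is named 
"lustre"; the helper names are made up for illustration. It relies on 
"modinfo -F vermagic", whose first field is the kernel release the 
module was compiled against. A mismatch is only a hint: whether the 
module really loads also depends on the kernel's symbol versioning.

```python
import platform
import subprocess


def vermagic_kernel(vermagic: str) -> str:
    """Return the kernel release, i.e. the first field of a vermagic string."""
    fields = vermagic.split()
    return fields[0] if fields else ""


def built_for_running_kernel(vermagic: str, running_release: str) -> bool:
    """True when the module was compiled for exactly the given kernel release."""
    return vermagic_kernel(vermagic) == running_release


def installed_module_vermagic(module: str = "lustre") -> str:
    """Ask modinfo for the vermagic field of an installed module.

    The default module name "lustre" is an assumption for this sketch.
    """
    result = subprocess.run(
        ["modinfo", "-F", "vermagic", module],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    running = platform.release()
    magic = installed_module_vermagic("lustre")
    if built_for_running_kernel(magic, running):
        print("lustre module was built for the running kernel:", running)
    else:
        print("lustre module was built for", vermagic_kernel(magic),
              "but the running kernel is", running)
```

If the two releases differ after a kernel update, that is a good reason 
to rebuild the client or pick a build made for the new kernel.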

In any case, you can try to rebuild the 2.7.0 version from source 
against your new kernel. The procedure is quite easy:

https://wiki.hpdd.intel.com/display/PUB/Rebuilding+the+Lustre-client+rpms+for+a+new+kernel

It will regenerate the 2.7.0 client against your newer kernel with the 
working InfiniBand modules, but stability is not guaranteed, as the 2.7 
branch is under development anyway.

Or, if you can't rebuild, use a precompiled one from the build site 
(some nasty bugs in the base 2.x.0 version are fixed in the latest 
builds).

The only requirement is to stick to the very same version on the MDS and 
OSS, and to use the same or a newer version on the clients.
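
The rule of thumb above can be written down as a small check. This is 
only a sketch in Python of the rule as stated in this message, not an 
official Lustre compatibility matrix; the function names are invented 
for illustration, and versions are assumed to be plain dotted numbers 
with the same number of components (e.g. "2.7.0").

```python
def parse_version(text):
    """Turn a dotted version such as "2.7.0" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def check_versions(mds, oss_list, clients):
    """Return a list of problems; an empty list means the rule is satisfied.

    Rule encoded here: every OSS runs exactly the MDS version, and every
    client runs the same version as the servers or a newer one.
    """
    problems = []
    server_version = parse_version(mds)
    for oss in oss_list:
        if parse_version(oss) != server_version:
            problems.append("OSS version %s differs from MDS version %s" % (oss, mds))
    for client in clients:
        if parse_version(client) < server_version:
            problems.append("client version %s is older than server version %s" % (client, mds))
    return problems


if __name__ == "__main__":
    for problem in check_versions("2.7.0", ["2.7.0", "2.7.0"], ["2.7.0", "2.5.3"]):
        print(problem)
```

With the sample values in the last lines, only the 2.5.3 client is 
reported, since it is older than the servers.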

Regards

On 03-12-2015 16:13, Ray Muno wrote:
> I am trying to set up a test deployment of Lustre 2.7.
> 
> I pulled RPMS from http://lustre.org/download/ and installed them on a
> set of servers running Scientific Linux 6.6, which seems to be a proper
> OS for deployment.  Everything installs and I can format the
> filesystems on the MDS (1) and OSS (2) servers. When I try and mount
> the OST file systems, I get communication errors. I can "lctl ping"
> the servers from each other, but cannot establish communication
> between the MDS and OSS.
> 
> The installation is on servers connected over Infiniband (Qlogic DDR 
> 4X).
> 
> In trying to diagnose the issues related to the error messages, I
> found mention in some list discussions that o2ib is broken in the
> 2.6.32-504.8.1 kernel.
> 
> After much frustration, I pulled a nightly build from
> build.hpdd.intel.com (kernel
> 2.6.32-573.8.1.el6_lustre.g8438f2a.x86_64) and tried the same set up.
> Everything worked as I expected.
> 
> Am I missing something? Is the default release pointed to at
> https://downloads.hpdd.intel.com/ for 2.7 broken in some way? Is it
> just the hardware I am trying to deploy against?
> 
> I can provide specifics about the errors I see, I am just posting this
> to make sure I am pulling the Lustre RPM's from the proper source.

