[Lustre-discuss] Lustre 1.6.5.1 on X4200 and STK 6140 Issues

Malcolm Cowe Malcolm.Cowe at Sun.COM
Mon Oct 6 02:58:28 PDT 2008


Hi Folks,

We are trying to create a small Lustre environment on behalf of a 
customer. There are two X4200m2 MDS servers, both dual-attached to an 
STK 6140 array over FC in an active-passive arrangement with a single 
shared volume. Heartbeat is used to co-ordinate file system failover. 
There is a single X4500 OSS server, the storage for which is split into 
6 OSTs. Finally, we have two X4600m2 clients, just for kicks.

All systems are connected together over both Ethernet and InfiniBand, 
with the IB network carrying the Lustre traffic, and every system runs 
RHEL 4.5 AS. The X4500 OST volumes are created using software RAID, 
while the X4200m2 MDT is accessed using DM Multipath. We downloaded the 
Lustre binary packages from Sun's web site and installed them onto each 
of the servers.

Unfortunately, the resulting system is very unstable and is prone to 
lock-ups on the servers (uptimes are measured in hours). These lock-ups 
happen without warning, and with very little, if any, debug information 
in the system logs. We have also observed the servers locking up on 
shutdown (kernel panics). Based on the documentation in the Lustre 
operations manual, we installed the RPMs as follows:

rpm -Uvh --force e2fsprogs-1.40.7.sun3-0redhat.x86_64.rpm
rpm -ivh kernel-lustre-smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64.rpm
rpm -ivh kernel-lustre-source-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64.rpm
# (many "unknown symbol" warnings from the next package)
rpm -ivh lustre-modules-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
rpm -ivh lustre-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
rpm -ivh lustre-source-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
# (many "unknown symbol" warnings from the next package)
rpm -ivh lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
mv /etc/init.d/openibd /etc/init.d/openibd.rhel4default
rpm -ivh --force kernel-ib-1.3-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
cp /etc/init.d/openibd /etc/init.d/openibd.lustre.1.6.5.1
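
Before rebooting, a quick sanity check that the Lustre kernel actually 
installed and appears in the boot configuration (grub.conf path is the 
RHEL 4 default):

    # Confirm the kernel package landed and is listed in grub.
    rpm -q kernel-lustre-smp
    grep EL_lustre /boot/grub/grub.conf

    # After rebooting, confirm the patched kernel is running.
    uname -r    # expect: 2.6.9-67.0.7.EL_lustre.1.6.5.1smp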

We then reboot the system into the Lustre-patched kernel and install 
the Voltaire OFED software:

   1. Copy the kernel config used to build the Lustre patched kernel
      into the Lustre kernel source tree:

      cp /boot/config-2.6.9-67.0.7.EL_lustre.1.6.5.1smp \
      /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1/.config

   2. Change into the Lustre kernel source tree and edit the Makefile,
      changing the "custom" suffix to "smp" in the "EXTRAVERSION"
      variable.
   3. From the same directory, run these setup commands:

      make oldconfig || make menuconfig
      make include/asm
      make include/linux/version.h
      make SUBDIRS=scripts

   4. Change into the "-obj" directory and run these setup commands:

      cd /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1-obj/x86_64/smp
      ln -s /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1/include .

   5. Unpack the Voltaire OFED tarball:

      tar zxf VoltaireOFED-5.1.3.1_5.tgz

   6. Change to the unpacked software directory and run the installation
      script. To build the OFED packages with the Voltaire certified
      configuration, run the following commands:

      cd VoltaireOFED-5.1.3.1_5
      ./install.pl -c ofed.conf.Volt

   7. Once complete, reboot.
   8. Configure any IPoIB interfaces as required.
   9. Add the following into /etc/modprobe.conf:

      options lnet networks="o2ib0(ib0)"

  10. Load the Lustre LNET kernel module.

      modprobe lnet

  11. Start the Lustre core networking service.

      lctl network up

  12. Check the system log (/var/log/messages) for confirmation.
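
As a quick confirmation that LNET is up on the IB fabric, the node's 
o2ib NID should be listed (output shown is the general form, not 
captured from our systems):

      lctl list_nids
      # e.g. 192.168.16.1@o2ib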


Create the MGS/MDT Lustre Volume:

   1. Format the MGS/MDT device.

      mkfs.lustre [ --reformat ] --fsname lfs01 --mdt --mgs \
          --failnode=mds-2@o2ib0 /dev/dm-0

   2. Create the MGS/MDT file system mount point.

      mkdir -p /lustre/mdt/lfs01

   3. Mount the file system. This will initiate MGS and MDT services for
      Lustre.

      mount -t lustre /dev/dm-0 /lustre/mdt/lfs01
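
Once the mount returns, the MGS and MDT devices should appear in 
Lustre's device list; a simple way to confirm the services started:

      cat /proc/fs/lustre/devices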

With the exception of the OST volume creation (sketched below), we use 
an equivalent process to bring the OSS online.
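
For completeness, the OST formatting and mount steps look roughly like 
this (the MGS NID, md device and mount point are illustrative; with a 
failover MDS pair a second --mgsnode would normally be listed as well):

      mkfs.lustre --fsname lfs01 --ost --mgsnode=mds-1@o2ib0 /dev/md0
      mkdir -p /lustre/ost/lfs01-ost0
      mount -t lustre /dev/md0 /lustre/ost/lfs01-ost0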

The cabling has been checked and verified. We then re-built the system 
from scratch and applied only Sun's RDAC modules and Voltaire OFED to 
the stock RHEL 4.5 kernel (2.6.9-55.ELsmp). We removed the second MDS 
from the hardware configuration and did not install Heartbeat. The 
shared storage was re-formatted as a regular EXT3 file system using the 
DM multipathing device, /dev/dm-0, and mounted onto the host. Running 
I/O tests against the mounted file system over an extended period did 
not elicit a single error or warning message in the log related to 
multipathing or the SCSI device.
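
The I/O test itself was nothing elaborate; the shape of it was a loop 
along these lines (the mount point is illustrative):

    # Write, sync, read back and remove a large file, repeatedly,
    # while watching /var/log/messages for multipath/SCSI errors.
    while true; do
        dd if=/dev/zero of=/mnt/test/bigfile bs=1M count=4096
        sync
        dd if=/mnt/test/bigfile of=/dev/null bs=1M
        rm -f /mnt/test/bigfile
    done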

Once we were confident that the system was running in a consistent and 
stable manner, we re-installed the Lustre packages, omitting the 
kernel-ib packages. We had to re-build and re-install the RDAC support 
as well. This means that the system has support for the Lustre file 
system but no InfiniBand support at all. /etc/modprobe.conf is updated 
so that the lnet networks option is set to "tcp". The MDS/MGS volume is 
re-created on the DM device.
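
The Ethernet-only LNET configuration amounts to a one-line change in 
/etc/modprobe.conf:

    options lnet networks="tcp"
    # or, pinned to a specific interface (eth0 is an assumption):
    # options lnet networks="tcp0(eth0)"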

We have tried the following configurations on the X4200m2:

    * RHEL vanilla kernel, multipathd, RDAC. EXT3 file system. PASSED.
    * RHEL vanilla kernel, multipathd, RDAC, Voltaire OFED. EXT3 file
      system. PASSED.
    * Lustre-supplied kernel, Lustre software. No IB. MDS/MGS file
      system. FAILED.
    * Lustre-supplied kernel, Lustre software, RDAC. No IB. MDS/MGS file
      system (full Lustre FS over Ethernet). FAILED.
    * Lustre-supplied kernel, Lustre software, RDAC, Voltaire OFED.
      EXT3 file system. FAILED.
    * Lustre-supplied kernel, Lustre software, RDAC, Voltaire OFED.
      MDS/MGS file system (full Lustre FS over IB). FAILED.

Our findings indicate that there is a problem within the binary 
distribution of Lustre. This may be because we are applying the 
2.6.9-67 RHEL kernel to a platform based upon 2.6.9-55, or it may be a 
more subtle issue in the interaction with the underlying hardware. We 
could use some advice on how best to proceed, since our deadline is 
fast approaching. For example, is our build process, as documented 
above, clean? Currently, we are looking at building Lustre from source, 
to see if this results in a more stable environment.
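
The source build we have in mind follows the operations manual's 
pattern: configure Lustre against the patched kernel tree and build 
RPMs. A sketch, assuming the lustre-source RPM unpacked to 
/usr/src/lustre-1.6.5.1:

    cd /usr/src/lustre-1.6.5.1
    ./configure --with-linux=/usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1
    make
    make rpms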

Regards,

Malcolm.

-- 
<http://www.sun.com>
*Malcolm Cowe*
/Solutions Integration Engineer/

*Sun Microsystems, Inc.*
Blackness Road
Linlithgow, West Lothian EH49 7LR UK
Phone: x73602 / +44 1506 673 602
Email: Malcolm.Cowe at Sun.COM
