[Lustre-discuss] OSS misconfig and client connect

James Robnett jrobnett at aoc.nrao.edu
Wed Jul 31 08:55:39 PDT 2013


We're running Lustre 1.8.7 on clients and servers.

We recently added an 11th OSS, with 4 OSTs, to our Lustre filesystem.
Unfortunately, its modprobe.conf LNET line listed only an o2ib0(ib0)
entry left over from testing. Normally the line would look like:

options lnet networks="o2ib0(ib0),tcp0(eth0),tcp1(eth2)"

for IB, Gbit and 10Gbit respectively.
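
A check along these lines might catch a missing network entry before
the OSTs are mounted. It is only a sketch: the function name
missing_lnet_networks is made up for illustration, and it assumes the
networks list is a single double-quoted, comma-separated string in the
form shown above.

```shell
#!/bin/sh
# Print each expected LNET network that is absent from a modprobe
# "options lnet networks=..." line. Prints nothing if all are present.
# Usage: missing_lnet_networks "<options line>" net1 net2 ...
# (hypothetical helper, not part of Lustre)
missing_lnet_networks() {
    line=$1
    shift
    # Extract the text between the double quotes after networks=
    nets=$(printf '%s\n' "$line" | sed -n 's/.*networks="\([^"]*\)".*/\1/p')
    for want in "$@"; do
        found=no
        # Split the list on commas and strip the "(interface)" suffix
        for entry in $(printf '%s' "$nets" | tr ',' ' '); do
            if [ "${entry%%\(*}" = "$want" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            printf '%s\n' "$want"
        fi
    done
}

# The line left over from testing lists only o2ib0,
# so tcp0 and tcp1 are reported as missing.
missing_lnet_networks 'options lnet networks="o2ib0(ib0)"' o2ib0 tcp0 tcp1
```

In practice the first argument would come from something like
grep 'options lnet' /etc/modprobe.conf on the new server.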

As soon as the new OSTs on the 11th OSS were mounted and activated,
our 1gbit and 10gbit clients kernel panicked; the IB clients were fine.
After that, the 1gbit and 10gbit clients refused to mount Lustre,
since they couldn't reach the OSS.

I unmounted the OSTs on that OSS, fixed the modprobe.conf line,
rebooted, and ran

tunefs.lustre --erase-param \
  --mgsnode=<ibaddr>@o2ib0,<gbitaddr>@tcp0,<10gbitaddr>@tcp1 --writeconf \
  /dev/sd{b,c,d,e}

Where <xxxaddr> is the appropriate IP address.

That seemed to complete without issue and tunefs reports:

Parameters:
mgsnode=<ibaddr>@o2ib0,<gbitaddr>@tcp0,<10gbitaddr>@tcp1

as expected.

Unfortunately, the 1gbit and 10gbit clients still refuse to mount Lustre:

mount.lustre: mount <ipaddr>@tcp0:/lustre at /.lustre/mountpoint failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)

The OSS can ping clients on the 1gbit and 10gbit networks, so routing
and networking are fine.
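
An IP-level ping does not by itself show that the nodes can reach each
other over LNET, so an LNET-level check may be worth adding. The sketch
below wraps lctl ping, which is a standard lctl subcommand; the wrapper
name lnet_ping_all and the NIDs in the usage comment are placeholders
for illustration.

```shell
#!/bin/sh
# Run "lctl ping" against each NID given and report which ones answer
# at the LNET level. Returns non-zero if any NID does not answer.
# (hypothetical wrapper, not part of Lustre)
lnet_ping_all() {
    rc=0
    for nid in "$@"; do
        if lctl ping "$nid" >/dev/null 2>&1; then
            printf 'ok   %s\n' "$nid"
        else
            printf 'FAIL %s\n' "$nid"
            rc=1
        fi
    done
    return $rc
}

# Example usage (placeholder NIDs; substitute the real addresses):
#   lctl list_nids                          # NIDs this node is using
#   lnet_ping_all <gbitaddr>@tcp0 <10gbitaddr>@tcp1
```

Running lctl list_nids on the OSS after the reboot would also show
whether the tcp0 and tcp1 networks from the corrected modprobe.conf
line were actually picked up.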

I'm sure I'm simply panicked and missing something obvious.  What
is the proper procedure to fix this mess?  I thought tunefs.lustre
would do it, but it has not.

James Robnett
NRAO/AOC


