[Lustre-discuss] OSS misconfig and client connect

White, Cliff cliff.white at intel.com
Wed Jul 31 11:43:18 PDT 2013


On 7/31/13 10:37 AM, "James Robnett" <jrobnett at aoc.nrao.edu> wrote:

>
>I now suspect that I need to unmount all the OSSes (for
>correctness), unmount the MDS, and run
>
>tunefs.lustre --writeconf /dev/md0
>
>on it to clear the logs and then remount.
>
>Note we have a combined MDS/MGS.

Yes. Since the configuration logs are held on the MGS (combined with the
MDS in your setup), you need to run the --writeconf there, then remount
the servers.
The procedure should be in the Lustre Manual.
Cliffw
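A minimal dry-run sketch of that --writeconf sequence, assuming the order given in the Lustre Manual's procedure for regenerating configuration logs: unmount the clients, then the MDT, then every OST; run --writeconf on the MDT before the OSTs; then mount the MGS/MDT first, the OSTs next, and the clients last. It only prints the commands. /dev/md0 and /dev/sd{b,c,d,e} are the devices named in this thread; the mount points are hypothetical, and on a real system the OST steps are repeated on every OSS with that server's own OST devices.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints (does not execute) the order of operations for
# regenerating Lustre configuration logs with --writeconf.
# MDT_DEV and OST_DEVS are the devices named in this thread; the mount
# points are hypothetical placeholders. On a real system the OST steps
# are run on every OSS, each with its own OST devices.
set -euo pipefail

MDT_DEV=/dev/md0                           # combined MGS/MDT
OST_DEVS=(/dev/sdb /dev/sdc /dev/sdd /dev/sde)
MDT_MNT=/mnt/mdt                           # hypothetical mount point
OST_MNT_PREFIX=/mnt/ost                    # hypothetical: /mnt/ost0, /mnt/ost1, ...

plan() {
  local i
  echo "# 1. shut down: unmount all clients, then the MDT, then every OST"
  echo "umount $MDT_MNT"
  for i in "${!OST_DEVS[@]}"; do
    echo "umount ${OST_MNT_PREFIX}${i}"
  done

  echo "# 2. regenerate logs: writeconf the MDT first, then every OST"
  echo "tunefs.lustre --writeconf $MDT_DEV"
  for i in "${!OST_DEVS[@]}"; do
    echo "tunefs.lustre --writeconf ${OST_DEVS[$i]}"
  done

  echo "# 3. restart: MGS/MDT first, then the OSTs, then the clients"
  echo "mount -t lustre $MDT_DEV $MDT_MNT"
  for i in "${!OST_DEVS[@]}"; do
    echo "mount -t lustre ${OST_DEVS[$i]} ${OST_MNT_PREFIX}${i}"
  done
}

plan
```

Printing rather than executing keeps the ordering reviewable; the commands are meant to be run by hand on the right host at each step.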

>
>James
>
>>
>> On 07/31/2013 09:55 AM, James Robnett wrote:
>>>
>>> We're running Lustre 1.8.7 on clients and servers.
>>>
>>> We recently added an 11th OSS, with 4 OSTs, to our Lustre filesystem.
>>> Unfortunately, its modprobe.conf LNET line listed only an o2ib0(ib0)
>>> entry left over from testing; normally the line would look like:
>>>
>>> options lnet networks="o2ib0(ib0),tcp0(eth0),tcp1(eth2)"
>>>
>>> for IB, Gbit and 10Gbit respectively.
>>>
>>> As soon as the new OSTs on the 11th OSS were mounted and activated,
>>> our 1gbit and 10gbit clients kernel panicked; IB clients were fine.
>>> After that, the 1gbit and 10gbit clients refused to mount Lustre,
>>> since they couldn't reach the OSS.
>>>
>>> I unmounted the OSTs on that OSS, fixed the modprobe.conf line,
>>> rebooted, and ran
>>>
>>> tunefs.lustre --erase-param \
>>>   --mgsnode=<ibaddr>@o2ib0,<gbitaddr>@tcp0,<10gbitaddr>@tcp1 \
>>>   --writeconf /dev/sd{b,c,d,e}
>>>
>>> Where <xxxaddr> is the appropriate IP address.
>>>
>>> That seemed to complete without issue and tunefs reports:
>>>
>>> Parameters:
>>> mgsnode=<ibaddr>@o2ib0,<gbitaddr>@tcp0,<10gbitaddr>@tcp1
>>>
>>> as expected.
>>>
>>> Unfortunately, the 1gbit and 10gbit clients still refuse to mount Lustre:
>>>
>>> mount.lustre: mount <ipaddr>@tcp0:/lustre at /.lustre/mountpoint failed: No such file or directory
>>> Is the MGS specification correct?
>>> Is the filesystem name correct?
>>> If upgrading, is the copied client log valid? (see upgrade docs)
>>>
>>> The OSS can ping clients on the 1gbit and 10gbit networks, so routing
>>> and networking are fine.
>>>
>>> I'm sure I'm simply panicked and missing something obvious.  What
>>> is the proper procedure to fix this mess?  I thought the tunefs.lustre
>>> run would do it, but it has not.
>>>
>>> James Robnett
>>> NRAO/AOC
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>
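The root cause in the original report was an OSS whose "options lnet networks=" line listed only o2ib0(ib0), so that OSS had no NIDs on the TCP networks the Gbit and 10Gbit clients use. A pre-mount check along the following lines could flag that. It is a sketch, assuming the OSS's NIDs are fed in one per line (for example from lctl list_nids) and that the required networks are the o2ib0, tcp0 and tcp1 from the site's normal line; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which required LNET networks are missing from a server's NIDs.
# NIDs are read one per line on stdin (for example from `lctl list_nids` run
# on the OSS). REQUIRED_NETS mirrors the site's usual networks= line quoted
# in this thread; adjust it for other sites.
set -euo pipefail

REQUIRED_NETS=(o2ib0 tcp0 tcp1)

# Normalize a network name so "tcp0" and "tcp" compare equal: NIDs print
# network number 0 without the trailing digit (e.g. 10.0.0.1@tcp).
norm_net() {
  sed -E 's/^(.*[a-z])0$/\1/' <<<"$1"
}

# Print one line per required network absent from stdin; return 1 if any.
missing_nets() {
  local nets="" nid req status=0
  while IFS= read -r nid; do
    [ -n "$nid" ] || continue
    nets+="$(norm_net "${nid##*@}")"$'\n'   # keep only the network part
  done
  for req in "${REQUIRED_NETS[@]}"; do
    if ! grep -qxF "$(norm_net "$req")" <<<"$nets"; then
      echo "missing LNET network: $req"
      status=1
    fi
  done
  return "$status"
}

# Example: an OSS configured with only o2ib0(ib0) reports a single IB NID
# (the address is made up), so both TCP networks are flagged.
printf '%s\n' '10.10.0.11@o2ib' | missing_nets || true
```

On a server this would be used as "lctl list_nids | missing_nets" before mounting any OSTs; a non-zero exit means the modprobe.conf LNET line needs fixing first.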




