[Lustre-discuss] Added Dual-homed OSS; ethernet clients confused

D. Marc Stearman marc at llnl.gov
Wed Apr 23 07:50:57 PDT 2008


On Apr 23, 2008, at 7:10 AM, Chris Worley wrote:

> On Wed, Apr 23, 2008 at 12:42 AM, Andreas Dilger <adilger at sun.com> wrote:
>> On Apr 22, 2008  18:08 -0600, Chris Worley wrote:
>>> The error specifically complains about the first OST/disk on the new
>>> OSS, OST0026.  Its tunefs.lustre output was:
>>>
>>
>>>>  On the OSS in question, for each OST, I did:
>>>>
>>>>  # tunefs.lustre --writeconf --ost \
>>>>      --mgsnode="36.102.29.1@o2ib0,36.101.29.1@tcp0" --fsname=lfs \
>>>>      --param sys.timeout=40 --param lov.stripesize=2M /dev/sdl
>>
>>>>  Lustre:    cmd=cf003 0:lfs-OST0026-osc  1:lfs-OST0026_UUID  2:36.102.29.4@o2ib
>>>>  LustreError: 15c-8: MGC36.101.29.1@tcp: The configuration from log
>>>>  'lfs-client' failed (-2). This may be the result of communication
>>>>  errors between this node and the MGS, a bad configuration, or other
>>>>  errors. See the syslog for more information.
>>
>>  The problem is that the NID for the new OST is the IPoIB address, and
>>  this is what the TCP client is trying to connect to.  If you specify
>>  the TCP NID first this may help.  Also note that the client does not
>>  get the config from the OSTs, but rather the MGS, so you need to do a
>>  --writeconf on there.
>
> This is confusing as the man page for "tunefs.lustre" wants a device
> name at the end of the command... and the device is on another OSS...
> "/dev/sda" on the MGS is a totally different drive.  Can I use the
> label?

A --writeconf on the MGS will remove the file system config
information, which forces the MGS to recreate it.  Try this:

1.  Stop all clients and servers (unload the Lustre modules on the
clients so that no devices are left lingering).

2.  Run your tunefs.lustre command on the new OST with --erase-params
so you don't end up with duplicate parameters (I noticed you had
multiple mgsnode params).  Some params are appended each time they are
set rather than replaced, so to change them you need to erase all the
params and start over.

3.  Re-run a writeconf on your MGS node.  Something like this:
    tunefs.lustre --writeconf --fsname=lfs --mdt --mgs \
        --param mdt.group_upcall=/usr/sbin/l_getgroups \
        --param lov.stripesize=2M \
        --param lov.stripecount=2 /dev/sda

4.  Start the MGS/MDS node

5.  Start the OSTs (if you care about device ordering, start them one
at a time in index order.  We like to have our 'lctl dl' output list
all the OSTs in index order).

6.  Mount the clients.
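
As a rough sketch, the six steps above could be laid out as a dry-run
script that only prints the commands, so the order can be reviewed
before anything is touched.  The devices and parameters are the ones
mentioned in this thread; the mount points (/mnt/lfs, /mnt/mdt,
/mnt/ost), the run() and print_plan() helpers, and listing the TCP NID
first are assumptions for illustration, not a tested procedure.

```shell
#!/bin/sh
# Dry run: prints each command instead of executing it.
# Mount points are hypothetical; devices and parameters come from the thread.

FSNAME="lfs"
MGS_NIDS="36.101.29.1@tcp0,36.102.29.1@o2ib0"  # TCP NID first, as Andreas suggested
MDT_DEV="/dev/sda"   # combined MGS/MDT device on the MGS node
OST_DEV="/dev/sdl"   # new OST device on the new OSS

# Print a command with a "+ " prefix; change the body to "$@" to really run it.
run() { echo "+ $*"; }

print_plan() {
    # Step 1 (every client): unmount and unload the Lustre modules.
    # Server targets would be unmounted the same way on their own nodes.
    run umount /mnt/lfs
    run lustre_rmmod
    # Step 2 (new OSS): erase the old parameters, then set them once.
    # lov.stripesize is left to the MDT command in step 3.
    run tunefs.lustre --erase-params --writeconf --ost \
        --mgsnode="$MGS_NIDS" --fsname="$FSNAME" \
        --param sys.timeout=40 "$OST_DEV"
    # Step 3 (MGS node): writeconf so the config logs are regenerated.
    run tunefs.lustre --writeconf --fsname="$FSNAME" --mdt --mgs \
        --param mdt.group_upcall=/usr/sbin/l_getgroups \
        --param lov.stripesize=2M \
        --param lov.stripecount=2 "$MDT_DEV"
    # Step 4 (MGS node): start the MGS/MDS.
    run mount -t lustre "$MDT_DEV" /mnt/mdt
    # Step 5 (each OSS): start the OSTs, one at a time in index order.
    run mount -t lustre "$OST_DEV" /mnt/ost
    # Step 6 (every client): mount the file system, then list the devices.
    run mount -t lustre "36.101.29.1@tcp0:/$FSNAME" /mnt/lfs
    run lctl dl
}

print_plan
```

Each command belongs on the node named in its comment, so in practice
this is a checklist to work through by hand rather than something to
run from a single host.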


Step 3 should re-create the client config information on the MGS, and
when you start all your OSTs, the client configs will be updated with
the proper NIDs.  When clients mount, they ask the MGS what devices
(OSTs) make up the filesystem, and then try to connect.  If the MGS
is unaware of the tcp NID on the new OST (because the OSS had the
wrong modprobe.conf when the OST first registered), the clients will
not know that the new OST has a NID on tcp0.  The writeconf on the MGS
will fix that.
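
Before restarting the new OSS, it may be worth confirming that it
really has a TCP NID, since that is what gets registered with the MGS.
A minimal sketch, assuming the NIDs arrive one per line in the
address@network form that "lctl list_nids" prints; the has_tcp_nid
helper is made up for illustration:

```shell
#!/bin/sh
# Succeeds if the NID list on stdin contains at least one NID on a TCP
# network (tcp, tcp0, tcp1, ...).  Expects one NID per line.
has_tcp_nid() {
    # A NID looks like address@network; anchor the match at end of line
    # so that only the network part is tested.
    grep -q '@tcp[0-9]*$'
}

# Intended use on the OSS (not executed here):
#   lctl list_nids | has_tcp_nid || echo "no tcp NID configured" >&2
# If it reports nothing on tcp, the "options lnet networks=..." line in
# modprobe.conf is the first place to look.
```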

-Marc

----
D. Marc Stearman
LC Lustre Administration Lead
marc at llnl.gov
925.423.9670
Pager: 1.888.203.0641



