[lustre-discuss] Multiple MGS interfaces config

Exec Unerd execunerd at gmail.com
Mon Sep 28 16:18:24 PDT 2015


>> I think here it should be a colon between the two MGS nids:
>> mount -v -t lustre 172.16.10.1@o2ib0:192.168.10.1@tcp0:/testfs

That's part of my problem. The Lustre 2.x manual says that comma-delimited
NIDs are on the same host, but colon-delimited NIDs are on separate hosts.
Is that just for lustre.conf & mkfs.lustre, or is it for mount operations
as well?

In this case, my MGS node has a TCP and an IB rail to accommodate the
different clients, so I'd use a comma, right?
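
To make the question concrete, here is a rough sketch of the device string
I'd expect if the manual's rule (comma = NIDs on the same host, colon =
separate hosts) carries over to mount. That carry-over is exactly the
assumption I'm asking about, and mgs_spec is just a made-up helper for
illustration, not anything shipped with Lustre.

```shell
# Sketch only: assumes the manual's rule (comma = NIDs of one host,
# colon = separate hosts) also applies to the mount device string.
# mgs_spec is a hypothetical helper, not part of Lustre.
mgs_spec() {
    # $1 = filesystem name, remaining args = NIDs of the single MGS host
    fsname=$1
    shift
    old_ifs=$IFS
    IFS=,
    nids="$*"        # "$*" joins the arguments with the first char of IFS
    IFS=$old_ifs
    printf '%s:/%s\n' "$nids" "$fsname"
}

# Print the device string for one MGS host with an IB and a TCP rail.
mgs_spec testfs 172.16.10.1@o2ib0 192.168.10.1@tcp0

# Intended use on a dual-rail client (not run here):
# mount -v -t lustre "$(mgs_spec testfs 172.16.10.1@o2ib0 192.168.10.1@tcp0)" /mnt/testfs
```

If colons turn out to be required here instead, only the join character in
the helper would change.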

On Mon, Sep 28, 2015 at 7:07 AM, Martin Hecht <hecht at hlrs.de> wrote:

> On 09/27/2015 08:59 PM, Exec Unerd wrote:
> >> I'm not sure if I have understood your setup correctly.
> > In this case, the clients are a combination of all three: some are o2ib
> > only, some tcp only, and some o2ib+tcp with tcp as failover.
> >
> > It sounds like I need a combination of configurations, one for the OSSes
> > and one for each client type.
> >
> > So if I used this parameter in the OST,
> > --mgsnode="172.16.10.1@o2ib0,192.168.10.1@tcp0"
> >
> > Then configured the modprobe.d/lustre.conf appropriately on the clients
> > tcp: options lnet networks="tcp0(ixgbe1)"
> > o2ib: options lnet networks="o2ib0(ib1)"
> > both: options lnet networks="o2ib0(ib1),tcp0(ixgbe1)"
> >
> > And use these mount parameters:
> > tcp: mount -v -t lustre 192.168.10.1@tcp0:/testfs /mnt/testfs
> > o2ib: mount -v -t lustre 172.16.10.1@o2ib0:/testfs /mnt/testfs
> > both: mount -v -t lustre 172.16.10.1@o2ib0,192.168.10.1@tcp0:/testfs
> I think here it should be a colon between the two MGS nids:
>
> mount -v -t lustre 172.16.10.1@o2ib0:192.168.10.1@tcp0:/testfs
>
>
> > /mnt/testfs
> >
> > Everything should be happy?
> >
> > On Thu, Sep 24, 2015 at 9:12 AM, Martin Hecht <hecht at hlrs.de> wrote:
> >
> >> On 09/24/2015 05:33 PM, Chris Hunter wrote:
> >>> [...]
> >>>>    2. What's the best way to trace the TCP client interactions to see
> >>>> where
> >>>>    it's breaking down?
> >>> If lnet is running on the client, you can try "lctl ping",
> >>> e.g. lctl ping 172.16.10.1@o2ib
> >>>
> >>> I believe a lustre mount uses ipoib for the initial handshake with
> >>> an MDS's o2ib interfaces. You should make sure regular ping over
> >>> ipoib is working before mounting lustre.
> >> If the client and the server are on the same network, yes, it's a good
> >> starting point, but it's not a prerequisite. In general you can have an
> >> lnet router in between, or different ip subnets for ipoib, so you
> >> can't ping on the ipoib layer but can still lctl ping the whole
> >> path (although you could at least verify that you can ip ping the
> >> next hop).
> >>
> >> We also have a case in which we tried to block ipoib completely with
> >> iptables, but we still could lctl ping, even after rebooting the host
> >> and ensuring that the firewall was up before loading the lnet module.
> >> So, I doubt that ipoib is needed at all for establishing the o2ib
> >> connection.
> >>
> >>
>
>
>