<div dir="ltr">>> I think here it should be a colon between the two MGS nids:<div>>> mount -v -t lustre 172.16.10.1@o2ib0:192.168.10.1@tcp0:/testfs</div><div><br></div><div>That's part of my problem. The Lustre 2.x manual says that comma-delimited NIDs are on the same host, but colon-delimited NIDs are on separate hosts. Is that just for lustre.conf & mkfs.lustre, or is it for mount operations as well?</div><div><br></div><div>In this case, my MGS node has a TCP and an IB rail to accommodate the different clients, so I'd use a comma, right?</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 28, 2015 at 7:07 AM, Martin Hecht <span dir="ltr"><<a href="mailto:hecht@hlrs.de" target="_blank">hecht@hlrs.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 09/27/2015 08:59 PM, Exec Unerd wrote:<br>

>> I'm not sure if I have understood your setup correctly.<br>

> In this case, the clients are a combination of all three: some are o2ib<br>

> only, some tcp only, and some o2ib+tcp with tcp as failover.<br>

><br>

> It sounds like I need a combination of configurations, one for the OSSes<br>

> and one for each client type.<br>

><br>

> So if I used this parameter in the OST,<br>

> --mgsnode="172.16.10.1@o2ib0,192.168.10.1@tcp0"<br>

><br>

> Then configured the modprobe.d/lustre.conf appropriately on the clients<br>

> tcp: options lnet networks="tcp0(ixgbe1)"<br>

> o2ib: options lnet networks="o2ib0(ib1)"<br>

> both: options lnet networks="o2ib0(ib1),tcp0(ixgbe1)"<br>

><br>

> And use these mount parameters:<br>

> tcp: mount -v -t lustre 192.168.10.1@tcp0:/testfs /mnt/testfs<br>

> o2ib: mount -v -t lustre 172.16.10.1@o2ib0:/testfs /mnt/testfs<br>

> both: mount -v -t lustre 172.16.10.1@o2ib0,192.168.10.1@tcp0:/testfs<br>

</span>I think here it should be a colon between the two MGS nids:<br>

<br>

mount -v -t lustre 172.16.10.1@o2ib0:192.168.10.1@tcp0:/testfs<br>
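<br>
For what it's worth, here is a rough sketch of how I read the mount.lustre device syntax: colons separate MGS nodes (failover partners), commas separate NIDs that belong to the same node, and the final ":/" introduces the fsname. The helper name describe_mgsspec is made up for illustration and is not part of Lustre; it only splits the string and never talks to LNet.<br>

```shell
# Hypothetical helper, not a Lustre tool: print how a mount spec breaks
# down, assuming colon-separated MGS nodes, comma-separated NIDs per node,
# and a trailing ":/fsname".
# The body runs in a subshell so the IFS change does not leak out.
describe_mgsspec() (
    spec=$1
    fsname=${spec##*:/}   # text after the last ":/"
    nodes=${spec%:/*}     # everything before that ":/"
    IFS=:                 # split the node list on colons
    i=0
    for node in $nodes; do
        i=$((i + 1))
        # NIDs of one node are comma-separated; show them space-separated
        echo "MGS node $i: $(echo "$node" | tr ',' ' ')"
    done
    echo "fsname: $fsname"
)

describe_mgsspec "172.16.10.1@o2ib0,192.168.10.1@tcp0:/testfs"
describe_mgsspec "172.16.10.1@o2ib0:192.168.10.1@tcp0:/testfs"
```

With the comma form it reports one MGS node with two NIDs; with the colon form it reports two MGS nodes with one NID each.<br>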

<div class="HOEnZb"><div class="h5"><br>

<br>

> /mnt/testfs<br>

><br>

> Everything should be happy?<br>

><br>

> On Thu, Sep 24, 2015 at 9:12 AM, Martin Hecht <<a href="mailto:hecht@hlrs.de">hecht@hlrs.de</a>> wrote:<br>

><br>

>> On 09/24/2015 05:33 PM, Chris Hunter wrote:<br>

>>> [...]<br>

>>>>    2. What's the best way to trace the TCP client interactions to see<br>

>>>> where<br>

>>>>    it's breaking down?<br>

>>> If lnet is running on the client, you can try "lctl ping"<br>

>>> e.g. lctl ping 172.16.10.1@o2ib<br>

>>><br>

>>> I believe a lustre mount uses ipoib for the initial handshake with the<br>

>>> MDS's o2ib interfaces. You should make sure regular ping over ipoib is<br>

>>> working before mounting lustre.<br>

>> If the client and the server are on the same network, yes, it's a good<br>

>> starting point. But it's not a prerequisite. In general you can have an<br>

>> lnet router in-between or have different ip subnets for ipoib, so you<br>

>> can't ping on the ipoib layer, but you can still lctl ping the whole<br>

>> path (although you could verify that you can ip ping to the next hop at<br>

>> least).<br>

>><br>

>> We also have a case in which we tried to block ipoib completely with<br>

>> iptables, but we still could lctl ping, even after rebooting the host<br>

>> and ensuring that the firewall was up before loading the lnet module.<br>

>> So, I doubt that ipoib is needed at all for establishing the o2ib<br>

>> connection.<br>

>><br>

>><br>

<br>

<br>

</div></div></blockquote></div><br></div>