[Lustre-discuss] LustreError: 11-0: an error occurred while communicating with 192.168.16.24 at o2ib. The ost_connect operation failed with -19

Kevin Van Maren Kevin.Vanmaren at Sun.COM
Wed Mar 25 09:12:06 PDT 2009


Dennis,

You haven't provided enough context for people to help.

What have you done to determine if the IB fabric is working properly?

What are the hostnames and NIDs for the 10 servers (lctl list_nids)?
Which OSTs are on which servers?

OST4 is on a machine at 192.168.16.23.
What machine is 192.168.16.24?  Is that the OST4 failover partner?
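
For reference, the -19 in the subject line is most likely a negated Linux
errno value, which is how Lustre normally reports failures. On that
assumption it is ENODEV, which would fit a connect request reaching a node
that does not currently have the OST mounted. A minimal sketch for decoding
such codes (describe_rc is a hypothetical helper name, not part of any
Lustre tool):

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Translate a negated errno value (e.g. -19) into its symbolic name.

    Assumes the code follows the Linux kernel convention of returning
    -errno; the numeric-to-name mapping comes from the local platform.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc}: {name} ({os.strerror(code)})"


# The code reported for the failed ost_connect in the subject line.
print(describe_rc(-19))
```

On a Linux host this should name ENODEV; the message text after the name
comes from the local C library and may vary by platform.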

You have a client at 192.168.16.1?

Kevin


Dennis Nelson wrote:
>
> Hi,
>
> I have encountered an issue with Lustre that has happened a couple of times
> now.  I am beginning to suspect an issue with the IB fabric but wanted to
> reach out to the list to confirm my suspicions.  The odd part is that even
> when the MDS complains that it cannot connect to a given OST, lctl ping to
> the OSS that owns the OST works without an issue.  Also, the OSS in question
> has other OSTs which, in the latest case, have not reported any errors.
>
> I have attached a file with the errors that I encountered from the MDS.  I
> am running Lustre 1.6.6 with a pair of MDSs and 8 OSSs, with 28 OSTs spread
> across the 8 OSSs.  I am using IB DDR interconnects between all systems.
>
> Thanks,
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>   



