[Lustre-discuss] 1.8.1(-ish) client vs. 1.6.7.2 server

Liang Zhen Zhen.Liang at Sun.COM
Tue Jul 21 09:42:26 PDT 2009


Robin,

These messages should be harmless. 1.8.1 uses a new o2iblnd message 
protocol, so there is a version negotiation when it connects to a peer 
running an older o2iblnd version. Are there any other o2ib error 
messages, like "Deleting messages for xxx.xxx.xxx.xxx@o2ib: connection 
failed", when you see the IO failure? Anyway, if you get any complaints 
from o2ib beyond these messages, could you please post them on the bug 
you filed?

Thanks
Liang
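
For anyone wanting to check a client log for o2ib messages other than the
two quoted in this thread, here is a minimal sketch. It assumes every
o2iblnd message carries a source tag of the form "(o2iblnd_cb.c:NNN:" as
in the log below, and it treats only the two message shapes shown there
as the harmless ones. The function name classify() and the log path in
the usage comment are hypothetical.

```python
import re

# The two message shapes described in this thread as harmless during
# version negotiation. Wording is copied from the quoted client log.
# ".+?" is used for the NID so both "10.8.30.244@o2ib" and the archive's
# "10.8.30.244 at o2ib" spelling are accepted.
BENIGN = [
    re.compile(r"kiblnd_rx_complete\(\)\) Rx from .+? failed: 5\b"),
    re.compile(r"kiblnd_reconnect\(\)\) .+?: retrying \(version negotiation\)"),
]

# Assumption: o2iblnd messages are tagged with their source file,
# e.g. "(o2iblnd_cb.c:459:".
O2IB_TAG = re.compile(r"\(o2iblnd\w*\.c:\d+:")


def classify(lines):
    """Split o2iblnd log lines into (benign, other); skip everything else."""
    benign, other = [], []
    for line in lines:
        if not O2IB_TAG.search(line):
            continue  # not an o2iblnd message
        if any(p.search(line) for p in BENIGN):
            benign.append(line.rstrip("\n"))
        else:
            other.append(line.rstrip("\n"))
    return benign, other


# Hypothetical usage (the log path is an assumption):
#   with open("/var/log/messages") as f:
#       benign, other = classify(f)
#   for line in other:
#       print(line)
```

Anything that ends up in the second list is what would be worth attaching
to the bug.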

Robin Humble wrote:
> I added this to bugzilla.
>   https://bugzilla.lustre.org/show_bug.cgi?id=20227
>
> cheers,
> robin
>
> On Wed, Jul 15, 2009 at 01:09:33PM -0400, Robin Humble wrote:
>   
>> On Wed, Jul 15, 2009 at 08:46:12AM -0400, Robin Humble wrote:
>>     
>>> I get a ferocious set of error messages when I mount a 1.6.7.2
>>> filesystem on a b_release_1_8_1 client.
>>> is this expected?
>>>       
>> just to annotate the below a bit in case it's not clear... sorry -
>> should have done that in the first email :-/
>>
>> 10.8.30.244 is MGS and one MDS, 10.8.30.245 is the other MDS in the
>> failover pair. 10.8.30.201 -> 208 are OSS's (one OST per OSS), and the
>> fs is mounted in the usual failover way eg.
>>  mount -t lustre 10.8.30.244@o2ib:10.8.30.245@o2ib:/system /system
>>
>>     
>> from the below (and other similar logs) it kinda looks like the client
>> fails and then renegotiates with all the servers.
>>
>> cheers,
>> robin
>> --
>> Dr Robin Humble, HPC Systems Analyst, NCI National Facility
>>
>>     
>>>  Lustre: 13800:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30.244@o2ib failed: 5
>>>  Lustre: 13799:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30.244@o2ib failed: 5
>>>  Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30.244@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
>>>  Lustre: MGC10.8.30.244@o2ib: Reactivating import
>>>  Lustre: 13797:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30.245@o2ib failed: 5
>>>  Lustre: 13798:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30.245@o2ib failed: 5
>>>  Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30.245@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
>>>  Lustre: Client system-client has started
>>>  Lustre: 13798:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30.201@o2ib failed: 5
>>>  ... last message repeated 17 times ...
>>>  Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30.201@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
>>>  Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30.202@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
>>>  Lustre: 13798:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30.203@o2ib failed: 5
>>>  Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30.203@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
>>>  Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30.204@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
>>>  Lustre: 13797:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30.205@o2ib failed: 5
>>>  Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30.205@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
>>>  Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30.206@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
>>>  Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30.207@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
>>>  Lustre: 615:0:(o2iblnd_cb.c:2384:kiblnd_reconnect()) 10.8.30.208@o2ib: retrying (version negotiation), 12, 11, queue_dep: 8, max_frag: 256, msg_size: 4096
>>>  Lustre: 13800:0:(o2iblnd_cb.c:459:kiblnd_rx_complete()) Rx from 10.8.30.208@o2ib failed: 5
>>>
>>> looks like it succeeds in the end, but only after a struggle.
>>>
>>> I don't have any problems with 1.8.1 <-> 1.8.1 or 1.6.7.2 <-> 1.6.7.2.
>>>
>>> servers are rhel5 x86_64 2.6.18-92.1.26.el5 1.6.7.2 + bz18793 (group
>>> quota fix).
>>> client is rhel5 x86_64 patched 2.6.18-128.1.16.el5-b_release_1_8_1 from
>>> cvs 20090712131220 + bz18793 again.
>>>
>>> BTW, should I be using cvs tag v1_8_1_RC1 instead of b_release_1_8_1?
>>> I'm confused about which is closest to the final 1.8.1 :-/
>>>
>>> cheers,
>>> robin
>>> --
>>> Dr Robin Humble, HPC Systems Analyst, NCI National Facility
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>       



