[Lustre-discuss] LBUG
Wojciech Turek
wjt27 at cam.ac.uk
Fri Nov 16 07:53:39 PST 2007
Hi,
Indeed, the client has disconnected from the MDS. We actually see that quite
frequently during OSS failover, on other clients as well. Does that
indicate that during an OSS failure the MDS is very busy and client <->
MDS connections time out? Is there a way to prevent such a
situation, maybe some MDS tuning?
I don't really see why clients could not communicate with the
MDS while only an OSS is having problems.
Cheers,
Wojciech
On 16 Nov 2007, at 15:46, Oleg Drokin wrote:
> Hello!
>
> On Nov 16, 2007, at 8:43 AM, Wojciech Turek wrote:
>>>> We've seen an LBUG message today. It happened during failover of one
>>>> OSS to another.
>>> Actually, the messages suggest that there was an MDS failover as well.
>> Can you specify which messages suggest that? I am asking because,
>> as far as I can see, there was no MDS failover. We have failover
>> configured with heartbeat, and I can see everything stayed on the same
>> server.
>
> Nov 15 22:10:14 darwin kernel: Lustre: ddn_home-MDT0000-mdc-00000100cff22800:
> Connection restored to service ddn_home-MDT0000 using nid 10.143.245.201@tcp.
>
> This message means that the connection to your MDS was restored.
> I cannot tell if it was indeed a failover (sorry, I used the wrong word),
> but from this message I can tell that this client previously disconnected
> from the MDS and later reconnected to it.
> Since you were speaking of failovers, I assumed the MDS might have
> failed over as well (due to the disconnection), but this is not
> necessarily the case.
>
> Bye,
> Oleg
Mr Wojciech Turek
Assistant System Manager
University of Cambridge
High Performance Computing service
email: wjt27 at cam.ac.uk
tel. +441223763517
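
To check how often clients reconnect to the MDS around an OSS failover, the syslog can be scanned for the "Connection restored" message discussed above. The following is only a rough sketch: the pattern is modelled on the single log line quoted in this thread, so other Lustre versions may word the message differently, and the names find_reconnects and count_by_service are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted in this thread (an assumption:
# other Lustre versions may format the message differently).
RESTORED = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:\s+Lustre:\s+"
    r"(?P<device>\S+):\s+Connection restored to service\s+"
    r"(?P<service>\S+)\s+using nid\s+(?P<nid>\S+?)\.?\s*$"
)


def find_reconnects(lines):
    """Return one dict per 'Connection restored' message found in lines."""
    events = []
    for line in lines:
        match = RESTORED.match(line)
        if match:
            events.append(match.groupdict())
    return events


def count_by_service(events):
    """Count reconnect events per Lustre service (e.g. an MDT)."""
    return Counter(event["service"] for event in events)


# The log line from this thread, joined onto one line as syslog writes it.
SAMPLE = [
    "Nov 15 22:10:14 darwin kernel: Lustre: "
    "ddn_home-MDT0000-mdc-00000100cff22800: Connection restored to service "
    "ddn_home-MDT0000 using nid 10.143.245.201@tcp.",
]

if __name__ == "__main__":
    for service, total in count_by_service(find_reconnects(SAMPLE)).items():
        print(service, total)
```

Passing an open syslog file instead of SAMPLE gives a per-service count of reconnects. A reconnect only shows that the client lost and regained its connection; as noted above, it does not by itself show that the MDS failed over.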