[Lustre-discuss] MDS hangs with OFED
Kevin Hildebrand
kevin at umd.edu
Thu Mar 17 10:54:22 PDT 2011
We've been seeing occasional hangs on our MDS and I'd like to see if
anyone else is seeing this or can provide suggestions on where to look.
This might not even be a Lustre problem at all.
We're running Lustre 1.8.4 with OFED 1.5.2, and kernel version
2.6.18-194.3.1.el5_lustre.1.8.4.
The problem is that at some point something in the IB stack appears to
go out to lunch: pings to the IPoIB interface time out, and anything that
touches IB (perfquery, etc.) goes into a hard hang and cannot be killed.
Once this happens, the only fix is to power-cycle the machine, since
shutdown and reboot hang as well.
From what I can see, the first abnormal entries in the system logs on
the MDS are messages showing that connections to the OSSes are timing out.
Any insight would be appreciated.
Thanks,
Kevin