[Lustre-discuss] MDS hangs with OFED

Kevin Hildebrand kevin at umd.edu
Thu Mar 17 10:54:22 PDT 2011


We've been seeing occasional hangs on our MDS and I'd like to see if 
anyone else is seeing this or can provide suggestions on where to look.
This might not even be a Lustre problem at all.

We're running Lustre 1.8.4 with OFED 1.5.2, and kernel version 
2.6.18-194.3.1.el5_lustre.1.8.4.

The problem is that at some point something in the IB stack appears to 
go out to lunch: pings to the IPoIB interface time out, and anything 
that touches IB (perfquery, etc.) goes into a hard hang and cannot be 
killed.
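
One way to confirm that the hung commands are stuck in uninterruptible 
sleep (state D in /proc/<pid>/stat) is a small script along these lines. 
This is only a sketch: it assumes a Linux /proc filesystem and that 
"cannot be killed" here means state D, and the function names 
parse_stat and uninterruptible_processes are made up for illustration.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx].strip())
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for processes currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

If perfquery or similar tools show up in this list and never leave it, 
that would suggest they are blocked inside the kernel rather than in 
user space.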

The only solution to the problem once it occurs is to power-cycle the 
machine, as shutdown/reboot hang as well.

From what I can see, the first abnormal entries in the system logs on 
the MDS are messages showing that connections to the OSSes are timing out.

Any insight would be appreciated.

Thanks,

Kevin


