[Lustre-discuss] Failure to connect to some OST from a client machine

Bob Ball ball at umich.edu
Thu Sep 5 12:56:03 PDT 2013


We are running Lustre 2.1.6 on Scientific Linux 6.4, kernel 
2.6.32-358.11.1.el6.x86_64.  This was an upgrade from Lustre 1.8.4 on SL5.

We have had a few situations lately where a client stops talking to some 
subset of the OSTs (about 58 in total across 8 OSSes, nearly 500 TB 
altogether).  I have a couple of questions.

1. "lctl dl" on the OSS shows a smaller count on the affected servers; 
on the client, all OSS show UP in "lctl dl".  Today, I first tried 
rebooting the affected OSS, but that did not change the situation.  I 
ended up rebooting the client before I could get full connectivity.  Is 
there any way to force the reconnect from the client, short of 
rebooting that client?

2. Under Lustre 1.8.4, I could run "lfs df -h" on the client and see 
all OSTs, even those where the connection was not working for whatever 
reason.  That is no longer the case: the lfs command now stops at the 
first non-talking OST.  This seems more like a bug than a feature.  Is 
there some other way to see a list of non-communicating OSTs on a 
client?
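
To make question 2 concrete, here is a rough sketch of the kind of list 
being asked for.  It assumes the client exposes a per-OSC 
"ost_server_uuid" parameter whose value is the target UUID followed by 
an import state such as FULL, and that "lctl get_param" prints it as 
"name=value".  That has not been verified on 2.1.6, and the sample text 
below is invented for illustration, not captured from a real client.

```python
import subprocess


def parse_ost_states(text):
    """Map each OST target UUID to the import state printed next to it."""
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank or unexpected lines
        _, value = line.split("=", 1)
        fields = value.split()
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


def non_full_osts(text):
    """Return the sorted target UUIDs whose state is anything but FULL."""
    states = parse_ost_states(text)
    return sorted(uuid for uuid, state in states.items() if state != "FULL")


def read_states_from_client():
    # Assumption: run on a Lustre client with lctl in PATH, and the
    # parameter name below exists on the installed version.
    result = subprocess.run(
        ["lctl", "get_param", "osc.*.ost_server_uuid"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical sample in the assumed "name=uuid<TAB>state" layout.
SAMPLE = """\
osc.lustre-OST0000-osc-ffff880000000000.ost_server_uuid=lustre-OST0000_UUID\tFULL
osc.lustre-OST0001-osc-ffff880000000000.ost_server_uuid=lustre-OST0001_UUID\tDISCONN
"""

if __name__ == "__main__":
    for uuid in non_full_osts(SAMPLE):
        print(uuid)
```

On a real client, non_full_osts(read_states_from_client()) would take 
the place of the sample text, provided the assumptions above hold.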

Thanks in advance for any help offered.

bob





