[Lustre-discuss] Failure to connect to some OST from a client machine
ball at umich.edu
Thu Sep 5 12:56:03 PDT 2013
We are running Lustre 2.1.6 on Scientific Linux 6.4, kernel
2.6.32-358.11.1.el6.x86_64. This was an upgrade from Lustre 1.8.4 on SL5.
We have had a few situations lately where a client stops talking to some
subset of the OSTs (there are about 58 OSTs in total on 8 OSSs, nearly
500TB altogether). I have a couple of questions.
1. "lctl dl" on the OSS shows a smaller count on the affected servers;
on the client, all OSS showed UP in "lctl dl". Today, I first tried
rebooting the affected OSS, but that did not change the situation. I ended
up rebooting the client before I could get full connectivity. Is there any
way to force the reconnect from the client, short of rebooting that client?
2. It used to be the case under Lustre 1.8.4 that I could run "lfs df
-h" on the client and see all OSTs, even those where the connection was
not working, for whatever reason. That is no longer the case; now the
lfs command stops at the first non-talking OST. This seems more like a
bug than a feature. Is there some other way to see a list of
non-communicating OSTs on a client?
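
To make the "smaller count" comparison from question 1 less manual, here is a rough sketch that parses saved "lctl dl" output from an OSS and flags the devices whose last-column count is below the highest seen. It assumes the usual six-column layout (index, state, type, name, UUID, count). The sample text, the device names, and the helper names parse_lctl_dl and low_refcount are made up for illustration and are not part of Lustre. A lower count is only a hint that some client may be missing, not proof.

```python
from typing import List, NamedTuple


class Device(NamedTuple):
    index: int
    state: str
    type: str
    name: str
    uuid: str
    refcount: int


def parse_lctl_dl(text: str) -> List[Device]:
    """Parse 'lctl dl' style text, assuming six whitespace-separated columns."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that does not look like a device line.
        if len(fields) != 6 or not fields[0].isdigit() or not fields[5].isdigit():
            continue
        devices.append(Device(int(fields[0]), fields[1], fields[2],
                              fields[3], fields[4], int(fields[5])))
    return devices


def low_refcount(devices: List[Device], dev_type: str = "obdfilter") -> List[Device]:
    """Return devices of dev_type whose count is below the maximum for that type."""
    same_type = [d for d in devices if d.type == dev_type]
    if not same_type:
        return []
    top = max(d.refcount for d in same_type)
    return [d for d in same_type if d.refcount < top]


# Hypothetical sample, not taken from a real system.
SAMPLE = """\
  0 UP mgc MGC10.0.0.1@tcp 11111111-aaaa 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter demo-OST0000 demo-OST0000_UUID 131
  3 UP obdfilter demo-OST0001 demo-OST0001_UUID 130
"""

if __name__ == "__main__":
    for dev in low_refcount(parse_lctl_dl(SAMPLE)):
        print(dev.name, dev.refcount)
```

In practice the text would come from running "lctl dl" on each OSS and saving the output; the comparison against the maximum is just a heuristic for spotting the odd one out.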
Thanks in advance for any help offered.