[lustre-discuss] "Not on preferred path" error
landman at scalableinformatics.com
Tue Sep 20 10:51:08 PDT 2016
On 09/20/2016 01:39 PM, Lewis Hyatt wrote:
> Thanks very much for the suggestions. dmesg output is here:
> We don't see any disk-related stuff there, and also our GUI shows all
> the RAID arrays as being fine.
Hmmm .... I rarely trust GUIs for RAID. Do you have underlying CLI
tools you can do a sanity check with?
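If the OSTs happen to sit on Linux md software RAID (an assumption; the thread doesn't say what the RAID layer is), here is a minimal sketch of a CLI-side sanity check. It parses /proc/mdstat and flags any array whose "[total/active] [UU_]" status line shows a missing member. Arrays without redundancy (e.g. raid0) print no such status line and are simply not checked.

```python
import re

# "md0 : active raid6 sdb[1] ..." starts a new array stanza.
ARRAY_RE = re.compile(r"^(md\S*)\s*:")
# "... [3/2] [UU_]" gives total devices, active devices, and per-member flags.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays that report a missing or failed member."""
    bad = []
    current = None
    for line in mdstat_text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            continue
        s = STATUS_RE.search(line)
        if s and current:
            total, active, flags = int(s.group(1)), int(s.group(2)), s.group(3)
            if active < total or "_" in flags:
                bad.append(current)
            current = None  # only the first status line per array matters
    return bad


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            text = f.read()
    except OSError:
        print("no /proc/mdstat here (md driver not loaded, or not Linux)")
    else:
        bad = degraded_arrays(text)
        if bad:
            print("degraded: " + ", ".join(bad))
        else:
            print("all md arrays report full membership")
```

Hardware RAID controllers have their own vendor CLIs. This sketch only covers the md case.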
> If anything in there jumps out at you, I'd really appreciate your
> thoughts! We are almost certainly going to reboot the affected OSS later
> today to see how that goes.
Nothing leaps out at me other than two particular targets,
twlstr-OST000b and twlstr-OST0006, which appear to be "slow". This appears to
be what is causing the client evictions, lock bits, etc.
The question is: why are these two OSTs slow? What is the underlying
RAID, how many operations are queued up, etc.?
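For the "how many operations are queued up" part, here is a minimal sketch assuming a Linux OSS. It reads /proc/diskstats, where the 9th stat field after the device name is "I/Os currently in progress", and lists the devices with the most requests in flight. Mapping block devices back to OSTs such as twlstr-OST000b is left to you; the helper names here are hypothetical.

```python
def parse_inflight(diskstats_text):
    """Return {device_name: in_flight_ios} from /proc/diskstats content."""
    result = {}
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip short lines (e.g. old-style partition entries)
        # parts[0:3] = major, minor, device name; parts[3:] = stat fields.
        # Stat field 9 (1-based) is "I/Os currently in progress" -> parts[11].
        result[parts[2]] = int(parts[11])
    return result


def busiest(inflight, top=5):
    """Return up to `top` (device, in_flight) pairs, highest count first."""
    return sorted(inflight.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    try:
        with open("/proc/diskstats") as f:
            stats = parse_inflight(f.read())
    except OSError:
        print("no /proc/diskstats here (not Linux?)")
    else:
        for dev, n in busiest(stats):
            print(f"{dev:12s} {n:6d} in flight")
```

A single snapshot can mislead. Running it a few times a second apart shows whether the same devices stay backed up.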
A tool we recommend for (nearly instantaneous) holistic views of a
system is glances, which you can install via pip:
pip install glances
then run it as:
glances -t 1
to get a second-by-second view of your system. dstat is also good.
Dumb question ... what does
report? I am assuming you aren't swapping (and don't have swap enabled
on the system, but it never hurts to ask).
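As one way to answer the swap question programmatically (an assumption on my part, not necessarily the command being asked about), here is a minimal sketch. It reads the SwapTotal and SwapFree lines of /proc/meminfo on a Linux host and reports whether swap is enabled and how much is in use.

```python
def swap_usage_kib(meminfo_text):
    """Return (swap_total_kib, swap_used_kib) parsed from /proc/meminfo content."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])  # meminfo values are in kB
    total = values.get("SwapTotal", 0)
    # If SwapFree is absent, treat swap as unused rather than guessing.
    return total, total - values.get("SwapFree", total)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            total, used = swap_usage_kib(f.read())
    except OSError:
        print("no /proc/meminfo here (not Linux?)")
    else:
        if total == 0:
            print("swap is not enabled")
        else:
            print(f"swap enabled: {used} of {total} KiB in use")
```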
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: landman at scalableinformatics.com
p: +1 734 786 8423 x121
c: +1 734 612 4615