[lustre-discuss] "Not on preferred path" error

Joe Landman landman at scalableinformatics.com
Tue Sep 20 09:29:33 PDT 2016


On 09/20/2016 12:21 PM, Lewis Hyatt wrote:

> We do not know if it's related, but this same OSS is in a very bad
> state, with very high load average (200), very high I/O wait time, and
> taking many seconds to respond to each read request, making the array
> more or less unusable. That's the problem we are trying to fix.

This sounds like a storage system failure.  IOs queuing up until the load 
reaches 200 usually means something is broken at a lower level of the 
stack.  Not always ... sometimes you have users who like to write several 
million/billion small ( < 100 byte ) files.

What does dmesg report?  Put the output in a pastebin/gist and post the 
link to the list.
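
If the full dmesg output is too long to read through, a rough filter can help pick out the storage-related lines first.  The following is only a sketch: the keyword list is an assumption (not exhaustive, and not specific to any controller or driver), and the function names are made up for illustration.

```python
import re
import subprocess

# Assumed keywords that often accompany block-device trouble in kernel logs.
# Adjust for your own hardware; this list is illustrative, not complete.
PATTERN = re.compile(
    r"i/o error|medium error|offline|degraded|timed out|blocked for more than",
    re.IGNORECASE,
)

def storage_lines(log_text):
    """Return the lines of log_text that match one of the assumed keywords."""
    return [line for line in log_text.splitlines() if PATTERN.search(line)]

def read_dmesg():
    """Run dmesg and return its output as text (may need root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout
```

Calling storage_lines(read_dmesg()) on the OSS gives a shortlist to look at; the full dmesg is still what should go to the list.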

Things that come to mind are

a) Offlined RAID (most likely):  This would explain the load, along with 
all sorts of strange messages about block devices and file systems in 
the logs.

b) A user DoS against the storage: usually someone writing many tiny files.

There are other possibilities, but these seem more likely.
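
For case b), one way to check a suspect directory is to count how many of its files fall under the small-file size mentioned above.  This is a minimal sketch under assumptions: the function name, the 100 byte default threshold, and the cap on files examined are all illustrative choices, and walking a large tree adds metadata load of its own, so point it at a subtree rather than the whole file system.

```python
import os

def count_small_files(root, threshold=100, limit=1_000_000):
    """Count files under root smaller than threshold bytes.

    Stops after examining `limit` files so the scan itself stays bounded.
    Returns (small_count, examined_count).
    """
    small = 0
    examined = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            examined += 1
            if size < threshold:
                small += 1
            if examined >= limit:
                return small, examined
    return small, examined
```

A very high ratio of small to examined files in one user's directory would point toward b) rather than a).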



-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
e: landman at scalableinformatics.com
w: http://scalableinformatics.com
t: @scalableinfo
p: +1 734 786 8423 x121
c: +1 734 612 4615

