[lustre-discuss] "Not on preferred path" error

Bob Ball ball at umich.edu
Tue Sep 20 10:28:33 PDT 2016


Stabbing in the dark, but this sounds like a multipath problem. Perhaps 
you have two or more paths to the storage, and one or more of them is down 
for some reason: perhaps the hardware itself has failed, perhaps a cable 
has been pulled. You could look for LEDs in a bad state.
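
If eyeballing the kernel log is too slow, something like the following 
rough sketch can pull out the lines worth a second look. The keyword 
list is purely my assumption about what path or disk trouble tends to 
look like in a log; it is not taken from any Lustre or multipath 
documentation, so adjust it for your hardware and drivers.

```python
import re

# Hypothetical keyword list: patterns assumed to hint at path or disk
# trouble in kernel log text. Tune these for the actual system.
SUSPECT = re.compile(
    r"multipath|i/o error|offline|path.*(?:fail|down)",
    re.IGNORECASE,
)


def suspicious_lines(log_text):
    """Return the lines of log_text that match any suspect pattern."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]


def scan_file(path):
    """Read a saved log file and return its suspicious lines."""
    with open(path, errors="replace") as handle:
        return suspicious_lines(handle.read())
```

Save the dmesg output to a file first, then call scan_file() on it and 
print whatever comes back. An empty result only means none of the 
guessed keywords appeared, not that the paths are healthy.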

I always find it instructive to reboot such a system and watch what 
comes up on the console during the startup.

bob

On 9/20/2016 12:29 PM, Joe Landman wrote:
> On 09/20/2016 12:21 PM, Lewis Hyatt wrote:
>
>> We do not know if it's related, but this same OSS is in a very bad
>> state, with very high load average (200), very high I/O wait time, and
>> taking many seconds to respond to each read request, making the array
>> more or less unusable. That's the problem we are trying to fix.
>
> This sounds like a storage system failure.  Queuing up of IOs to drive 
> the load to 200 usually means something is broken elsewhere in the 
> stack at a lower level.  Not always ... sometimes you have users who 
> like to write several million/billion small ( < 100 byte ) files.
>
> What does dmesg report?  Try to put it in a pastebin/gist, and post the 
> link to the list.
>
> Things that come to mind are
>
> a) offlined RAID (most likely):  This would explain the user load, and 
> all sorts of strange messages about block devices and file systems in 
> the logs
>
> b) A user DoS against the storage: usually someone writing many tiny 
> files.
>
> There are other possibilities, but these seem more likely.
>
>
>
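
To put numbers on the load average and I/O wait discussed above, here 
is a minimal sketch that parses the text of /proc/loadavg and two 
snapshots of /proc/stat taken a few seconds apart. It assumes a Linux 
host with the usual field order on the aggregate "cpu" line (user, 
nice, system, idle, iowait, irq, softirq, steal); the function names 
are my own.

```python
def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def _cpu_fields(stat_text):
    """Return the first eight counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # Guest time is already counted inside user time, so only the
            # first eight fields are kept to avoid double counting.
            return [int(value) for value in line.split()[1:9]]
    raise ValueError("no aggregate cpu line found")


def iowait_fraction(stat_before, stat_after):
    """Fraction of CPU time spent in iowait between two /proc/stat snapshots."""
    before = _cpu_fields(stat_before)
    after = _cpu_fields(stat_after)
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    # iowait is the fifth counter on the cpu line (index 4).
    return deltas[4] / total if total else 0.0
```

Read /proc/stat, sleep a few seconds, read it again, and pass both 
strings to iowait_fraction(). A value that stays high alongside a large 
load average would fit the picture of I/O queuing up behind a failing 
device, though it does not say which device.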