[Lustre-discuss] Getting weird disk errors, no apparent impact

Fri Aug 13 08:51:26 PDT 2010

We have three Sun StorageTek 2150 arrays: one connected to the metadata
server and two cross-connected to the two data storage nodes. They are
connected via Fibre Channel using the qla2xxx driver that comes with CentOS
5.5. The multipath daemon has the following config:

defaults {
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout "/sbin/mpath_prio_rdac /dev/%n"
        path_checker            rdac
        rr_min_io               100
        max_fds                 8192
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
        user_friendly_names     yes
}

Commented out in the multipath.conf file:

blacklist {
        devnode "*"
}
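To make the selector settings above more concrete, here is a toy Python 3 model of what "round-robin 0" with path_grouping_policy multibus and rr_min_io 100 roughly means. It is not dm-multipath's code. With multibus, every path to a LUN sits in one group, and the round-robin selector sends rr_min_io I/Os down one path before moving to the next. The model ignores the rr_weight priorities setting, which in real multipath scales each path's share by its priority. The function name and path names are hypothetical.

```python
from itertools import islice
from typing import Iterator, List


def round_robin_paths(paths: List[str], rr_min_io: int) -> Iterator[str]:
    """Yield the path used for each successive I/O (toy model only).

    All paths sit in one group, as with path_grouping_policy multibus.
    The selector stays on a path for rr_min_io I/Os, then moves to the
    next path and wraps around. Path priorities (rr_weight) are ignored.
    """
    if not paths or rr_min_io < 1:
        raise ValueError("need at least one path and rr_min_io >= 1")
    index = 0
    while True:
        for _ in range(rr_min_io):
            yield paths[index]
        index = (index + 1) % len(paths)  # advance to the next path, wrapping


if __name__ == "__main__":
    # Two hypothetical paths to one LUN, with rr_min_io 100 as in the config.
    io_paths = list(islice(round_robin_paths(["path-a", "path-b"], 100), 201))
    print(io_paths[0], io_paths[99], io_paths[100], io_paths[200])
    # prints: path-a path-a path-b path-a
```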

On Fri, Aug 13, 2010 at 4:31 AM, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> Hi David,
>
> I have seen similar errors given out by some storage arrays. They were
> caused by arrays exporting volumes via more than a single path without a
> multipath driver installed or configured properly. Sometimes the array
> controllers require a special driver to be installed on the Linux host (for
> example the RDAC mpp driver) to properly present and handle configured
> volumes in the OS. What sort of disk RAID array are you using?
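One way to check this hypothesis is to look for several /dev/sdX nodes that report the same WWID, which would mean one array volume is visible over more than one path. The Python 3 sketch below only does the grouping; the helper name is hypothetical, and it assumes a WWID has already been collected for each device, for example with the /sbin/scsi_id -g -u -s /block/<dev> call used as getuid_callout in the multipath.conf above. The WWID values in the usage are made up.

```python
from collections import defaultdict
from typing import Dict, List


def find_multipath_luns(wwid_by_dev: Dict[str, str]) -> Dict[str, List[str]]:
    """Return {wwid: [devices]} for every WWID seen on two or more devices.

    Each such group is one array volume reachable over several paths.
    Devices with an empty WWID (e.g. scsi_id failed) are skipped.
    """
    devs_by_wwid: Dict[str, List[str]] = defaultdict(list)
    for dev, wwid in wwid_by_dev.items():
        if wwid:
            devs_by_wwid[wwid].append(dev)
    # Keep only volumes that appear under more than one /dev/sdX name.
    return {wwid: sorted(devs) for wwid, devs in devs_by_wwid.items() if len(devs) > 1}


if __name__ == "__main__":
    # Made-up WWIDs; real ones could come from
    # /sbin/scsi_id -g -u -s /block/<dev> (the getuid_callout above).
    sample = {"sda": "local-disk-0", "sdb": "lun-0001", "sdc": "lun-0001", "sdd": ""}
    print(find_multipath_luns(sample))
    # prints: {'lun-0001': ['sdb', 'sdc']}
```

An empty result would suggest every sd device is a distinct volume, which would point away from the multiple-path explanation.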
>
> Best regards,
>
> Wojciech
>
> On 12 August 2010 17:58, David Noriega <tsk133 at my.utsa.edu> wrote:
>>
>> We just set up a Lustre system, and all looks good, but there is this
>> nagging error that's floating about. When I reboot any of the nodes, be
>> it an OSS or MDS, I get this:
>>
>> [root@meta1 ~]# dmesg | grep sdc
>> sdc : very big device. try to use READ CAPACITY(16).
>> SCSI device sdc: 4878622720 512-byte hdwr sectors (2497855 MB)
>> sdc: Write Protect is off
>> sdc: Mode Sense: 77 00 10 08
>> SCSI device sdc: drive cache: write back w/ FUA
>> sdc : very big device. try to use READ CAPACITY(16).
>> SCSI device sdc: 4878622720 512-byte hdwr sectors (2497855 MB)
>> sdc: Write Protect is off
>> sdc: Mode Sense: 77 00 10 08
>> SCSI device sdc: drive cache: write back w/ FUA
>>  sdc:end_request: I/O error, dev sdc, sector 0
>> Buffer I/O error on device sdc, logical block 0
>> end_request: I/O error, dev sdc, sector 0
>>
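The two kinds of lines in this output can be pulled apart with a short Python 3 sketch. The regexes are written against this excerpt only (other kernels may word the messages differently), and the function names are hypothetical. It also shows the arithmetic behind the first lines: 4878622720 sectors is more than 2^32, so the last LBA does not fit in the 32-bit field returned by READ CAPACITY(10), and 4878622720 x 512 bytes = 2,497,854,832,640 bytes, about 2497855 MB (decimal), matching the log.

```python
import re
from typing import Dict, Set, Tuple

# Regexes for two line types in the dmesg excerpt quoted above.
CAPACITY_RE = re.compile(r"SCSI device (\w+): (\d+) (\d+)-byte hdwr sectors")
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")

# READ CAPACITY(10) reports the last LBA in 32 bits; 0xFFFFFFFF means
# "too big, ask again with READ CAPACITY(16)".
RC10_MAX_LBA = 0xFFFFFFFF


def capacities(log: str) -> Dict[str, Tuple[int, int, bool]]:
    """Map device -> (sectors, size in decimal MB, needs READ CAPACITY(16))."""
    result = {}
    for dev, sectors, sector_size in CAPACITY_RE.findall(log):
        n, size = int(sectors), int(sector_size)
        mb = (n * size + 500_000) // 1_000_000  # round to nearest 10**6 bytes
        result[dev] = (n, mb, n - 1 >= RC10_MAX_LBA)  # last LBA is n - 1
    return result


def io_errors(log: str) -> Dict[str, Set[int]]:
    """Map device -> set of sectors named in end_request I/O errors."""
    errors: Dict[str, Set[int]] = {}
    for dev, sector in IO_ERROR_RE.findall(log):
        errors.setdefault(dev, set()).add(int(sector))
    return errors


if __name__ == "__main__":
    sample = """\
SCSI device sdc: 4878622720 512-byte hdwr sectors (2497855 MB)
 sdc:end_request: I/O error, dev sdc, sector 0
Buffer I/O error on device sdc, logical block 0
end_request: I/O error, dev sdc, sector 0
"""
    print(capacities(sample))  # {'sdc': (4878622720, 2497855, True)}
    print(io_errors(sample))   # {'sdc': {0}}
```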
>> This doesn't seem to affect anything. fdisk -l doesn't even report the
>> device. The same thing happens on all the nodes (with a different block
>> device, of course: sdd and sde on the OSSs).
>>
>> If I run pvdisplay or lvdisplay, I get this:
>> /dev/sdc: read failed after 0 of 4096 at 0: Input/output error
>>
>> Any ideas?
>> David
>> --
>> Personally, I liked the university. They gave us money and facilities,
>> we didn't have to produce anything! You've never been out of college!
>> You don't know what it's like out there! I've worked in the private
>> sector. They expect results. -Ray Ghostbusters
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
>
> --
> Wojciech Turek
>
> Senior System Architect
>
> High Performance Computing Service
> University of Cambridge
> Email: wjt27 at cam.ac.uk
> Tel: (+)44 1223 763517
>
