[Lustre-discuss] Getting weird disk errors, no apparent impact

Wojciech Turek wjt27 at cam.ac.uk
Fri Aug 13 15:33:00 PDT 2010


Hi,

I don't think you should use the rdac path checker in your multipath.conf. I
would suggest using the tur path checker instead:

path_checker            tur
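
After changing it, something along these lines should reload the
configuration and let you verify the paths (a sketch, assuming the stock
device-mapper-multipath tools that ship with CentOS 5):

# re-read /etc/multipath.conf without a reboot
multipathd -k"reconfigure"

# each LUN should now appear once, with all of its paths active
multipath -ll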

Best regards,

Wojciech

On 13 August 2010 16:51, David Noriega <tsk133 at my.utsa.edu> wrote:

> We have three Sun StorageTek 2150 arrays: one connected to the metadata
> server and two cross-connected to the two data storage nodes. They are
> connected via fiber using the qla2xxx driver that comes with CentOS
> 5.5. The multipath daemon has the following config:
>
> defaults {
>        udev_dir                /dev
>        polling_interval        10
>        selector                "round-robin 0"
>        path_grouping_policy    multibus
>        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
>        prio_callout "/sbin/mpath_prio_rdac /dev/%n"
>        path_checker            rdac
>        rr_min_io               100
>        max_fds                 8192
>        rr_weight               priorities
>        failback                immediate
>        no_path_retry           fail
>        user_friendly_names     yes
> }
>
> Commented out in the multipath.conf file:
>
> blacklist {
>        devnode "*"
> }
>
>
> On Fri, Aug 13, 2010 at 4:31 AM, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > Hi David,
> >
> > I have seen similar errors given out by some storage arrays. They were
> > caused by arrays exporting volumes via more than a single path without a
> > multipath driver installed or configured properly. Sometimes the array
> > controllers require a special driver to be installed on the Linux host
> > (for example the RDAC mpp driver) to properly present and handle the
> > configured volumes in the OS. What sort of disk RAID array are you using?
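> >
> > A quick way to check whether two sd devices are really the same LUN seen
> > via two different paths is to compare their SCSI WWIDs, for example (sdb
> > and sdc here are just example device names):
> >
> > /sbin/scsi_id -g -u -s /block/sdb
> > /sbin/scsi_id -g -u -s /block/sdc
> >
> > If both commands print the same identifier, the volume is exported down
> > more than one path and needs a multipath driver to manage it.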
> >
> > Best regards,
> >
> > Wojciech
> >
> > On 12 August 2010 17:58, David Noriega <tsk133 at my.utsa.edu> wrote:
> >>
> >> We just set up a Lustre system, and all looks good, but there is this
> >> nagging error that keeps floating about. When I reboot any of the
> >> nodes, be it an OSS or MDS, I get this:
> >>
> >> [root at meta1 ~]# dmesg | grep sdc
> >> sdc : very big device. try to use READ CAPACITY(16).
> >> SCSI device sdc: 4878622720 512-byte hdwr sectors (2497855 MB)
> >> sdc: Write Protect is off
> >> sdc: Mode Sense: 77 00 10 08
> >> SCSI device sdc: drive cache: write back w/ FUA
> >> sdc : very big device. try to use READ CAPACITY(16).
> >> SCSI device sdc: 4878622720 512-byte hdwr sectors (2497855 MB)
> >> sdc: Write Protect is off
> >> sdc: Mode Sense: 77 00 10 08
> >> SCSI device sdc: drive cache: write back w/ FUA
> >>  sdc:end_request: I/O error, dev sdc, sector 0
> >> Buffer I/O error on device sdc, logical block 0
> >> end_request: I/O error, dev sdc, sector 0
> >>
> >> This doesn't seem to affect anything; fdisk -l doesn't even report the
> >> device. The same thing (though of course with different block devices,
> >> sdd and sde, on the OSSs) happens on all the nodes.
> >>
> >> If I run pvdisplay or lvdisplay, I'll get this:
> >> /dev/sdc: read failed after 0 of 4096 at 0: Input/output error
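> >>
> >> (I am guessing LVM is scanning the raw sd devices directly. Would
> >> filtering them out in /etc/lvm/lvm.conf with something like the sketch
> >> below be the right fix, or would that just mask a real problem?)
> >>
> >> # hypothetical lvm.conf filter: accept multipath devices, reject raw sd disks
> >> filter = [ "a|^/dev/mapper/mpath|", "r|^/dev/sd|" ]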
> >>
> >> Any ideas?
> >> David
> >> --
> >> Personally, I liked the university. They gave us money and facilities,
> >> we didn't have to produce anything! You've never been out of college!
> >> You don't know what it's like out there! I've worked in the private
> >> sector. They expect results. -Ray Ghostbusters
> >> _______________________________________________
> >> Lustre-discuss mailing list
> >> Lustre-discuss at lists.lustre.org
> >> http://lists.lustre.org/mailman/listinfo/lustre-discuss



-- 
Wojciech Turek

Senior System Architect

High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517