[Lustre-discuss] 1.8.4 and write-through cache
Philippe Weill
philippe.Weill at latmos.ipsl.fr
Mon Sep 13 03:04:34 PDT 2010
On 13/09/2010 11:31, Stu Midgley wrote:
> Afternoon
>
> I upgraded our oss's from 1.8.3 to 1.8.4 on Saturday (due to
> https://bugzilla.lustre.org/show_bug.cgi?id=22755) and suffered a
> great deal of pain.
>
> We have 30 oss's of multiple vintages. The basic difference between them is:
>
> * md on first 20 nodes
> * 3ware 9650SE ML12 on last 10 nodes
>
> After the upgrade to 1.8.4 we were seeing terrible throughput on the
> nodes with 3ware cards (and only the nodes with 3ware cards). This
> was typified by the block device being 100% utilised (iostat),
> doing about 100r/s and 400kb/s, with all the ost_io threads in D state
> (no writes). They would stay in this state for 10 mins and then suddenly
> wake up and start pushing data again. 1-2 mins later, they would lock
> up again.
>
> The oss's were dumping stacks all over the place, crawling along and
> generally making our lustrefs unusable.
>
> After trying different kernels and raid card drivers, changing the
> write-back policy on the raid cards etc., the solution was to run
>
> lctl set_param obdfilter.*.writethrough_cache_enable=0
> lctl set_param obdfilter.*.read_cache_enable=0
>
> on all the nodes with the 3ware cards.
>
> Has anyone else seen this? I am completely baffled as to why it only
> affects our nodes with 3ware cards.
>
> These nodes were working very well under 1.8.3...
>
>
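For anyone who wants to try the workaround quoted above, here is a minimal sketch of it as a small script. The two obdfilter parameters are the ones from the original post; the `LCTL` and `DRY_RUN` variables and the function names are hypothetical and only there for illustration. By default it prints what it would do, so it can be reviewed before it touches a running OSS.

```shell
#!/bin/sh
# Sketch only: disable the OSS write-through cache and read cache.
# LCTL and DRY_RUN are hypothetical knobs added for this example;
# they are not part of Lustre itself.
LCTL="${LCTL:-lctl}"
DRY_RUN="${DRY_RUN:-1}"

# Set one tunable, or just print the command when DRY_RUN=1.
# $1 = parameter pattern, $2 = value
set_param() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $LCTL set_param $1=$2"
    else
        "$LCTL" set_param "$1=$2"
    fi
}

# Apply the two settings named in the original post.
# The patterns are quoted so the shell passes the '*' to lctl unchanged.
disable_oss_caches() {
    set_param 'obdfilter.*.writethrough_cache_enable' 0 &&
    set_param 'obdfilter.*.read_cache_enable' 0
}

disable_oss_caches
```

Run it with DRY_RUN=0 on an affected OSS to actually change the settings. Values set this way with lctl set_param are not kept across a reboot, so the script would have to be run again after the node restarts.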
We have the same problem here, but we are not on 3ware: we use
qla2462 HBAs and a Xyratex F5404E 4Gb FC-SAS/SATA-II RAID on 1.8.4.
On 1.8.3 this also occurred at startup, but after that it was OK.
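
To check whether an OSS is in the state described (ost_io threads stuck in D state), something like the following rough sketch could be used. It assumes the I/O service threads have "ost_io" somewhere in their command name, as the wording of the original post suggests; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: count threads in uninterruptible sleep (state D) whose
# command name contains "ost_io". The name match is an assumption.

# Reads lines of "STATE COMMAND" on stdin and prints the count.
count_blocked_ost_io() {
    awk '$1 ~ /^D/ && $2 ~ /ost_io/ { n++ } END { print n + 0 }'
}

# Feed the filter from ps; "stat=" and "comm=" suppress the header line.
blocked_ost_io_now() {
    ps -e -o stat= -o comm= | count_blocked_ost_io
}
```

Calling blocked_ost_io_now every few seconds alongside iostat should show whether the count stays high during a stall and drops again when the node resumes pushing data.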
--
Weill Philippe - Systems and Network Administrator
CNRS/UPMC/IPSL LATMOS (UMR 8190)