[Lustre-discuss] 1.8.4 and write-through cache

Philippe Weill philippe.Weill at latmos.ipsl.fr
Mon Sep 13 03:04:34 PDT 2010


On 13/09/2010 11:31, Stu Midgley wrote:
> Afternoon
>
> I upgraded our oss's from 1.8.3 to 1.8.4 on Saturday (due to
> https://bugzilla.lustre.org/show_bug.cgi?id=22755) and suffered a
> great deal of pain.
>
> We have 30 oss's of multiple vintages.  The basic difference between them is
>
>    * md on first 20 nodes
>    * 3ware 9650SE ML12 on last 10 nodes
>
> After the upgrade to 1.8.4 we were seeing terrible throughput on the
> nodes with 3ware cards (and only the nodes with 3ware cards).  This
> was typified by seeing the block device being 100% utilised (iostat),
> doing about 100r/s and 400kb/s and all the ost_io threads in D state
> (no writes).  They would be in this state for 10mins and then suddenly
> awake and start pushing data again.  1-2 mins later, they would lock
> up again.
>
> The oss's were dumping stacks all over the place, crawling along and
> generally making our lustrefs unusable.
>
> After trying different kernels, raid card drivers, changing write back
> policy on the raid cards etc. the solution was to
>
>      lctl set_param obdfilter.*.writethrough_cache_enable=0
>      lctl set_param obdfilter.*.read_cache_enable=0
>
> on all the nodes with the 3ware cards.
>
> Has anyone else seen this?  I am completely baffled as to why it only
> affects our nodes with 3ware cards.
>
> These nodes were working very well under 1.8.3...
>
>

We have the same problem here, but we are not on 3ware:

qla2462 and Xyratex F5404E 4Gb FC-SAS/SATA-II RAID on 1.8.4

On 1.8.3 this also occurs at start, but after that it is OK.
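
For anyone who wants to try the workaround quoted above on an affected OSS, here is a minimal sketch that wraps the two lctl commands in a small shell function. The function name set_oss_cache and the LCTL override variable are only illustrative (they are not part of Lustre); the two obdfilter parameters are the ones from the original message.

```shell
#!/bin/sh
# Sketch only: toggle the OSS write-through and read caches on all local OSTs.
# LCTL is a hypothetical override so the commands can be previewed,
# e.g. LCTL="echo lctl" prints them instead of running them.
LCTL="${LCTL:-lctl}"

set_oss_cache() {
    # $1 is the value to set: 0 disables both caches, 1 enables them.
    value="$1"
    for param in writethrough_cache_enable read_cache_enable; do
        # The pattern is quoted so the shell does not try to expand "*";
        # lctl itself matches it against every OST on this node.
        $LCTL set_param "obdfilter.*.${param}=${value}" || return 1
    done
}
```

Calling set_oss_cache 0 as root on each affected OSS should apply the workaround, and lctl get_param "obdfilter.*.read_cache_enable" can be used to check the result. As far as I know, values set with set_param are not persistent, so they would have to be reapplied after the OSTs are remounted.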


-- 
  Weill Philippe -  System and Network Administrator
  CNRS/UPMC/IPSL   LATMOS (UMR 8190)


