[Lustre-discuss] 1.8.4 and write-through cache
Kevin Van Maren
kevin.van.maren at oracle.com
Thu Sep 16 14:57:45 PDT 2010
Stu Midgley wrote:
> Afternoon
>
> I upgraded our OSSes from 1.8.3 to 1.8.4 on Saturday (due to
> https://bugzilla.lustre.org/show_bug.cgi?id=22755) and suffered a
> great deal of pain.
>
> We have 30 OSSes of multiple vintages. The basic difference between them is:
>
> * md (Linux software RAID) on the first 20 nodes
> * 3ware 9650SE ML12 RAID cards on the last 10 nodes
>
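Since this thread is about caching, it is worth comparing the 3ware
unit's own cache policy against the md nodes. Assuming the card shows
up as controller /c0 and the OST unit is /u0, 3ware's tw_cli will
report it:

  tw_cli /c0/u0 show    # unit status, including the write cache setting
  tw_cli /c0 show       # controller summary (units, ports, BBU state)

The unit write cache can be flipped with "tw_cli /c0/u0 set cache=on|off"
to rule the controller cache in or out.
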
> After the upgrade to 1.8.4 we were seeing terrible throughput on the
> nodes with 3ware cards (and only the nodes with 3ware cards). This
> was typified by the block device being 100% utilised (iostat),
> doing about 100 r/s and 400 kB/s, and all the ost_io threads sitting
> in D state (no writes). They would be in this state for 10 minutes
> and then suddenly wake and start pushing data again. 1-2 minutes
> later, they would lock up again.
>
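For anyone trying to confirm the same symptom, extended iostat output
and a D-state listing are enough. Assuming the OST block device is sdb:

  iostat -x sdb 1                                  # watch %util, r/s, kB/s
  ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'  # tasks in uninterruptible sleep

The wchan column gives a rough idea of where each stuck thread is blocked.
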
> The OSSes were dumping stacks all over the place, crawling along and
> generally making our Lustre filesystem unusable.
>
Would you post a few of the stack traces? Presumably these were driven
by watchdog timeouts, but it would help to know where they were getting
stuck.
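
If the watchdog output has already rotated out of the logs, a full task
dump can be forced over sysrq (assuming sysrq is enabled on the OSS):

  echo t > /proc/sysrq-trigger    # dump every task's stack to the kernel log
  dmesg > /tmp/task-dump.txt      # grab it before the ring buffer wraps

The Lustre watchdog traces themselves normally end up in the console log
or syslog on the OSS.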
> After trying different kernels and RAID card drivers, changing the
> write-back policy on the RAID cards, etc., the solution was to run
>
> lctl set_param obdfilter.*.writethrough_cache_enable=0
> lctl set_param obdfilter.*.read_cache_enable=0
>
> on all the nodes with the 3ware cards.
>
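For anyone applying the same workaround: the current values can be
confirmed with lctl get_param, and note that set_param changes do not
survive a reboot or OST remount, so they need to be reapplied from an
init script (or equivalent) on those nodes:

  lctl get_param obdfilter.*.writethrough_cache_enable
  lctl get_param obdfilter.*.read_cache_enable
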
> Has anyone else seen this? I am completely baffled as to why it only
> affects our nodes with 3ware cards.
>
> These nodes were working very well under 1.8.3...
>