[Lustre-discuss] Lustre and disk tuning
Bernd Schubert
bs at q-leap.de
Mon Jan 21 02:31:33 PST 2008
Hello Dan,
On Saturday 19 January 2008 01:45:13 Dan wrote:
> Greetings all,
>
> I'm looking for some advice on improving disk performance and
> understanding what Lustre is doing with it. Right now I have a ~28 TB
> OSS with 4 OSTs on it. There are 4 clients using Lustre native - no
> NFS. If I write to the lustre volume from the clients I get odd
> behavior. Typically the writes have a long pause before any data
> starts hitting the disks. Then 2 or 3 of the clients will write
> happily but one or two will not. Eventually Lustre will pump out a
> number of I/O related errors such as "slow i_mutex 165 seconds, slow
> direct_io 32 seconds" and so on. Next the clients that couldn't write
> will catch up and pass the clients that could write. At some point (5
> minutes or so) the jobs start failing without any errors. New jobs
> can be started after these fail and the pattern repeats. Write speeds
> are low, around 22 MB/sec per client; the disks shouldn't have any
> problem handling 4 writes at this speed!! This did work using NFS.
>
> When these disks were formatted with XFS, I/O was fast. No problems at
> all writing 475 MB/sec sustained per RAID controller (locally, not via
> NFS). No delays. After configuring for Lustre the peak sustained
> write (locally) is 230 MB/sec. It will write for about 2 minutes
> before logging about slow I/O. This is without any clients connected.
>
> So far I've done the following:
>
> 1. Recompiled SCSI driver for RAID controller to use 1 MB blocks (from
> 256k).
> 2. Adjusted MDS, OST threads
> 3. Tried all I/O schedulers
> 4. Tried all possible settings on RAID controllers for Caching and
> read-ahead.
> 5. Some minor stuff I forgot about!
>
> Nothing makes a difference - same results under each configuration except
> for schedulers. When running the deadline scheduler the writes fail
> faster and have delays around 30 seconds. With all others the delays
> range from 100 to 500 seconds.
>
> The system has 4 cores and 4 GB of memory with four 7 TB OSTs. The disks are
> in RAID 6 split between two controllers with 2 GB cache each. One
> controller has the MGS/MDT on it. When running top it indicates 2/3 to
> 3/4 of memory utilized and 25% CPU utilization normally.
>
> Suggestions?
>
We usually benchmark with ldiskfs first: to figure out what throughput we
should expect, we compare ldiskfs directly against xfs.
mount -t ldiskfs -omballoc,extents /dev/{device name} /{favorite mount}
Now benchmark it and compare the result to xfs. You may also want to play
with additional mount options such as "data=writeback".
It would also help to know which Lustre version you are using. In
lustre-1.4, for example, mballoc and extents are not enabled by default, so
it's almost pure ext3, which is terribly slow compared to xfs.
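
The ldiskfs-versus-xfs comparison could be run with something as simple as
the sketch below. It is only an illustration, not a recommended benchmark:
it assumes GNU dd, the function name seq_write_test is made up for this
example, and writing zeros sequentially is just one of many possible
workloads.

```shell
#!/bin/sh
# Sketch of a sequential write test (assumes GNU dd).
# seq_write_test is a hypothetical helper name, not part of Lustre.
#   $1 = directory on the filesystem under test
#   $2 = amount of data to write, in MB
seq_write_test() {
    target="$1/ddtest.$$"
    # conv=fsync forces the data to disk before dd reports its timing,
    # so the page cache does not inflate the result.
    # dd prints its statistics on stderr; keep only the summary line.
    dd if=/dev/zero of="$target" bs=1M count="$2" conv=fsync 2>&1 | tail -n 1
    # Remove the test file so repeated runs start from the same state.
    rm -f "$target"
}
```

After mounting the device as ldiskfs, run for example
"seq_write_test /{favorite mount} 8192", then reformat the same device with
xfs and repeat with the same size. The test size should be well above the
4 GB of RAM plus the controller cache, otherwise caching will still dominate
the numbers.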
Cheers,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH