[Lustre-discuss] Lustre and disk tuning

Dan dan at nerp.net
Fri Jan 18 16:45:13 PST 2008


Greetings all,

    I'm looking for some advice on improving disk performance and
understanding what Lustre is doing with it.  Right now I have a ~28 TB
OSS with 4 OSTs on it.  There are 4 clients using Lustre natively - no
NFS.  If I write to the Lustre volume from the clients I get odd
behavior.  Typically the writes pause for a long time before any data
starts hitting the disks.  Then two or three of the clients will write
happily, but one or two will not.  Eventually Lustre logs a number of
I/O-related errors such as "slow i_mutex 165 seconds" and "slow
direct_io 32 seconds".  Next the clients that couldn't write catch up
and pass the clients that could.  At some point (5 minutes or so) the
jobs start failing without any errors.  New jobs can be started after
these fail, and the pattern repeats.  Write speeds are low, around 22
MB/sec per client - the disks shouldn't have any problem handling 4
writes at this speed!  This did work using NFS.
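A minimal sketch of the kind of write check behind the per-client number
(the paths are assumptions - in practice TESTDIR would point at the
Lustre mount, e.g. /mnt/lustre; /tmp is only a safe default):

```shell
# Rough per-client sequential write check (a sketch: point TESTDIR at
# the actual Lustre mount, e.g. TESTDIR=/mnt/lustre -- that path is an
# assumption, /tmp is just a safe default).
TESTDIR=${TESTDIR:-/tmp}
TESTFILE="$TESTDIR/write_test.$$"

# 64 MB sequential write; conv=fsync makes dd flush to stable storage
# before reporting, so the figure isn't just the client page cache.
RESULT=$(dd if=/dev/zero of="$TESTFILE" bs=1M count=64 conv=fsync 2>&1 | tail -1)
echo "$RESULT"
rm -f "$TESTFILE"
```

Run from each client in parallel, this is enough to see the pattern of
two or three clients writing while the others stall.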

    When these disks were formatted with XFS, I/O was fast.  There were
no problems at all writing a sustained 475 MB/sec per RAID controller
(locally, not via NFS), and no delays.  After reformatting for Lustre,
the peak sustained local write is 230 MB/sec, and the OSS writes for
about 2 minutes before it starts logging slow-I/O messages.  This is
without any clients connected.
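To separate the arrays themselves from the Lustre stack, a raw read
straight off the block device makes a useful baseline; a sketch (the
device name is an assumption - use the block device behind the suspect
OST, and keep it read-only):

```shell
# Read-only throughput check of the raw array under one OST, bypassing
# Lustre and the filesystem entirely (DEV=/dev/sdb is an assumed name).
DEV=${DEV:-/dev/sdb}
if [ -r "$DEV" ]; then
    # 512 MB sequential read; the last dd status line has the rate.
    RAW=$(dd if="$DEV" of=/dev/null bs=1M count=512 2>&1 | tail -1)
else
    RAW="skipping: $DEV not readable"
fi
echo "$RAW"
```

If the raw device still does 475 MB/sec but the formatted OST does 230,
the loss is in the filesystem/Lustre layers rather than the hardware.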

So far I've done the following:

1.  Recompiled the SCSI driver for the RAID controller to use 1 MB blocks
(up from 256 KB).
2.  Adjusted the MDS and OST thread counts.
3.  Tried all I/O schedulers
4.  Tried all possible caching and read-ahead settings on the RAID
controllers.
5.  Some minor stuff I forgot about!

Nothing makes a difference - the results are the same under every
configuration except for the choice of scheduler.  With the deadline
scheduler the writes fail faster and the delays are around 30 seconds;
with all the others the delays range from 100 to 500 seconds.
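Concretely, the scheduler and thread adjustments were along these lines
(device names and values are illustrative, not a recommendation):

```shell
# Per-device I/O scheduler, repeated for each disk backing an OST
# (sdb is an illustrative name):
echo deadline > /sys/block/sdb/queue/scheduler
# A deeper request queue gives the controller's 2 GB cache more
# requests to merge (value illustrative):
echo 512 > /sys/block/sdb/queue/nr_requests

# /etc/modprobe.conf fragment to pin the service thread counts instead
# of letting them autoscale (counts illustrative; each OSS thread pins
# I/O buffer memory, so more is not always better on a 4 GB box):
options ost oss_num_threads=128
options mds mds_num_threads=32
```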

The system has 4 cores and 4 GB of memory, with four 7 TB OSTs.  The
disks are in RAID 6, split between two controllers with 2 GB of cache
each.  One of the controllers also hosts the MGS/MDT.  top normally
shows 2/3 to 3/4 of memory in use and about 25% CPU utilization.
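The memory numbers make me wonder whether the stalls are just writeback
bursts - dirty pages piling up in cache and then flushing all at once.
If so, starting background writeback earlier might smooth them out; an
illustrative /etc/sysctl.conf fragment (values are untested guesses for
a 4 GB OSS, not a recommendation):

```shell
# Flush dirty pages sooner and in smaller batches (illustrative values):
vm.dirty_background_ratio = 3
vm.dirty_ratio = 10
```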

Suggestions?

Thank you,

Dan



