[Lustre-discuss] Balancing I/O Load

Charles Taylor taylor at hpc.ufl.edu
Fri Nov 30 03:16:27 PST 2007


Ok.  I guess the imbalance was an artifact of how we were running our
tests.  We are now running over 100 iozone threads across a random mix
of IB (o2ib) and TCP (Ethernet<->IPoIB) clients, and all the OSTs on
all the OSSs are going at 100%.   Very nice!   And interactive response
is still very good.  :)
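
For anyone who wants to generate a similar load, this is roughly the
shape of an iozone throughput run in cluster mode.  The client list,
sizes, and paths below are placeholders, not our exact test:

    # clients.txt: one line per thread -- client hostname, working
    # directory on the Lustre mount, and path to the iozone binary:
    #   client01  /lustre/iozone-work  /usr/local/bin/iozone
    export RSH=ssh      # iozone starts remote threads with rsh unless RSH is set
    iozone -+m clients.txt -t 100 -r 1m -s 2g -i 0 -i 1

The -t count should match the number of lines in the client file;
-i 0 and -i 1 select the write/rewrite and read/reread tests.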


Charlie Taylor
UF HPC Center

On Nov 29, 2007, at 12:55 PM, Charles Taylor wrote:

>
> We are seeing some disturbing (probably due to our ignorance)
> behavior from Lustre 1.6.3 right now.   We have 8 OSSs with 3 OSTs
> per OSS (24 physical LUNs).   We just created a brand new Lustre file
> system across this configuration using the default mkfs.lustre
> formatting options.   We have this file system mounted across 400
> clients.
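>
> For reference, the setup amounts to something like the following
> (the fsname, MGS nid, and device names are placeholders, not our
> exact commands):
>
>     # on the MDS (combined MGS/MDT here)
>     mkfs.lustre --fsname=hpcfs --mgs --mdt /dev/mdtvol
>     # on each OSS, once per OST device
>     mkfs.lustre --fsname=hpcfs --ost --mgsnode=mds1@o2ib /dev/ostvol1
>     mount -t lustre /dev/ostvol1 /mnt/ost1
>     # on each client
>     mount -t lustre mds1@o2ib:/hpcfs /lustre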
>
> At the moment, we have 63 IOzone threads running on roughly 60
> different clients.    The balance among the OSSs is terrible, and
> within each OSS the balance across the OSTs (LUNs) is even worse.
> We have one OSS with a load of 100 and another that is not being
> touched.    On several of the OSSs, only one OST (LUN) is being used
> while the other two are ignored entirely.
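>
> For anyone curious, the per-OST usage and per-file striping can be
> checked roughly like this (the mount point and file name are
> placeholders):
>
>     lfs df -h /lustre               # space consumed on each OST
>     lfs getstripe /lustre/somefile  # which OSTs a file is striped over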
>
> This is really just a bunch of random I/O (both large and small
> block) from a bunch of random clients (as will occur in real life),
> and our Lustre implementation is not making very good use of the
> available resources.   Can this be tuned?    What are we doing
> wrong?    The 1.6 operations manual (version 1.9) does not say a lot
> about options for balancing the workload among OSSs/OSTs.
> Shouldn't Lustre be doing a better job (by default) of distributing
> the workload?
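>
> Is the default striping the right knob here?  E.g. something along
> these lines on a directory or the filesystem root, where the stripe
> size, offset, and count are only illustrative (and the exact option
> syntax varies between Lustre versions):
>
>     lfs setstripe -s 1m -i -1 -c 4 /lustre    # 1MB stripes, 4 OSTs per file
>     lfs setstripe -c -1 /lustre/bigfiles      # stripe across all OSTs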
>
> Charlie Taylor
> UF HPC Center
>
> FWIW, the servers are dual-processor, dual-core Opterons (275s) with
> 4GB RAM each.   They are running CentOS 5 with a
> 2.6.18-8.1.14.el5Lustre kernel (Lustre-patched, SMP) and the deadline
> I/O scheduler.   If it matters, our OSTs sit atop LVM2 volumes (for
> management).    The back-end storage is all Fibre Channel RAID
> (Xyratex).    We have tuned the servers and know that we can get
> roughly 500 MB/s per server across a striped *local* file system.
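>
> In case it matters, the scheduler is selected per block device via
> sysfs, along these lines (sdb is a placeholder for each OST device):
>
>     cat /sys/block/sdb/queue/scheduler             # show current choice
>     echo deadline > /sys/block/sdb/queue/scheduler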
>
>



