<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

I ran some more tests and have set up a micro Lustre cluster on LVM

volumes.<br>

node a: MDS<br>

node b and c: OST<br>

node a,b,c,d,e,f: clients<br>

Gigabit ethernet network.<br>

I applied the suggested tunings: lnet.debug=0, lru_size set to 10000,

and max_dirty_mb set to 1024.<br>
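<br>
As a rough sketch, the three tunings above can be applied in one pass with a small shell function. The names set_all and apply_tunings and the optional root argument are made up for illustration (the argument only exists so the function can be tried against a scratch directory). The /proc/fs/lustre paths are the ones from the quoted message below, and writing to /proc/sys/lnet/debug is assumed to be equivalent to sysctl -w lnet.debug=0.<br>
<pre wrap="">
```shell
#!/bin/sh
# Sketch only: set_all and apply_tunings are hypothetical helper names.

# Write VALUE into every writable file listed after it.
# An unmatched glob stays a literal path, fails the -w test, and is skipped.
set_all() {
    value=$1
    shift
    for f in "$@"; do
        if [ -w "$f" ]; then
            echo "$value" > "$f"
        fi
    done
    return 0
}

# Apply the three tunings. ROOT defaults to /proc; pass another
# directory to try the function without touching a live system.
apply_tunings() {
    root=${1:-/proc}
    # Assumed to match: sysctl -w lnet.debug=0
    set_all 0 "$root/sys/lnet/debug"
    # DLM lock LRU size, one file per namespace
    set_all 10000 "$root"/fs/lustre/ldlm/namespaces/*/lru_size
    # Dirty cache limit in MB, one file per OSC
    set_all 1024 "$root"/fs/lustre/osc/*/max_dirty_mb
}
```
</pre>
Calling apply_tunings with no argument as root on a client would write to the live /proc tree. Values written through /proc are not saved anywhere, so they would need to be reapplied after a reboot.<br>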

<br>

The svn checkout takes 50 s (15 s on a local disk, 25 s on a local Lustre

demo setup with debug=0).<br>

Monitoring with gkrellm, a single svn checkout consumes about 20% of the MDS

system CPU, with about 2.4 MB/s of Ethernet traffic.<br>

Disk bandwidth is about 6 MB/s on OST1 and up to 12-16 MB/s on OST2;

network bandwidth on the OSTs is about 10 to 20 times lower than their

disk bandwidth.<br>

<br>

I then launched a compilation distributed across the 6 clients:<br>

MDS system CPU usage goes up to 60% (Athlon 64 3500+), with

12 MB/s on the Ethernet; the OSTs reach the same levels as in the previous

test.<br>

<br>

Why is there so much network traffic on the MDT?<br>

Why is there so much disk bandwidth on the OSTs? Is it a readahead problem?<br>

<br>

As I understand it, the MDS cannot be load balanced, so I don't see how

Lustre scales to thousands of clients...<br>

It looks like Lustre is not made for this kind of application.<br>

<br>

Best regards.<br>

<br>

Andreas Dilger wrote:

<blockquote cite="mid:20080229194602.GL2997@webber.adilger.int"

 type="cite"><!----><br>

  <pre wrap="">Have you turned off debugging (sysctl -w lnet.debug=0)?

Have you increased the DLM lock LRU sizes?

for L in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do

    echo 10000 > $L

done

In 1.6.5/1.8.0 it will be possible to use a new command to set

this kind of parameter more easily:

lctl set_param ldlm.namespaces.*.lru_size=10000

  </pre>

  <blockquote type="cite">

    <pre wrap="">Is there something to tweak to lower this overhead?

Is there a specific tweak for small files?

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Not really, this isn't Lustre's strongest point.

  </pre>

  <blockquote type="cite">

    <pre wrap="">Using multiple server nodes, will the performance be better?

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Partly.  There can only be a single MDT per filesystem, but it can

scale quite well with multiple clients.  There can be many OSTs,

but it isn't clear whether you are IO bound.  It probably wouldn't

hurt to have a few to give you a high IOPS rate.

Note that increasing the OST count also, by default, allows clients to

cache more dirty data (32MB/OST).  You can change this manually;

the default is tuned for very large clusters (1000s of nodes).

for C in /proc/fs/lustre/osc/*/max_dirty_mb; do

        echo 256 > $C

done

Similarly, in 1.6.5/1.8.0 it will be possible to do:

lctl set_param osc.*.max_dirty_mb=256

Cheers, Andreas

--

Andreas Dilger

Sr. Staff Engineer, Lustre Group

Sun Microsystems of Canada, Inc.

  </pre>

</blockquote>

<br>

</body>

</html>