[Lustre-discuss] Performance drop (1.6.5 vs 1.6.4.3, OFED 1.2)?

Andreas Dilger adilger at sun.com
Tue Jul 8 11:23:23 PDT 2008


On Jul 06, 2008  02:30 +0200, Andrei Maslennikov wrote:
>  Disabling checksumming certainly leads to a big performance impact on the
> client side.
>  However it looks like we still have some performance gap between 1.6.4.3
> and 1.6.5.
>  I have repeated the tests making sure that the file sizes are much larger
> than available RAM on the client, to avoid any caching effects. Here is
> what came out:
> 
>  Single stream writing: (lmdd of=/lustre/tstfileXX bs=1M time=200 fsync=1)

client   patched 1.6.4.3   patchless 1.6.5 (nocsum)   patchless 1.6.5 (csum)
2.2GHz   681 MB/sec        590 MB/sec                 265 MB/sec
3.0GHz   832 MB/sec        675 MB/sec                 322 MB/sec
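
For reference, the relative drops in the table above can be worked out per
client, taking the 1.6.4.3 column as the baseline. This is only a small
arithmetic sketch: the numbers are copied from the table, while the
dictionary layout and the function names are made up for illustration.

```python
# Throughput figures in MB/sec, copied from the table above.
RESULTS = {
    "2.2GHz": {"1.6.4.3": 681, "1.6.5 nocsum": 590, "1.6.5 csum": 265},
    "3.0GHz": {"1.6.4.3": 832, "1.6.5 nocsum": 675, "1.6.5 csum": 322},
}


def percent_drop(baseline, value):
    """Percentage by which value falls short of baseline."""
    return 100.0 * (baseline - value) / baseline


def drops(results, baseline_key="1.6.4.3"):
    """Return {client: {config: percent drop vs. the baseline column}}."""
    table = {}
    for client, row in results.items():
        base = row[baseline_key]
        table[client] = {
            config: round(percent_drop(base, rate), 1)
            for config, rate in row.items()
            if config != baseline_key
        }
    return table


def report(results):
    """Print one line per client and configuration."""
    for client, row in drops(results).items():
        for config, pct in row.items():
            print(f"{client}: {config} is {pct}% below 1.6.4.3")
```

Calling report(RESULTS) gives roughly 13% and 19% for the nocsum column and
about 61% for the csum column on both clients.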

>  Here we see that 1.6.5.0 with fully patched client and no checksumming
> still performs worse than 1.6.4.3 with fully patched client (only
> 681 MB/sec against 832 MB/sec, almost 18% less).

Can you please check the CPU usage during these tests?  Is there still
more CPU usage on the client or server in 1.6.5 compared to 1.6.4.3, even
with checksumming disabled?

It is important to use something like "top" with the '1' option to list
per-cpu usage to see if a single CPU is at 100% and others are less busy,
instead of using the average across all CPUs.
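
If it is easier to log this than to watch "top", something like the
following could sample per-CPU usage from /proc/stat while the test runs.
It is a rough sketch, not a tested tool: it assumes a Linux client with the
usual /proc/stat field order (user, nice, system, idle, iowait, irq,
softirq, steal), and the function names are hypothetical.

```python
import time


def parse_cpu_times(text):
    """Return {cpu_name: (busy, total)} in jiffies for each per-CPU line."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep "cpu0", "cpu1", ... and skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Use at most the first 8 counters; later ones (guest time) are
        # already included in user time on kernels that report them.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values)
        times[fields[0]] = (total - idle, total)
    return times


def per_cpu_busy(before, after):
    """Percent busy per CPU between two parse_cpu_times() snapshots."""
    usage = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        usage[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return usage


def sample(interval=5.0, path="/proc/stat"):
    """Take two snapshots interval seconds apart and return per-CPU busy %."""
    with open(path) as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_times(f.read())
    return per_cpu_busy(before, after)
```

Calling sample() a few times during the write and looking for one CPU near
100% while the rest are mostly idle would answer the question. Note that
iowait is counted as idle here, which is a choice rather than a rule.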

Is the test with only a single thread?  Have you tried running with 2 or
more threads on the client?
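
Running several lmdd instances at once, each writing its own file, is the
simplest way to do this. If a self-contained driver is more convenient, a
sketch along these lines would start N concurrent writers and report the
aggregate rate. It is not a replacement for lmdd: it writes a fixed amount
per stream rather than running for a fixed time, the file names and
function names are made up, and Python itself may cap the rate below what
lmdd reaches.

```python
import os
import threading
import time

BLOCK = b"\0" * (1 << 20)  # one 1 MiB write buffer, matching bs=1M


def write_stream(path, mib, written, index):
    """Write mib blocks of 1 MiB to path, fsync, and record bytes written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    count = 0
    try:
        for _ in range(mib):
            count += os.write(fd, BLOCK)
        os.fsync(fd)  # include the flush in the timing, like fsync=1
    finally:
        os.close(fd)
    written[index] = count


def run_streams(directory, nstreams, mib_per_stream):
    """Run nstreams concurrent writers; return aggregate MB/sec."""
    written = [0] * nstreams
    threads = [
        threading.Thread(
            target=write_stream,
            args=(os.path.join(directory, f"tstfile{i:02d}"),
                  mib_per_stream, written, i),
        )
        for i in range(nstreams)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    return sum(written) / elapsed / 1e6
```

For example, run_streams("/lustre", 2, 65536) would write 64 GiB per stream
with two streams. As in the single-stream test, the total written should be
much larger than client RAM so that caching does not skew the result.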

>  Is there some other parameter to play with?

Do you have the same IB stack used with both the 1.6.5 and 1.6.4.3
releases?  It would be very useful to test with LNET Self Test (LST)
to see if the slowdown is related to the IB 1.3 in 1.6.5.  LST is
available in both of these releases, and details on running it are at:

http://manual.lustre.org/manual/LustreManual16_HTML/LustreIOKit.html#50446382_pgfId-1290255


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



