[Lustre-discuss] Large scale obdfilter-survey issues

David Dillow dillowda at ornl.gov
Fri Oct 3 12:45:15 PDT 2008


Has anyone run obdfilter-survey with large object/thread counts and can
share their experiences? I'm having some odd issues and am trying to
determine whether it is a setup problem on my side or something I should
file a bug with Sun about.

For this test, I've got a Dell 1950: dual-socket, quad-core 2.3 GHz Xeons
with 6 MB of cache, 16 GB of memory, and a DDR IB connection to my
storage. There are 7 LUNs on the storage to be driven by this OSS,
though the problem shows up with fewer OSTs under test.

Lustre version is 1.6.5 + patches (echo_client fix, sd_iostats fix)

So, I'm trying to write out 8 GB per OST to amortize startup costs and
minimize cache effects. Things are fine until I hit high thread and
object counts:

ost  3 sz 50331648K rsz 1024 obj  768 thr 1536 write 1792.53 [ XXX, XXX] read 1788.48 [ XXX, XXX]
ost  3 sz 50331648K rsz 1024 obj 1536 thr 1536 write 2972.45       SHORT read 5376.91       SHORT 

As much as I love those numbers, I am quite certain that I'm not pushing
~1800 MB/s through a single DDR InfiniBand port (a 4x DDR link carries at
most ~16 Gb/s of data, call it 2 GB/s, before protocol overhead), let
alone ~3000-5300 MB/s.
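
For reference, the sweeps are driven with the stock lustre-iokit
obdfilter-survey script, roughly along the lines of the sketch below.
The size, limits, and target list here are illustrative placeholders
rather than my exact command line, and the variable names are as I
recall them for the 1.6.x script, so take them as approximate:

  # Sketch only -- limits and target names are placeholders, and the
  # variable names are from memory of the 1.6.x lustre-iokit script.
  size=8192 nobjhi=512 thrhi=1024 case=disk \
    targets="lusintjo-OST0000 lusintjo-OST0004 lusintjo-OST0008" \
    sh obdfilter-survey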

The details file for the 3-OST, 512-object/OST, 1-thread/object run shows
no status reports at all; the 256-object/OST, 2-thread/object run did
have some status lines.

=============> write widow-oss1b1:lusintjo-OST0008_ecc
Print status every 1 seconds
--threads: starting 512 threads on device 9 running test_brw 32 wx q 256 1t5598
=============> write widow-oss1b1:lusintjo-OST0008_ecc
Print status every 1 seconds
--threads: starting 512 threads on device 10 running test_brw 32 wx q 256 1t3561
=============> write widow-oss1b1:lusintjo-OST0008_ecc
Print status every 1 seconds
--threads: starting 512 threads on device 11 running test_brw 32 wx q 256 1t1525
=============> write global
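
(The above is excerpted from the .detail file the survey script writes
alongside its summary; I pulled out the thread start-up and status lines
with something like the command below, where the /tmp path is just the
script's default result location and may differ on other setups.)

  # Result file naming is the script's default, from memory:
  grep -E 'threads|status|=====' /tmp/obdfilter_survey_*.detail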


Nothing showed up in dmesg during those runs.

After a failed run with 7 OSTs on the OSS, at 512 objects/512 threads per OST:

ost  7 sz 117440512K rsz 1024 obj 3584 thr 3584 write 3606.52 SHORT read 12968.83 SHORT 

I found the following in dmesg:
Lustre: 24000:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) lusintjo-OST0010: slow journal start 30s
Lustre: 24000:0:(filter_io_26.c:717:filter_commitrw_write()) lusintjo-OST0010: slow brw_start 30s
Lustre: 24178:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) lusintjo-OST0000: slow journal start 30s
Lustre: 24178:0:(filter_io_26.c:717:filter_commitrw_write()) lusintjo-OST0000: slow brw_start 30s
Lustre: 22247:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) lusintjo-OST0004: slow journal start 30s
Lustre: 22247:0:(lustre_fsfilt.h:246:fsfilt_brw_start_log()) Skipped 3 previous similar messages
Lustre: 22247:0:(filter_io_26.c:717:filter_commitrw_write()) lusintjo-OST0004: slow brw_start 30s
Lustre: 22247:0:(filter_io_26.c:717:filter_commitrw_write()) Skipped 3 previous similar messages
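
Those slow-journal warnings point at the backend rather than the fabric,
so the per-OST I/O histograms are probably the next thing worth staring
at; something along the lines of the loop below should show whether
request sizes and disk I/O times fall apart under load (path per the
1.6.x /proc layout, from memory):

  # Per-target I/O size / latency histograms; path may vary by version.
  for f in /proc/fs/lustre/obdfilter/lusintjo-OST*/brw_stats; do
      echo "== $f"; cat "$f"
  done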


The Lustre tunables have not been changed from their defaults, so that
could be contributing to this. Does anyone have experience that could
shed more light on this? What other information can I provide that
would be helpful?
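
For reference, the defaults I'm running with can be dumped along these
lines; the /proc paths are from memory of the 1.6.x layout, so the exact
names may be slightly off:

  # OSS service thread limits and per-target dirty/grant accounting,
  # 1.6.x /proc layout (approximate):
  cat /proc/fs/lustre/ost/OSS/ost_io/threads_max
  cat /proc/fs/lustre/ost/OSS/ost_io/threads_started
  cat /proc/fs/lustre/obdfilter/lusintjo-OST*/tot_dirty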

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office




