[Lustre-discuss] Sgpdd-survey (sgp_dd) memory allocation error

James Robnett jrobnett at aoc.nrao.edu
Thu Oct 29 16:00:46 PDT 2009


   I'm having a problem with sgpdd-survey on a RAID array returning:

Thu Oct 29 11:19:29 MDT 2009 sgpdd-survey on /dev/sdb from lustre-oss-3
total_size  8388608K rsz 1024 crg     1 thr     1 write 1 failed read 1 failed

    for all tests.  In addition, the details file shows:

==============> total_size  8388608K rsz 1024 crg     1 thr     1
=====> write
sg starting out command at "sgp_dd.c":872: Cannot allocate memory
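
    For reference, the survey invocation behind those numbers would have
looked roughly like the line below.  The parameter names are the ones the
sgpdd-survey script in lustre-iokit reads from the environment; the exact
spelling may differ between iokit versions, and I'm assuming size= is in MB
so that 8192 matches the 8388608K in the summary line.

size=8192 crglo=1 crghi=1 thrlo=1 thrhi=1 scsidevs=/dev/sg1 ./sgpdd-survey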

    An example sgp_dd call that the survey is making is:

sgp_dd if=/dev/zero of=/dev/sg1 seek=1024 thr=1 count=16777216 bs=512 bpt=2048 time=1
sg starting out command at "sgp_dd.c":872: Cannot allocate memory

    The same command using sg_dd (sans the thr arg) instead of sgp_dd works:
sg_dd if=/dev/zero of=/dev/sg1 seek=1024 count=16777216 bs=512 bpt=2048 time=1
Reducing write to 256 blocks per loop
time to transfer data: 31.908034 secs at 269.24 MB/sec
16779008+0 records in
16777216+0 records out

   sgp_dd with a thread count of 1 and block/transaction size of 2048
won't work with a count greater than 256 on this system.

sgp_dd if=/dev/zero of=/dev/sg1 seek=1024 thr=1 count=257 bs=512 bpt=1024 time=1
sg starting out command at "sgp_dd.c":872: Cannot allocate memory

sgp_dd if=/dev/zero of=/dev/sg1 seek=1024 thr=1 count=256 bs=512 bpt=1024 time=1
time to transfer data was 0.000660 secs, 198.59 MB/sec
256+0 records in
256+0 records out

   The same command run on a different machine against a single 1TB disk
running the exact same OS/kernel, but with 8GB of memory, works just fine,
with no equivalent limit.
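
   (If anyone wants to reproduce the boundary on other hardware, a quick
loop over count with the same arguments is enough; the device path below is
just the one from above:)

for c in 128 256 257 512; do
    echo "count=$c"
    sgp_dd if=/dev/zero of=/dev/sg1 seek=1024 thr=1 count=$c bs=512 bpt=1024 time=1
done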

   Further, on the machine in question sg_dd responds with "Reducing write
to 256 blocks per loop" right off the bat, and 256 is the same threshold at
which sgp_dd stops working.  The other machine does not print that message
with sg_dd even with a large count.
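
   (Purely a guess on my part: 256 blocks of 512 bytes is 128KB, which looks
more like a per-request transfer limit than real memory pressure.  If it
helps to compare the two machines, these are the settings I'd look at; the
device name is just an example:)

# sg driver's default reserved buffer per open file descriptor, in bytes
cat /proc/scsi/sg/def_reserved_size
# block layer per-request limits for the underlying disk, in KB
cat /sys/block/sdb/queue/max_sectors_kb
cat /sys/block/sdb/queue/max_hw_sectors_kb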

   Any insight into why sgp_dd hits this memory allocation problem while
sg_dd does not (it isn't a ulimit issue), why it happens on this hardware
but not on others, or why sg_dd can adapt to whatever the difference is but
sgp_dd can't?

   For what it's worth, this OST was part of a four-OSS filesystem that
works just fine.  I'm just using it and the others to test some application
software and wanted to revisit some of the benchmarks.

   It's not a critical issue by any means; this particular benchmark simply
isn't usable in this case.  I'm just terminally curious.

James Robnett
NRAO/NM

lustre-iokit version 1.2
RHEL 5.3 with 2.6.18-128.7.1.el5 (and a Lustre kernel of the same version).
sg3_utils-1.25-1.el5
4GB memory
3ware 9550SX 8-port RAID controller
SATA 400GB WD disks.




