[Lustre-discuss] Errors in output from sgpdd-survey (sgp_dd.c Cannot allocate memory)
Kevin Van Maren
kevin.van.maren at oracle.com
Tue Dec 14 09:38:23 PST 2010
Yep, this is a common problem. I've never bothered to figure out why
memory can't be allocated, although as you note the issue is in sgp_dd,
not in the iokit scripts. Could be a resource limit of some sort
(pinned pages?). If you have time to dig into it, I'm sure many people
would appreciate it.
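One cheap way to start testing the pinned-pages guess is to look at the locked-memory limit (RLIMIT_MEMLOCK) of the process that runs the survey. The sketch below only reports that limit. Whether the sg driver's ENOMEM is actually governed by RLIMIT_MEMLOCK is an assumption that nobody in this thread has established, and the helper names report_memlock_limit and print_limit are hypothetical.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print one rlimit value, treating RLIM_INFINITY as "unlimited". */
static void print_limit(FILE *out, const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        fprintf(out, "%s: unlimited\n", label);
    else
        fprintf(out, "%s: %llu bytes\n", label, (unsigned long long)value);
}

/* Hypothetical helper: query the locked-memory (pinned pages) limit of the
 * calling process, store it in *lim, and print the soft and hard values.
 * Returns 0 on success, -1 if getrlimit() fails (errno is set). */
int report_memlock_limit(FILE *out, struct rlimit *lim)
{
    if (getrlimit(RLIMIT_MEMLOCK, lim) != 0) {
        perror("getrlimit(RLIMIT_MEMLOCK)");
        return -1;
    }
    print_limit(out, "RLIMIT_MEMLOCK soft", lim->rlim_cur);
    print_limit(out, "RLIMIT_MEMLOCK hard", lim->rlim_max);
    return 0;
}
```

Run it as the same user and from the same shell that launches sgpdd-survey, since limits are inherited per process. The shell builtin `ulimit -l` reports the same soft limit, but in kilobytes.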
One thing to note is that Lustre limits itself to 512 total threads per
server, so there are never more than that many outstanding IOs when
running Lustre, although additional client requests can be queued and
processed, which is why higher crg/thread values are interesting. If you
limit the sgpdd-survey total thread count, you should not have these
failures (note that 1536 threads has one failing write process while
3072 has 140; perhaps you could have sgp_dd retry the allocation).
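The retry idea could look roughly like this: instead of failing the command on the first ENOMEM, back off briefly so in-flight commands can complete and release memory, then try again a bounded number of times. This is a minimal sketch, assuming the failing operation can be wrapped in a callback that returns -1 and sets errno on failure. retry_on_enomem and io_op_fn are hypothetical names, not part of sgp_dd or sg3_utils, and where exactly to hook this into sgp_dd.c (the error points at line 827) is left open.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical callback type: returns >= 0 on success, -1 with errno set on error. */
typedef int (*io_op_fn)(void *ctx);

/* Hypothetical helper, not part of sgp_dd: run op(ctx), and while it fails
 * with ENOMEM, sleep and retry with a doubling delay (kept below ~1 s).
 * Other errors are returned immediately. max_tries must be >= 1. */
int retry_on_enomem(io_op_fn op, void *ctx, int max_tries, long base_delay_ns)
{
    long delay_ns = base_delay_ns;
    int res = -1;

    for (int attempt = 1; attempt <= max_tries; ++attempt) {
        res = op(ctx);
        if (res >= 0 || errno != ENOMEM)
            return res;            /* success, or an error a retry won't fix */
        if (attempt == max_tries)
            break;                 /* out of attempts; errno is still ENOMEM */

        /* Back off so other outstanding commands can finish and free memory. */
        struct timespec ts = { delay_ns / 1000000000L, delay_ns % 1000000000L };
        nanosleep(&ts, NULL);
        if (delay_ns < 500000000L)
            delay_ns *= 2;
    }
    return res;
}
```

Bounding the number of attempts matters here: if the thread count really exceeds what the system can back, retrying forever would just turn a failure into a hang, so the caller still has to report persistent ENOMEM.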
Kevin
Heald, Nathan T. wrote:
> Hi everyone,
> I have been running sgpdd-survey on some DDN 9550s and am getting some
> errors. I'm using what I believe to be the latest version of the I/O Kit
> (lustre-iokit-1.2-200709210921). I've got 4 OSSes attached and run
> sgpdd-survey against all the disks from each host, one host at a time.
> Each host is getting these errors, but not identically. I've found several
> threads on the mailing list with people reporting this same error, but no
> resolutions have been posted. One post suggested that modifying the flags
> for "sg_readcap" in the script could resolve these errors, but making the
> changes did not seem to fix the issue. It looks like sgp_dd is having
> intermittent problems:
>
> 16384+0 records out
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
>
>
> Output from sgpdd-survey:
>
> Wed Dec 1 10:55:55 EST 2010 sgpdd-survey on /dev/sdp /dev/sdo /dev/sdn /dev/sdw /dev/sdv /dev/sdu /dev/sdt /dev/sds /dev/sdy /dev/sdr /dev/sdx /dev/sdq from oss1
> ...
> total_size 100663296K rsz 1024 crg 384 thr 768 write 388.20 MB/s 384 x 1.01 = 388.18 MB/s read 387.16 MB/s 384 x 1.01 = 388.18 MB/s
> total_size 100663296K rsz 1024 crg 384 thr 1536 write 1 failed read 385.72 MB/s 384 x 1.01 = 388.18 MB/s
> total_size 100663296K rsz 1024 crg 384 thr 3072 write 140 failed read 121 failed
> total_size 100663296K rsz 1024 crg 384 thr 6144 ENOMEM
> total_size 100663296K rsz 1024 crg 768 thr 768 write 1 failed read 387.28 MB/s 768 x 0.51 = 388.18 MB/s
> total_size 100663296K rsz 1024 crg 768 thr 1536 write 388.23 MB/s 768 x 0.51 = 388.18 MB/s read 386.76 MB/s 768 x 0.51 = 388.18 MB/s
> total_size 100663296K rsz 1024 crg 768 thr 3072 write 42 failed read 31 failed
> total_size 100663296K rsz 1024 crg 768 thr 6144 ENOMEM
> total_size 100663296K rsz 1024 crg 768 thr 12288 ENOMEM
> ...
>
> Any suggestions are welcome.
>
> Thanks,
> -Nathan
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>