[Lustre-discuss] Errors in output from sgpdd-survey (sgp_dd.c Cannot allocate memory)

Kevin Van Maren kevin.van.maren at oracle.com
Tue Dec 14 09:38:23 PST 2010


Yep, this is a common problem.  I've never bothered to figure out why the
memory can't be allocated, although as you note the issue is in sgp_dd, not
in the iokit scripts.  It could be a resource limit of some sort (pinned
pages?).  If you have time to dig into it, I'm sure many people would
appreciate it.
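
If someone does want to start digging, one cheap first step is to look at
the per-process resource limits on the OSS before launching the survey.  The
sketch below only prints them; which limits matter (if any) is a guess on my
part, and the helper name describe_limit is made up for illustration.

```python
import resource


def describe_limit(name):
    """Return a one-line summary of the soft/hard values of one rlimit."""
    soft, hard = resource.getrlimit(getattr(resource, name))

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


# Candidate limits only; nothing in this thread confirms that any of them
# is what makes sgp_dd report "Cannot allocate memory".
for limit_name in ("RLIMIT_MEMLOCK", "RLIMIT_AS", "RLIMIT_NPROC"):
    if hasattr(resource, limit_name):  # not every platform defines all of these
        print(describe_limit(limit_name))
```

Run it as the same user and from the same shell that starts sgpdd-survey,
since limits are inherited from the parent process.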

One thing to note is that Lustre limits itself to 512 total threads per
server, so there are never more than 512 outstanding I/Os when running
Lustre.  Additional client requests can still be queued and processed, which
is why higher crg/thread values are interesting.  If you limit the
sgpdd-survey total thread count, you should not see these failures (note
that the 1536-thread run has one failing write process while the
3072-thread run has 140).  Perhaps you could also have sgp_dd retry the
allocation.
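
To make the two suggestions concrete, here is a rough sketch.  It assumes
that the "thr" column in the survey output is the per-device thread count
multiplied by the number of devices, and that the survey steps thread counts
by powers of two; both are assumptions, as are the function names.  The
retry helper only illustrates the idea in Python; sgp_dd itself is C and
would need the equivalent change made there.

```python
import errno
import time

LUSTRE_MAX_THREADS = 512  # per-server thread limit mentioned above


def per_device_thread_cap(num_devices, total_cap=LUSTRE_MAX_THREADS):
    """Largest power-of-two per-device thread count with total <= total_cap."""
    if num_devices < 1 or total_cap < num_devices:
        raise ValueError("need at least one device and one thread per device")
    cap = 1
    while cap * 2 * num_devices <= total_cap:
        cap *= 2
    return cap


def retry_on_enomem(operation, attempts=5, base_delay=0.01):
    """Call operation(); on ENOMEM, back off exponentially and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as exc:
            # Only ENOMEM is retried; any other error, or the final
            # failed attempt, is passed back to the caller.
            if exc.errno != errno.ENOMEM or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

For the 12 devices in the run quoted below, per_device_thread_cap(12) gives
32 threads per device, i.e. 384 threads in total, which stays under the
512-thread limit.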

Kevin


Heald, Nathan T. wrote:
> Hi everyone,
> I have been running sgpdd-survey on some DDN 9550s and am getting some
> errors. I'm using what I believe to be the latest version of the I/O Kit
> (lustre-iokit-1.2-200709210921). I've got 4 OSSes attached and run
> sgpdd-survey against all the disks from each host one at a time. Each host is
> getting these errors, but not identically. I've found several threads on the
> mailing list with people reporting this same error but there are no
> resolutions posted. One post suggested a modification to the flags for
> "sg_readcap" in the script could resolve these errors, but making the
> changes did not seem to fix the issue. It looks like sgp_dd is having
> intermittent problems:
>
> 16384+0 records out
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
>
>
> Output from sgpdd-survey:
>
> Wed Dec  1 10:55:55 EST 2010 sgpdd-survey on /dev/sdp /dev/sdo /dev/sdn /dev/sdw /dev/sdv /dev/sdu /dev/sdt /dev/sds /dev/sdy /dev/sdr /dev/sdx /dev/sdq  from oss1
> ...
> total_size 100663296K rsz 1024 crg   384 thr   768 write  388.20 MB/s   384 x   1.01 =  388.18 MB/s read  387.16 MB/s   384 x   1.01 =  388.18 MB/s
> total_size 100663296K rsz 1024 crg   384 thr  1536 write 1 failed read  385.72 MB/s   384 x   1.01 =  388.18 MB/s
> total_size 100663296K rsz 1024 crg   384 thr  3072 write 140 failed read 121 failed
> total_size 100663296K rsz 1024 crg   384 thr  6144 ENOMEM
> total_size 100663296K rsz 1024 crg   768 thr   768 write 1 failed read  387.28 MB/s   768 x   0.51 =  388.18 MB/s
> total_size 100663296K rsz 1024 crg   768 thr  1536 write  388.23 MB/s   768 x   0.51 =  388.18 MB/s read  386.76 MB/s   768 x   0.51 =  388.18 MB/s
> total_size 100663296K rsz 1024 crg   768 thr  3072 write 42 failed read 31 failed
> total_size 100663296K rsz 1024 crg   768 thr  6144 ENOMEM
> total_size 100663296K rsz 1024 crg   768 thr 12288 ENOMEM
> ...
>
> Any suggestions are welcome.
>
> Thanks,
> -Nathan
>
>



