[Lustre-discuss] Lustre IOkit newbie: sgpdd-survey

Ms. Megan Larko dobsonunit at gmail.com
Thu Jul 31 11:12:18 PDT 2008


From megan:
Comments in-line.

== 2 of 2 ==
Date: Wed, Jul 30 2008 7:12 am
From: "Brian J. Murrell"

On Tue, 2008-07-29 at 15:43 -0400, Ms. Megan Larko wrote:
> Hi,
>
> Additional info.
>
> If I use "scsidevs=/dev/sdj" in /usr/bin/sgpdd-survey in place of the
> /dev/sg16

Yes, this is the correct syntax.

> I receive the following result:
> Tue Jul 29 15:40:47 EDT 2008 sgpdd-survey on /dev/sdj from oss4.crew.local
> total_size 17487872K rsz 1024 crg     1 thr     4 write 1 failed read 1 failed
> total_size 17487872K rsz 1024 crg     1 thr     8 write 1 failed read 1 failed
> total_size 17487872K rsz 1024 crg     1 thr    16 write 1 failed read 1 failed
> total_size 17487872K rsz 1024 crg     2 thr     4 write 2 failed read 2 failed
> total_size 17487872K rsz 1024 crg     2 thr     8 write 2 failed read 2 failed
> total_size 17487872K rsz 1024 crg     2 thr    16 write 2 failed read 2 failed
> total_size 17487872K rsz 1024 crg     2 thr    32 write 2 failed read 2 failed
> total_size 17487872K rsz 1024 crg     4 thr     4 write 4 failed read 4 failed
> total_size 17487872K rsz 1024 crg     4 thr     8 write 4 failed read 4 failed
> total_size 17487872K rsz 1024 crg     4 thr    16 write 4 failed read 4 failed
> total_size 17487872K rsz 1024 crg     4 thr    32 write 4 failed read 4 failed
> total_size 17487872K rsz 1024 crg     4 thr    64 write 4 failed read 4 failed
> total_size 17487872K rsz 1024 crg     8 thr     8 write 8 failed read 8 failed
> total_size 17487872K rsz 1024 crg     8 thr    16 write 8 failed read 8 failed
> total_size 17487872K rsz 1024 crg     8 thr    32 write 8 failed read 8 failed
> total_size 17487872K rsz 1024 crg     8 thr    64 write 8 failed read 8 failed
> total_size 17487872K rsz 1024 crg    16 thr    16 write 16 failed read 16 failed
> total_size 17487872K rsz 1024 crg    16 thr    32 write 16 failed read 16 failed
> total_size 17487872K rsz 1024 crg    16 thr    64 write 16 failed read 16 failed
> total_size 17487872K rsz 1024 crg    32 thr    32 write 32 failed read 32 failed
> total_size 17487872K rsz 1024 crg    32 thr    64 write 32 failed read 32 failed
> total_size 17487872K rsz 1024 crg    64 thr    64 write 64 failed read 64 failed
>
> All writes and reads fail, but the output indicates that it found the device.

Indeed. So the question is: why are the reads and writes failing?

Do you have any files in /tmp named:

/tmp/sgpdd_survey_$(date)_$(uname -n).detail

If so, can you paste one here?
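A minimal sketch for locating the most recent detail file. It assumes the files were written to /tmp. It matches on a wildcard instead of rebuilding the name from `date`, because the date/time part of the real file name is not the plain output of `date`. The helper name latest_detail is hypothetical and not part of the Lustre IOkit.

```shell
#!/bin/sh
# Hypothetical helper: print the newest sgpdd-survey *.detail file in a
# directory (default: /tmp). A wildcard is used because the timestamp
# embedded in the file name is not the plain output of `date`.
latest_detail() {
    dir=${1:-/tmp}
    # ls -t sorts newest first; errors (no match) are discarded.
    ls -1t "$dir"/sgpdd_survey_*.detail 2>/dev/null | head -n 1
}
```

Usage: `less "$(latest_detail)"` opens the newest detail file. If nothing matches, the function prints nothing.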

megan: I am attaching the file
/tmp/sgpdd_survey_2008-07-29@15:40_oss4.crew.local.detail.
The error it reports seems to be that the memory cannot be accessed.

Alternatively, you can try using sgp_dd to read from the device. The following
should work:

# sgp_dd if=/dev/sg16 of=/dev/null count=10 bs=512 time=1

and paste the result here.

megan: Pasting the result:
[root@oss4 ~]# sgp_dd of=/dev/sg16 if=/dev/null count=10 bs=512 time=1
time to transfer data was 0.000121 secs
  remaining block count=10
0+0 records in
0+0 records out
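In the pasted command, the if= and of= operands are the reverse of the read test suggested above. sgp_dd therefore reads from /dev/null, which returns end-of-file immediately. That fits the "0+0 records" counts and the unchanged "remaining block count=10", and it names /dev/sg16 as the output. A read-only check puts the SCSI generic device on the input side. The following is a minimal sketch. The helper name sg_read_check and the DRY_RUN switch are hypothetical illustrations, not part of sg3_utils or the Lustre IOkit.

```shell
#!/bin/sh
# Hypothetical helper: read-only sgp_dd check of a SCSI generic device.
# The device is the INPUT (if=) and /dev/null is the OUTPUT (of=),
# so nothing is written to the disk.
# Set DRY_RUN=1 to print the command instead of running it.
sg_read_check() {
    dev=${1:?usage: sg_read_check /dev/sgN}
    set -- sgp_dd if="$dev" of=/dev/null bs=512 count=10 time=1
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Usage: `sg_read_check /dev/sg16`. On a readable device, the record counts should be non-zero rather than 0+0.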

Note that "cat /proc/meminfo" shows 16 GB of RAM on oss4.
[root@oss4 ~]# cat /proc/meminfo
MemTotal:     16439328 kB
MemFree:      16101332 kB
Buffers:         32260 kB
Cached:         205820 kB
  ---snip---

BTW, I am running iozone v3.283 on the OS drive, a RAID6 JBOD disk
formatted ext3, and one of our existing Lustre disks, and the Lustre
system is doing well under iozone.

Thanks,
megan

b.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sgpdd_survey_2008-07-29@15:40_oss4.crew.local.detail
Type: application/octet-stream
Size: 33332 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20080731/a61eb44d/attachment.obj>

