[Lustre-discuss] Frequent OSS Crashes with heavy load

Wang lu wanglu at ihep.ac.cn
Wed Nov 12 09:49:49 PST 2008


Do you mean mounting Lustre on an OSS server and then running the PIOS test
there? One client node has only a 1 Gbit network link, so it cannot saturate
the OSS server.
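
As a rough sanity check on the point above, the ceiling of a single 1 Gbit
client can be worked out in a few lines. This is only a sketch: it assumes a
decimal gigabit (10**9 bits/s), ignores protocol overhead, and the OSS
throughput figure passed in below is a hypothetical placeholder, not a
measurement from this system.

```python
import math


def link_ceiling_mb_per_s(link_gbit: float) -> float:
    """Theoretical maximum payload rate of a link in MB/s (10**6 bytes/s)."""
    bits_per_s = link_gbit * 1e9  # assumption: decimal gigabit
    return bits_per_s / 8 / 1e6


def clients_needed(oss_mb_per_s: float, link_gbit: float = 1.0) -> int:
    """Minimum number of clients whose combined links reach oss_mb_per_s."""
    return math.ceil(oss_mb_per_s / link_ceiling_mb_per_s(link_gbit))


if __name__ == "__main__":
    print(f"one 1 Gbit client: at most {link_ceiling_mb_per_s(1.0):.0f} MB/s")
    # 400 MB/s is a made-up OSS rate used only to show the arithmetic.
    print(f"clients needed for a hypothetical 400 MB/s OSS: {clients_needed(400.0)}")
```

In practice the usable rate per client is lower than this ceiling, so the
client count it gives is a lower bound.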



[root@boss01 /]# mount -t lustre mds01@tcp0:/besfs /besfs
[root@boss01 /]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/cciss/c0d0p1     30233896   3784000  24914084  14% /
none                   4153680         0   4153680   0% /dev/shm
/dev/cciss/c0d0p5     92702372     90176  87903148   1% /scrach
/dev/cciss/c0d0p3      2016044     35836   1877796   2% /usr/vice/cache
/dev/sda1            6728210844 1657103924 4729333456  26% /lustre/besfs/ost0
/dev/sda2            6728210844 1659522080 4726915300  26% /lustre/besfs/ost1
/dev/sdb1            6728210844 1644823840 4741613540  26% /lustre/besfs/ost2
/dev/sdb2            6728210844 1653193084 4733244296  26% /lustre/besfs/ost3
mds01@tcp0:/besfs    53825686752 13247980384 37843027072  26% /besfs


Andreas Dilger wrote:

> On Nov 12, 2008  13:48 +0000, Wang lu wrote:
>> May I ask where I can run the PIOS command? I think that, to determine the
>> max thread number of the OSS, it should be run on the OSS; however, the OST
>> directories are unwritable. Can I write to /dev/sdaX? I am confused.
> 
> Running PIOS directly on /dev/sdX will overwrite all data there.  It should
> only be run on the disk devices before the filesystem is formatted.  You
> can run PIOS against the filesystem itself (e.g. /mnt/lustre) to just create
> regular files in the filesystem.
> 
>> Brian J. Murrell wrote:
>> 
>> > On Mon, 2008-11-10 at 16:42 +0000, Wang lu wrote:
>> >> I already have 512 (the max number) IO threads running. Some of them
>> >> are in "Dead" status. Is it safe to conclude that the OSS is
>> >> oversubscribed?
>> > 
>> > Until you do some analysis of your storage with the iokit, one cannot
>> > really draw any conclusions, however if you are already at the maximum
>> > value of OST threads, it would not be difficult to believe that perhaps
>> > this is a possibility.
>> > 
>> > Try a simple experiment and halve the number to 256 and see if you have
>> > any drop off in throughput to the storage devices.  If not, then you can
>> > easily assume that 512 was either too much or not necessary.  You can
>> > try doing this again if you wish.  If you get to a value of OST threads
>> > where your throughput is lower than it should be, you've gone too low.
>> > 
>> > But really, the iokit is the more efficient and accurate way to
>> > determine this.
>> > 
>> > b.
>> > 
>> 
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> 
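
The halving experiment Brian describes in the quoted text can be sketched as a
small loop. This is only an illustration under stated assumptions: `measure`
is a hypothetical callable that you would have to supply yourself (for example,
by setting the OST thread count, running a benchmark, and returning MB/s), the
5% tolerance is an arbitrary choice, and the demo uses a synthetic throughput
model rather than real numbers from any OSS.

```python
from typing import Callable


def find_min_threads(measure: Callable[[int], float],
                     start: int = 512,
                     tolerance: float = 0.05) -> int:
    """Halve the thread count until throughput falls below the baseline.

    Returns the smallest tested count whose throughput stayed within
    `tolerance` (as a fraction) of the throughput measured at `start`.
    """
    baseline = measure(start)
    best = start
    threads = start // 2
    while threads >= 1:
        if measure(threads) < baseline * (1.0 - tolerance):
            break  # gone too low: keep the previous value
        best = threads
        threads //= 2
    return best


if __name__ == "__main__":
    # Synthetic model (an assumption for the demo): throughput grows with
    # the thread count and flattens out once 128 threads are running.
    def fake_measure(threads: int) -> float:
        return float(min(threads, 128))

    print(find_min_threads(fake_measure))
```

As noted in the thread, this trial-and-error approach is coarser than
characterising the storage with the iokit; it only tests powers-of-two
fractions of the starting value.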


