[Lustre-discuss] Limits on OSTs per OSS?

Ms. Megan Larko dobsonunit at gmail.com
Wed Aug 19 07:44:35 PDT 2009


Greetings Sebastien,

On Wed, Aug 19, 2009 at 10:39 AM, Sébastien
Buisson<sebastien.buisson at bull.net> wrote:
> Hi,
>
> To me:
> 12 OSTs x 1.2 GB = 14.4 GB < 16 GB
>
> So you are clearly within the recommendation.

I thought I would be within the spec *if* my OSTs were smaller units.
As they are JBODs in sections of 6+ TB each, I thought I was
"coloring outside the lines".

Thanks,
megan

>
> Cheers,
> Sebastien.
>
>
> Ms. Megan Larko wrote:
>>
>> Responding to what Sebastien has written:
>>>
>>> Hi,
>>
>>> Just a bit of feedback from our own experience.
>>> I agree with Brian about the fact that there is no strong limit on the
>>> number of OSTs per OSS in the Lustre code. But one should really take
>>> into account the available memory on OSSes when defining the number of
>>> OSTs per OSS (and so the size of each OST). If you do not have 1GB or
>>> 1.2 GB of memory per OST on your OSSes, you will run into serious
>>> trouble with "out of memory" messages.
>>
>>> For instance, if you want 8 OSTs per OSS, your OSSes should have at
>>> least 10GB of RAM.
>>
>>> Unfortunately we experienced those "out of memory" problems, so I advise
>>> you to read Lustre Operations Manual chapter 33.12 "OSS RAM Size for a
>>> Single OST".
>>
>>> Cheers,
>>> Sebastien.
>>
>> We have one OSS running Lustre 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.
>> This OSS has 16 GB of RAM for 76 TB of formatted Lustre disk space.
>>
>> [root@oss4 ~]# cat /proc/meminfo
>> MemTotal:     16439360 kB
>> MemFree:         88204 kB
>>
>> Client sees: ic-mds1@o2ib:/crew8   Total usable space 76 TB
>>
>> The OSS has 6 JBODs, each of which is partitioned into two parts to
>> stay below the Lustre 8 TB per-partition limit (a sketch of how such
>> a split can be created follows the listing below).
>> /dev/sdb1             6.3T  3.8T  2.3T  63% /srv/lustre/OST/crew8-OST0000
>> /dev/sdb2             6.3T  3.7T  2.3T  62% /srv/lustre/OST/crew8-OST0001
>> /dev/sdc1             6.3T  3.8T  2.3T  63% /srv/lustre/OST/crew8-OST0002
>> /dev/sdc2             6.3T  3.8T  2.2T  64% /srv/lustre/OST/crew8-OST0003
>> /dev/sdd1             6.3T  3.8T  2.2T  64% /srv/lustre/OST/crew8-OST0004
>> /dev/sdd2             6.3T  4.2T  1.8T  70% /srv/lustre/OST/crew8-OST0005
>> /dev/sdi1             6.3T  4.3T  1.8T  71% /srv/lustre/OST/crew8-OST0006
>> /dev/sdi2             6.3T  3.8T  2.2T  64% /srv/lustre/OST/crew8-OST0007
>> /dev/sdj1             6.3T  3.8T  2.3T  63% /srv/lustre/OST/crew8-OST0008
>> /dev/sdj2             6.3T  3.8T  2.2T  63% /srv/lustre/OST/crew8-OST0009
>> /dev/sdk1             6.3T  3.7T  2.3T  62% /srv/lustre/OST/crew8-OST0010
>> /dev/sdk2             6.3T  3.7T  2.3T  63% /srv/lustre/OST/crew8-OST0011
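
(For illustration only, and not necessarily the exact commands used on
this system: one way to carve a JBOD into two sub-8 TB OSTs, assuming
GPT labels and reusing the crew8 fsname and ic-mds1@o2ib MGS NID shown
above, would be roughly:

  # Split the JBOD into two roughly equal partitions, each under 8 TB.
  parted -s /dev/sdb mklabel gpt
  parted -s /dev/sdb mkpart primary 0% 50%
  parted -s /dev/sdb mkpart primary 50% 100%
  # Format each partition as an OST of the crew8 file system.
  mkfs.lustre --ost --fsname=crew8 --mgsnode=ic-mds1@o2ib /dev/sdb1
  mkfs.lustre --ost --fsname=crew8 --mgsnode=ic-mds1@o2ib /dev/sdb2

and likewise for the other five JBODs.)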
>>
>> As you can see, this is nowhere near the recommendation of 1 GB of RAM
>> per OST.  Yes, under load we do occasionally see kernel panics due,
>> we believe, to insufficient memory and swap.  These panics occur
>> approximately once per month.  We also see watchdog "swap page
>> allocation failure" messages, sometimes a day before a kernel panic.
>> Only after this Lustre file system was up and running was I
>> enlightened that this was too much load for a single OSS.  Ah well,
>> live and learn.  I am planning to split this one large group across
>> two OSSes in the next month.  Hopefully the kernel panics and
>> watchdog errors will go away once the OST load is shared across two
>> OSS machines.
>>
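
(As a rough illustration of catching the early warnings described
above, assuming a standard EL5-style syslog on the OSS; paths and
message text may differ on other distributions:

  # The allocation failures tend to precede the panics, so watch for them.
  grep -i "page allocation failure" /var/log/messages | tail -5
  # And keep an eye on how much memory is actually left on the OSS.
  grep -E '^(MemFree|Slab):' /proc/meminfo

For what it is worth, with the load split so that each OSS carries 6
OSTs, the same rule of thumb gives about 6 x 1.2 GB = 7.2 GB, which is
comfortably under 16 GB.)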
>> Just one real life scenario for your consideration.
>>
>> megan
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>>
>


