[Lustre-discuss] Limits on OSTs per OSS?
Andreas Dilger
adilger at sun.com
Wed Aug 19 19:20:19 PDT 2009
On Aug 19, 2009 10:28 -0400, Ms. Megan Larko wrote:
> We have one OSS running Lustre 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.
> This OSS has 16GB RAM for 76TB of formatted Lustre disk space.
>
> Client sees: ic-mds1@o2ib:/crew8 Total Usable Space 76TB
>
> The OSS has 6 JBODs, each of which is partitioned in two parts to stay
> below the Lustre 8TB per-partition limit.
> /dev/sdb1 6.3T 3.8T 2.3T 63% /srv/lustre/OST/crew8-OST0000
> /dev/sdb2 6.3T 3.7T 2.3T 62% /srv/lustre/OST/crew8-OST0001
> /dev/sdc1 6.3T 3.8T 2.3T 63% /srv/lustre/OST/crew8-OST0002
> /dev/sdc2 6.3T 3.8T 2.2T 64% /srv/lustre/OST/crew8-OST0003
> /dev/sdd1 6.3T 3.8T 2.2T 64% /srv/lustre/OST/crew8-OST0004
> /dev/sdd2 6.3T 4.2T 1.8T 70% /srv/lustre/OST/crew8-OST0005
> /dev/sdi1 6.3T 4.3T 1.8T 71% /srv/lustre/OST/crew8-OST0006
> /dev/sdi2 6.3T 3.8T 2.2T 64% /srv/lustre/OST/crew8-OST0007
> /dev/sdj1 6.3T 3.8T 2.3T 63% /srv/lustre/OST/crew8-OST0008
> /dev/sdj2 6.3T 3.8T 2.2T 63% /srv/lustre/OST/crew8-OST0009
> /dev/sdk1 6.3T 3.7T 2.3T 62% /srv/lustre/OST/crew8-OST0010
> /dev/sdk2 6.3T 3.7T 2.3T 63% /srv/lustre/OST/crew8-OST0011
>
> As you can see, this is nowhere near the recommendation of 1GB of RAM
> per OST. Yes, under load we do occasionally see kernel panics due
> to, we believe, insufficient memory and swap. These panics occur
> approximately once per month. We also see watchdog messages reporting
> "swap page allocation failure", sometimes a day prior to a kernel
> panic. Only after this Lustre disk was up and running was I
> enlightened that this was too much load for a single OSS. Ah well,
> live and learn. I am planning to split this one large group across
> two OSSes in the next month. Hopefully the kernel panics and
> watchdog errors will go away with the OST disk load shared across two
> OSS machines.
If you need large capacity, but not necessarily peak throughput,
you could shrink the journals on these filesystems (the twelve
journals themselves consume about 4.5GB of RAM). It is likely you
can't utilize the full bandwidth of these disks anyway, unless you
have a lot of network bandwidth into this node.
umount /dev/sdX                     # take the OST offline first
e2fsck -f /dev/sdX                  # filesystem must be clean before journal changes
tune2fs -O ^has_journal /dev/sdX    # remove the existing journal
tune2fs -j -J size=128 /dev/sdX     # recreate it at 128MB
mount /dev/sdX                      # remount the OST
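As a back-of-the-envelope sketch of what the shrink above reclaims
(the 400MB default journal size per OST is my assumption here, not a
figure from this message, though it is consistent with the ~4.5GB
total mentioned for twelve OSTs):

```shell
# Hypothetical arithmetic: 12 OSTs, assumed 400 MB journal each,
# shrunk to 128 MB per the tune2fs commands above.
echo "$(( 12 * (400 - 128) )) MB of journal RAM reclaimed"
```

On those assumptions that works out to roughly 3.2GB freed on the OSS.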
or, when creating the filesystem:
mkfs.lustre --mkfsoptions="-J size=128"
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.