[lustre-discuss] Limit to number of OSS?

Andreas Dilger adilger at whamcloud.com
Mon Oct 7 15:33:03 PDT 2019

Whether there are problems with a large number of OSS and/or MDS nodes depends on whether you are using TCP or IB networking.

With socklnd there are 3 TCP connections per client-server pair (bulk read, bulk write, and small message), so the maximum you could have would be around (65536 - 1024)/3 = 21500 (or likely fewer) clients or servers, unless you also configured LNet routers in between (which would allow more clients, but not more servers).  That isn't a constraint for most deployments, but it is at least one known limit.  For IB there is no such connection limit that I'm aware of.
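
The port arithmetic above can be sketched as follows. This is only a rough illustration: it assumes the figures quoted in the message (65536 TCP ports, the first 1024 unavailable, 3 connections per peer), and the function name `max_socklnd_peers` is hypothetical, not part of Lustre or LNet.

```python
# Rough upper bound on socklnd peers per node, using the figures from the
# message above. Nothing here is a Lustre/LNet API; the names are made up
# for illustration.

TOTAL_TCP_PORTS = 65536   # size of the 16-bit TCP port space
UNAVAILABLE_PORTS = 1024  # low ports subtracted in the estimate above
CONNS_PER_PEER = 3        # bulk read, bulk write, small message


def max_socklnd_peers(total_ports=TOTAL_TCP_PORTS,
                      unavailable=UNAVAILABLE_PORTS,
                      conns_per_peer=CONNS_PER_PEER):
    """Return how many peers fit if each one uses conns_per_peer ports."""
    return (total_ports - unavailable) // conns_per_peer


if __name__ == "__main__":
    print(max_socklnd_peers())  # 21504, i.e. "around 21500"
```

In practice the usable number is likely lower, as the message notes, since other services on the node also consume ports.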

There are likely other factors such as memory consumption per target, but I don't think that would be the first thing to cause problems on modern systems with hundreds of GB of RAM.

Cheers, Andreas

On Oct 4, 2019, at 01:45, Degremont, Aurelien <degremoa at amazon.com> wrote:

Thanks for this info. But actually I was really asking about the number of OSSes, not OSTs :)
My question is really about how Lustre client nodes and the MDT will cope with a very large number of OSSes.

From: Andreas Dilger <adilger at whamcloud.com>
Date: Friday, October 4, 2019 at 04:54
To: "Degremont, Aurelien" <degremoa at amazon.com>
Subject: Re: [lustre-discuss] Limit to number of OSS?

On Oct 3, 2019, at 07:55, Degremont, Aurelien <degremoa at amazon.com> wrote:

Hello all!

This doc from the wiki says "Lustre can support up to 2000 OSS per file system" (http://wiki.lustre.org/Lustre_Server_Requirements_Guidelines).

I'm a bit surprised by this statement. Does anybody have information about the upper limit on the number of OSSes?
Or what would be the limiting factor for scaling the number of OSSes? Network limits? Memory consumption? Something else?

That's likely a combination of a bit of confusion and a bit of safety on the part of Intel writing that document.

The Lustre Operations Manual writes:
Although a single file can only be striped over 2000 objects, Lustre file systems can have thousands of OSTs. The I/O bandwidth to access a single file is the aggregated I/O bandwidth to the objects in a file, which can be as much as a bandwidth of up to 2000 servers. On systems with more than 2000 OSTs, clients can do I/O using multiple files to utilize the full file system bandwidth.
I think PNNL once tested up to 4000 OSTs, and I think the compile-time limit is/was 8000 OSTs (maybe it was made dynamic, I don't recall offhand), but the current code could _probably_ handle up to 65000 OSTs without significant problems.  Beyond that, there is the 16-bit OST index limit in the filesystem device labels and the __u16 lov_user_md_v1->lmm_stripe_offset to specify the starting OST index for "lfs setstripe", but that could be overcome with some changes.

Given OSTs are starting to approach 1PB with large drives and declustered-parity RAID, this would get us in the range 8-65EB, which is over 2^64 bytes (16EB), so I don't think it is an immediate concern.  Let me know if you have any trouble with a 9000-OST filesystem... :-)
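
The capacity estimate above can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: each OST is taken to be exactly 1 PB (decimal), the OST counts 8000 and 65000 are the ones mentioned in the message, and the helper name `fs_capacity_bytes` is hypothetical. Note that 2^64 bytes is 16 EiB in binary units, or roughly 18.4 EB in decimal units.

```python
# Back-of-the-envelope check of the capacity range quoted above.
# Assumptions: 1 PB per OST (decimal), OST counts taken from the message.

PB = 10**15    # decimal petabyte
EB = 10**18    # decimal exabyte
EIB = 2**60    # binary exbibyte

OST_SIZE_BYTES = 1 * PB
BYTES_2_64 = 2**64            # largest size expressible in 64 bits
U16_INDEX_COUNT = 2**16       # values available in a 16-bit OST index


def fs_capacity_bytes(num_osts, ost_size=OST_SIZE_BYTES):
    """Total capacity in bytes for num_osts OSTs of ost_size bytes each."""
    return num_osts * ost_size


if __name__ == "__main__":
    for num_osts in (8000, 65000):
        total = fs_capacity_bytes(num_osts)
        print(num_osts, "OSTs:", total // EB, "EB,",
              "exceeds 2^64 bytes:", total > BYTES_2_64)
    print("2^64 bytes =", BYTES_2_64 // EIB, "EiB")
    print("16-bit OST index values:", U16_INDEX_COUNT)
```

Under these assumptions the low end (8000 OSTs, 8 EB) still fits in 64 bits, while the high end (65000 OSTs, 65 EB) does not.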

Cheers, Andreas
Andreas Dilger
Principal Lustre Architect

