[lustre-discuss] Confusion about MDS hardware requirements

Andreas Dilger adilger at whamcloud.com
Mon Mar 11 02:34:19 PDT 2024


All of the numbers in this example are estimates/approximations to give an idea of the amount of memory that the MDS may need under normal operating circumstances.  However, the MDS will also continue to function with more or less memory.  The actual amount of memory in use will vary significantly based on application type, workload, etc., and the numbers "256" and "100,000" are purely examples of how many files might be in use.
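To make that concrete, the whole estimate is just a product of site-specific inputs.  Here is a rough Python sketch (the function name and defaults are illustrative only, taken from the manual's example, not from any Lustre tool):

# Hypothetical sketch of the manual's section 5.5.2.1 MDS memory estimate.
# Defaults are the manual's example values; tune every one of them per site.

def mds_ram_estimate_mb(
    compute_clients=1024,           # number of compute nodes
    cores_per_client=32,            # cores per compute node
    files_per_core=256,             # files in use per core (workload-dependent)
    interactive_clients=12,         # login/interactive nodes
    files_per_interactive=100_000,  # files in use per interactive node
    working_set_files=20_000_000,   # total file working set
    kb_per_file_lock=2.0,           # ~2 KB of MDS memory per client-held file
    kb_per_cached_file=1.5,         # ~1.5 KB per file in the working set
    os_overhead_mb=4096,            # base OS overhead (e.g. RHEL8)
    journal_mb=4096,                # filesystem journal
):
    """Approximate MDS RAM requirement in MB, following the manual's example."""
    compute_mb = compute_clients * cores_per_client * files_per_core * kb_per_file_lock / 1024
    interactive_mb = interactive_clients * files_per_interactive * kb_per_file_lock / 1024
    working_set_mb = working_set_files * kb_per_cached_file / 1024
    return os_overhead_mb + journal_mb + compute_mb + interactive_mb + working_set_mb

print(f"~{mds_ram_estimate_mb() / 1024:.0f} GB")  # ~55 GB with the example inputs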

I'm not sure you can "test" those numbers, because whatever number of files you test with will be the number of files actually in use.  You could potentially _measure_ the number of files/locks in use on a large cluster, but again this will be highly site and application dependent.
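If you do want to measure, the per-namespace LDLM lock counts are exposed through lctl.  Something like the following (a minimal sketch, assuming lctl is in PATH on a node that is part of the filesystem, and that I recall the parameter name correctly) could be run per node and aggregated:

# Minimal sketch: sum the LDLM lock counts currently held on this node.
# Assumes the lctl utility is installed and Lustre is mounted/running.
import subprocess

out = subprocess.run(
    ["lctl", "get_param", "-n", "ldlm.namespaces.*.lock_count"],
    capture_output=True, text=True, check=True,
).stdout

print("locks held on this node:", sum(int(v) for v in out.split()))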

Cheers, Andreas

On Mar 11, 2024, at 01:24, Amin Brick Mover <aminbrickmover at gmail.com> wrote:

Hi,  Andreas.

Thank you for your reply.

Can I treat 256 files per core as an empirical parameter? Does the value '256' need to be tested against my hardware? Similarly, in the calculation "12 interactive clients * 100,000 files * 2KB = 2400 MB," is '100,000' files also an empirical parameter? Do I need to test these, or can I use the values '256' and '100,000' directly?

On Mon, Mar 11, 2024 at 05:47, Andreas Dilger <adilger at whamcloud.com> wrote:
These numbers are just estimates; you can use values more suitable to your workload.

Similarly, 32-core clients may be on the low side these days.  NVIDIA DGX nodes have 256 cores, though you may not have 1024 of them.

The net answer is that having 64GB+ of RAM is inexpensive these days and improves MDS performance, especially if you compare it to the cost of client nodes that would sit waiting for filesystem access if the MDS is short of RAM.  Better to have too much RAM on the MDS than too little.

Cheers, Andreas

On Mar 4, 2024, at 00:56, Amin Brick Mover via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:

Section 5.5.2.1 of the Lustre Manual gives this example:
For example, for a single MDT on an MDS with 1,024 compute nodes, 12 interactive login nodes, and a 20 million file working set (of which 9 million files are cached on the clients at one time):

Operating system overhead = 4096 MB (RHEL8)
File system journal = 4096 MB
1024 * 32-core clients * 256 files/core * 2KB = 16384 MB
12 interactive clients * 100,000 files * 2KB = 2400 MB
20 million file working set * 1.5KB/file = 30720 MB
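Summing these figures gives 4096 + 4096 + 16384 + 2400 + 30720 = 57696 MB, i.e. roughly 56 GB for this example configuration.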
I'm curious: how were the two numbers, 256 files/core and 100,000 files, determined, and why?


Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud