[lustre-discuss] Read performance bad, telepathy in Lustre

Thu Jan 23 21:16:22 PST 2020

Thomas,
If you are positive that the two sets of clients are not reading files on each other's OSTs, I don't think there is anything at the Lustre level that communicates between OSSes to balance traffic or anything like that.

One possibility is congestion control at the network level, possibly at the switch?

Cheers, Andreas

On Jan 23, 2020, at 08:01, Thomas Roth <t.roth at gsi.de> wrote:

Hi all,

Lustre 2.10.6, 45 OSSes with 7 OSTs each on ZFS 0.7.9, 3 MDTs (ldiskfs), clients 2.10 and 2.12. InfiniBand network, Mellanox FDR with half bisection bandwidth.

A sample of ~250,000 files (stripe count 1, average size 100 MB) is read with dd, with the output sent to /dev/null.

The location of the files has been recorded, from this we have drawn up separate file lists for each OSS.
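
To illustrate the grouping step, here is a minimal Python sketch. It is not the script we actually used. It assumes the recorded locations are available as (path, OST index) pairs and that an OST-to-OSS mapping is known; the function name `group_files_by_oss` and the sample data are hypothetical.

```python
from collections import defaultdict


def group_files_by_oss(file_locations, ost_to_oss):
    """Group file paths by the OSS that serves their single OST.

    file_locations: iterable of (path, ost_index) pairs; stripe count 1
                    is assumed, so each file lives on exactly one OST.
    ost_to_oss:     dict mapping OST index -> OSS name (hypothetical).
    """
    lists = defaultdict(list)
    for path, ost_index in file_locations:
        lists[ost_to_oss[ost_index]].append(path)
    return dict(lists)


# Illustrative data only: two OSSes with two OSTs each.
ost_to_oss = {0: "oss1", 1: "oss1", 2: "oss2", 3: "oss2"}
locations = [("/lustre/a", 0), ("/lustre/b", 3), ("/lustre/c", 1)]
per_oss = group_files_by_oss(locations, ost_to_oss)
```

Each resulting list can then be handed to exactly one client, so that every client talks to a single OSS.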

In the first run, one client reads the files on one OSS and gets a read performance X, e.g. 2 GB/s.

In the second run, this setup is simply multiplied by 10 or 40: client 1 still reads from OSS 1, client 2 works with the files on OSS 2, client 3 with OSS 3, ...

With only 12 pairs of this kind we see 2 or 3 pairs whose performance drops to < 500 MB/s. The other pairs keep the read rate seen before. Once they have finished, the remaining 2-3 pairs jump back to the original performance.

When the runs are repeated, the affected OSSes are not the same as before.

This should exclude effects of bad hardware: servers, disks, cables, switches.

Since this behaviour is reproducible, the effects of interactions with other jobs/users can also be excluded.

By now I am able to reproduce the behavior on a test system, same configuration, with just 2 client-OSS pairs, nobody else on there.

56 parallel dd processes on client 1, reading files on server 1: 440 MB/s
56 parallel dd processes on client 2, reading files on server 2: 1.6 GB/s

Then kill all processes on client 2. Client 1 continues, rising to 1.1 GB/s
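
For anyone who wants to reproduce this kind of measurement without dd, a rough Python sketch follows. It is not what produced the numbers above: it reads every file in a list with a pool of worker threads, discards the data, and reports the aggregate rate. The names `read_and_discard` and `measure_read_rate` and the 1 MiB block size are assumptions made for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def read_and_discard(path):
    """Read a file to the end, throw the data away, return bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(BLOCK_SIZE)
            if not chunk:
                break
            total += len(chunk)
    return total


def measure_read_rate(paths, workers=56):
    """Read all files with `workers` parallel readers.

    Returns (total_bytes, rate in MB/s), with 1 MB = 10**6 bytes.
    """
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(read_and_discard, paths))
    elapsed = time.monotonic() - start
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate
```

Calling `measure_read_rate(paths)` with one of the per-OSS file lists gives a single aggregate figure per client. Files that are already in the client cache will inflate the result, so the lists should not be reused without dropping caches in between.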

These processes are not even visible on the MDS of this system, and from all I understand the metadata server should be the only connecting element between the two pairs?
How do they know about each other? Who or what tells the client-1/server-1 pair to keep its rate low while client 2 is working on server 2?

Curiouser and curiouser,
Thomas

--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Andreas Dilger
Principal Lustre Architect
Whamcloud
