[lustre-discuss] Hidden QoS in Lustre ?
thhsieh at twcp1.phys.ntu.edu.tw
Thu Oct 8 09:37:03 PDT 2020
In the past months we have encountered several episodes of Lustre I/O
slowing down abnormally. It is quite mysterious: there seems to be no
problem with the network hardware, nor with Lustre itself, since there
are no error messages at all on the MDT/OST/client sides.
Recently we probably found a way to reproduce it, and now have some
suspicions. We found that if we continuously perform I/O on a client
without stopping, then after some time threshold (probably more than 24
hours), the file I/O bandwidth available for additional work on that
client shrinks dramatically.
Our configuration is the following:

- One MDT and one OST server, based on ZFS + Lustre-2.12.4.
- The OST is served by a RAID-5 system with 15 SAS hard disks.
- Some clients connect to the MDT/OST through InfiniBand, some through
  other (non-InfiniBand) networks.

Our tests focused on the clients using InfiniBand, as described in the
following:
We have a huge amount of data (several TB) stored in the Lustre file
system that has to be transferred to an outside network. In order not
to exhaust our institute's network bandwidth, we transfer the data at a
limited rate via the following command:

rsync -av --bwlimit=1000 <data_in_Lustre> <out_side_server>:/<out_side_path>/

That is, the transfer rate is limited to about 1 MB per second, which is
relatively
low. The client reads the data from Lustre through InfiniBand, so during
the transfer there should presumably be no problem doing other I/O
on the same client. On average, copying a 600 MB file from one directory
to another (both within the same Lustre file system) took about
1.0 - 2.0 seconds, even while the rsync process was still running.
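To pinpoint exactly when the slowdown sets in, it may help to log the
copy time at regular intervals (e.g. from cron) while the rsync is
running. A minimal sketch; the Lustre paths in the usage comment are
placeholders, not our real paths:

```shell
# timed_copy: copy SRC to DST and print the elapsed wall-clock seconds,
# so that repeated runs reveal when the copy time jumps from ~2 s to >60 s.
timed_copy() {
    src=$1
    dst=$2
    start=$(date +%s)
    cp "$src" "$dst"
    end=$(date +%s)
    echo $((end - start))
}

# Example (hypothetical paths), e.g. run every 10 minutes from cron:
# timed_copy /lustre/data/test600MB /lustre/scratch/test600MB >> copy_times.log
```

Logging a timestamp next to each measurement would show whether the
threshold is really close to 24 hours of sustained rsync traffic.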
But after about 24 hours of continuously sending data via rsync,
additional I/O on the same client slowed down dramatically. When this
happens, it takes more than one minute to copy a 600 MB file from one
place to another (both within the same Lustre file system) while rsync
is still running.
Then we stopped the rsync process and waited for a while (about one
hour). The I/O performance of copying that 600 MB file returned to normal.
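The fact that performance recovers after rsync stops could also point to
client-side state (page cache, dirty pages, or LDLM locks accumulated by
the long-running rsync) rather than a server-side policy. One way to
test this is to flush that state while rsync is still running and see
whether the copy speed recovers immediately. A hedged diagnostic sketch,
run as root on the affected client (standard Linux and Lustre 2.12
tunables):

```shell
# Drop the client's Linux page cache, dentries, and inodes.
echo 3 > /proc/sys/vm/drop_caches

# Clear the client's cached LDLM locks (accumulated by long-running I/O).
lctl set_param ldlm.namespaces.*.lru_size=clear

# Inspect how much dirty/cached data the client is currently holding.
lctl get_param osc.*.cur_dirty_bytes
lctl get_param llite.*.max_cached_mb
```

If the 600 MB copy is fast again right after these commands (with rsync
still running), the problem is client-side caching/lock buildup, not a
bandwidth limit imposed by the servers.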
Based on this observation, we suspect that there may be a hidden QoS
mechanism built into Lustre. When a process occupies the I/O bandwidth
for a long time and exceeds some limit, does Lustre automatically
throttle the I/O bandwidth for all processes running on the same client?
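For what it is worth, Lustre does ship a QoS mechanism, the NRS TBF
(Token Bucket Filter) policy, but it is server-side, configured per
service, and disabled by default (the default NRS policy is fifo).
Whether anything like it is active can be checked on the OSS; a sketch,
assuming lctl is available there:

```shell
# On the OSS: list the NRS policies for the bulk I/O service.
# If only "fifo" is in state "started", no TBF rate limiting is configured.
lctl get_param ost.OSS.ost_io.nrs_policies
```

If fifo is the only started policy, the slowdown is not caused by a
deliberate server-side rate limit.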
I am not against such a QoS design, if it does exist. But the amount of
throttling seems too large for InfiniBand (QDR and above). I further
suspect that this might be because our system mixes clients, some of
which have InfiniBand while others do not.
Could anyone help to fix this problem? Any suggestions will be very
much appreciated.
Thanks very much.