[lustre-discuss] Hidden QoS in Lustre ?

Andreas Dilger adilger at dilger.ca
Thu Oct 8 12:32:53 PDT 2020


On Oct 8, 2020, at 10:37 AM, Tung-Han Hsieh <thhsieh at twcp1.phys.ntu.edu.tw> wrote:
> 
> Dear All,
> 
> In the past few months we have encountered several episodes of Lustre I/O
> slowing down abnormally. It is quite mysterious, since there seems to be
> no problem with the network hardware, nor with Lustre itself: there are
> no error messages at all on the MDT, OST, or client sides.
> 
> Recently we (probably) found a way to reproduce it, and now have some
> suspicions. We found that if we perform I/O continuously on a client
> without stopping, then after some time threshold (probably more than 24
> hours), the bandwidth available for additional file I/O on that client
> shrinks dramatically.
> 
> Our configuration is the following:
> - One MDT and one OST server, based on ZFS + Lustre-2.12.4.
> - The OST is served by a RAID 5 system with 15 SAS hard disks.
> - Some clients connect to the MDT/OST through InfiniBand, some through
>  Gigabit Ethernet.
> 
> Our test focused on the clients using InfiniBand, and is described
> below:
> 
> We have a huge amount of data (several TB) stored in the Lustre file
> system that needs to be transferred to an outside network. In order not
> to exhaust our institute's network bandwidth, we transfer the data at a
> limited rate via the following command:
> 
> rsync -av --bwlimit=1000 <data_in_Lustre> <out_side_server>:/<out_side_path>/
> 
> That is, the transfer rate is limited to about 1 MB per second (rsync's
> --bwlimit is in units of KiB/s), which is relatively low. The client
> reads the data from Lustre through InfiniBand, so during the transfer
> there should be no problem doing other I/O on the same client. On
> average, copying a 600 MB file from one directory to another (both in
> the same Lustre file system) took about 1.0 - 2.0 seconds, even while
> the rsync process was still running.
> 
> But after about 24 hours of continuously sending data via rsync, the
> bandwidth for additional I/O on the same client shrank dramatically.
> When this happens, it takes more than 1 minute to copy a 600 MB file
> from one place to another (both within the same Lustre file system)
> while rsync is still running.
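> The copy test itself is just something like the following, run in a
> second shell while the rsync is active (the paths here are only
> placeholders):
> 
> time cp /lustre/some_dir/file_600MB /lustre/another_dir/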
> 
> Then we stopped the rsync process and waited for a while (about one
> hour). The I/O performance of copying that 600 MB file returned to normal.
> 
> Based on this observation, we suspect there may be a hidden QoS
> mechanism built into Lustre. When a process occupies the I/O bandwidth
> for a long time and exceeds some limit, does Lustre automatically shrink
> the I/O bandwidth for all processes running on the same client?
> 
> I am not against such a QoS design, if it does exist. But the amount of
> throttling seems too large for InfiniBand (QDR and above). So I further
> suspect that this may be because our system mixes clients, some with
> InfiniBand and some without.
> 
> Could anyone help us fix this problem? Any suggestions would be much
> appreciated.

There is no "hidden QOS", unless it is so well hidden that I don't know
about it.

You could investigate several different things to isolate the problem
(see the example commands sketched after this list):
- try with a 2.13.56 client to see if the problem is already fixed
- check if the client is using a lot of CPU when it becomes slow
- run strace on your copy process to see which syscalls are slow
- check memory/slab usage
- enable Lustre debug=-1 and dump the kernel debug log to see where
  the process is taking a long time to complete a request
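
A rough sketch of those checks (replace <pid> with the PID of the slow
copy process; note that debug=-1 is very verbose, so only enable it while
reproducing the slowdown):

  # CPU and memory/slab usage on the client while it is slow
  top -b -n 1 | head -20
  slabtop -o | head -30

  # see which syscalls the copy process is spending its time in
  strace -T -tt -f -p <pid>       # or: strace -c cp <src> <dst>

  # enable full Lustre debugging, then dump the kernel debug buffer
  lctl set_param debug=-1
  lctl dk /tmp/lustre-debug.log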

It is definitely possible that there is some kind of problem, since this
is not a very common workload to be continuously writing to the same file
descriptor for over a day.  You'll have to do the investigation on your
system to isolate the source of the problem.

Cheers, Andreas




