[lustre-discuss] some clients dmesg filled up with "dirty page discard"

Degremont, Aurelien degremoa at amazon.com
Mon Aug 31 01:09:44 PDT 2020


Do you mean you had never hit a client eviction or a dirty page discard before?
What was your previous Lustre version?
The dirty page discard warning has existed for a long time, since Lustre 2.4.

Evictions happen for lots of reasons. An eviction just means that this client did not respond to this OSS within 100 seconds. It could be due to an overloaded network, a hardware issue on one of these hosts, the client not responding because its CPU is overloaded, or indeed a bug. You should first try to understand why these evictions happened.
Check the server and client load at that time (CPU, network, etc.). Identify the impacted files and the application accessing them. What is the application's I/O pattern? Is it putting heavy pressure on these files? The file name can be obtained from the FID with 'lfs fid2path MOUNT_POINT FID'.
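
For scripting the FID lookup across several files, a minimal Python sketch follows. It only wraps the 'lfs fid2path MOUNT_POINT FID' command mentioned above. The function names are hypothetical, and the FID check assumes the usual [sequence:object_id:version] hex form, so adjust it if your FIDs look different.

```python
import re
import subprocess

# Assumed FID shape: [sequence:object_id:version], each field in hex,
# with the surrounding brackets optional.
FID_RE = re.compile(r"^\[?0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+\]?$")


def fid2path_command(mount_point, fid):
    """Build the argument list for 'lfs fid2path MOUNT_POINT FID'."""
    if not FID_RE.match(fid):
        raise ValueError(f"does not look like a Lustre FID: {fid!r}")
    return ["lfs", "fid2path", mount_point, fid]


def fid2path(mount_point, fid):
    """Run the command on a Lustre client and return the reported paths."""
    result = subprocess.run(
        fid2path_command(mount_point, fid),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if lfs exits non-zero
    )
    return result.stdout.splitlines()
```

For example, fid2path("/mnt/lustre", "[0x200000400:0x1:0x0]") would run the lookup on a client; both the mount point and the FID here are placeholders, not values from this thread.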


Aurélien

From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of 肖正刚 <guru.novice at gmail.com>
Date: Sunday, 30 August 2020 at 07:41
To: Andreas Dilger <adilger at whamcloud.com>
Cc: lustre-discuss <lustre-discuss at lists.lustre.org>
Subject: RE: [EXTERNAL] [lustre-discuss] some clients dmesg filled up with "dirty page discard"


Hi, Andreas,
Thanks for your reply.
Maybe this is a bug?
We never hit this before updating the client to 2.12.5.

Andreas Dilger <adilger at whamcloud.com> wrote on Saturday, August 29, 2020 at 6:37 PM:
On Aug 25, 2020, at 17:42, 肖正刚 <guru.novice at gmail.com> wrote:

No, on the OSS we found that only the clients that reported "dirty page discard" were evicted.
We hit this again last night, and on the OSS we can see logs like:
"
[Tue Aug 25 23:40:12 2020] LustreError: 14278:0:(ldlm_lockd.c:256:expired_lock_main()) ### lock callback timer expired after 100s: evicting client at 10.10.3.223@o2ib  ns: filter-public1-OST0000_UUID lock: ffff9f1f91cba880/0x3fcc67dad1c65842 lrc: 3/0,0 mode: PR/PR res: [0xde2db83:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->270335) flags: 0x60000400020020 nid: 10.10.3.223@o2ib remote: 0xd713b7b417045252 expref: 7081 pid: 25923 timeout: 21386699 lvb_type: 0

It isn't clear what the question is here. The "dirty page discard" message means that unwritten data from the client was discarded: the client was not responsive, so the server evicted it and revoked the lock covering this data.
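
To line up evictions with the clients reporting "dirty page discard", the OSS message quoted above can be parsed mechanically. The following Python sketch is only an illustration: the regular expression is derived from that single log line, the function names are hypothetical, and other Lustre versions may format the message differently. It accepts the NID written either with "@" or with the " at " spelling used by the list archive, and it extracts the resource name as-is without trying to map it to a file.

```python
import re
from collections import Counter

# Pattern modelled on the one "lock callback timer expired" line in this thread.
EVICTION_RE = re.compile(
    r"\[(?P<when>[^\]]+)\]\s+LustreError:.*?"
    r"lock callback timer expired after (?P<seconds>\d+)s: "
    r"evicting client at (?P<addr>[\d.]+)(?:@| at )(?P<net>\w+)\s+"
    r"ns: (?P<namespace>\S+)"
    r".*?res: (?P<resource>\[[^\]]+\]\.0x[0-9a-fA-F]+)"
)


def parse_eviction(line):
    """Return a dict describing the eviction, or None if the line differs."""
    m = EVICTION_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["seconds"] = int(info["seconds"])
    # Rebuild the NID in its usual address@network form.
    info["nid"] = f"{info.pop('addr')}@{info.pop('net')}"
    return info


def evictions_per_client(lines):
    """Count matching eviction messages per client NID."""
    counts = Counter()
    for line in lines:
        info = parse_eviction(line)
        if info is not None:
            counts[info["nid"]] += 1
    return counts
```

Feeding the OSS dmesg output line by line into evictions_per_client gives a quick view of which clients are evicted most often, and the timestamps can then be compared with the client-side warnings.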


Also, we ran lfsck on all servers; the result is

There is no need for LFSCK in this case.  The file data was not written, but a client eviction does not result in the filesystem becoming inconsistent.

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud




