[lustre-discuss] [EXTERNAL] Re: Re: DF bug with lustre 2.12.4

Konzem, Kevin P kkonzem at contractor.usgs.gov
Fri Feb 28 12:38:57 PST 2020


Thanks for the info. I've downloaded the patch, but am I supposed to apply it to the source code and recompile, or something else? Could you give me a quick rundown on how to apply the patch correctly?
Thanks,
Kevin
________________________________
From: Spitz, Cory James <cory.spitz at hpe.com>
Sent: Thursday, February 27, 2020 4:58 PM
To: Konzem, Kevin P <kkonzem at contractor.usgs.gov>; Nathan Dauchy - NOAA Affiliate <nathan.dauchy at noaa.gov>
Cc: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
Subject: [EXTERNAL] Re: [lustre-discuss] Re: DF bug with lustre 2.12.4


Hello, Kevin.



I see from LU-13285 that Nathan D. pointed you at LU-13296.  I left a comment in the ticket as well.  I think that you can try the patch from LU-13296 with your reproducer.



-Cory



On 2/21/20, 10:08 AM, "lustre-discuss on behalf of Konzem, Kevin P" <lustre-discuss-bounces at lists.lustre.org on behalf of kkonzem at contractor.usgs.gov> wrote:



Nathan, I've created a Jira issue for this, LU-13285 <https://jira.whamcloud.com/browse/LU-13285>. In it I attached the output of an strace in which I was able to capture a string of both successful and failed df runs.

________________________________

From: Nathan Dauchy - NOAA Affiliate <nathan.dauchy at noaa.gov>
Sent: Thursday, February 20, 2020 2:35 PM
To: Konzem, Kevin P <kkonzem at contractor.usgs.gov>
Cc: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
Subject: [EXTERNAL] Re: [lustre-discuss] DF bug with lustre 2.12.4



On Thu, Feb 20, 2020 at 11:47 AM Konzem, Kevin P <kkonzem at contractor.usgs.gov> wrote:

test this by running 'while [ true ];do /bin/df -TP /performance;done' on two sessions on the same client. As soon as I start the second while loop, the outputs go from:

Filesystem                 Type   1024-blocks   Used Available Capacity Mounted on
192.168.0.181@tcp:/perform lustre    71467728 100416  67664944       1% /performance



to:

Filesystem                 Type   1024-blocks  Used Available Capacity Mounted on
192.168.0.181@tcp:/perform lustre           0    -0        -0      50% /performance
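
A minimal sketch of how the failing case could be detected automatically rather than by eye. It assumes the `df -TP` layout shown in the samples, where the third field of the data row is the 1024-blocks column and the filesystem name contains no spaces. The function name `df_row_is_zeroed` is hypothetical and not part of any Lustre tooling.

```shell
#!/bin/sh
# Hypothetical helper: reads `df -TP` output on stdin and succeeds (exit 0)
# when the data row reports 0 total blocks, as in the failing sample.
df_row_is_zeroed() {
    # Line 1 is the header; line 2 is the data row. Field 3 is 1024-blocks.
    awk 'NR == 2 { bad = ($3 == 0) } END { exit (bad ? 0 : 1) }'
}

# Usage sketch (run alongside a second df loop on the same client):
# while true; do
#     /bin/df -TP /performance | df_row_is_zeroed && { echo "bad df at $(date)"; break; }
# done
```

The check only looks at the total-blocks field, so a genuinely empty or unusual filesystem would need a different test.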



Kevin,



I can confirm seeing this issue intermittently as well, and usually with a re-run of df the results are once again reasonable.  It looks like you have a more reliable reproducer though, which is good!  A support ticket was opened with our vendor, and they said if we can capture a "strace" of it for a bad run that might be helpful... but I haven't caught it in the act yet.  With your reproducer, can you get that and open a Jira ticket to track the problem?
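
One way to catch a bad run "in the act" might be to run every df under strace and keep the trace only when the output shows the zeroed row. This is a sketch under assumptions: the function name `trace_df_once` and the trace file path are hypothetical, and the check relies on the 1024-blocks column being the third field of the `df -TP` data row.

```shell
#!/bin/sh
# Hypothetical helper: run df once under strace; keep the trace file only
# if the run reported 0 total blocks. Returns 0 on a bad run, 1 otherwise.
trace_df_once() {
    mount_point=$1
    trace_file=$2
    # -f follows child processes, -tt timestamps each call,
    # -o writes the trace to a file instead of stderr.
    output=$(strace -f -tt -o "$trace_file" df -TP "$mount_point")
    # Field 3 of the data row (line 2) is the 1024-blocks column.
    blocks=$(printf '%s\n' "$output" | awk 'NR == 2 { print $3 }')
    if [ "$blocks" = "0" ]; then
        return 0                # bad run: trace is kept in $trace_file
    fi
    rm -f "$trace_file"         # good run: discard the trace
    return 1
}

# Usage sketch (with a second df loop running in another session):
# until trace_df_once /performance /tmp/df-bad.strace; do :; done
```

Running under strace changes timing, so the race may become harder or easier to hit than with plain df.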



As a workaround, try "lfs df" instead; it may take a different code path that avoids the bug.
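
For scripts that currently call df, the workaround could be wrapped along these lines. This is only a sketch: the function name `space_report` is hypothetical, and whether "lfs df" actually avoids the bug is not confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: prefer `lfs df` when the lfs utility is available,
# otherwise fall back to plain df.
space_report() {
    mount_point=$1
    if command -v lfs >/dev/null 2>&1; then
        lfs df -h "$mount_point"    # -h prints human-readable sizes
    else
        df -TP "$mount_point"
    fi
}

# Usage sketch:
# space_report /performance
```

Note that "lfs df" prints per-target (MDT/OST) lines plus a summary, so any script parsing the output would need to be adjusted.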



-Nathan

