<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
The changelog_mask has a default value. If you do</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
changelog_mask='MARK MTIME CTIME'</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
you are setting the mask to this exact value, whereas</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
changelog_mask='+SATTR'</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
keeps all the default flags and adds SATTR. Hence the difference in output.</div>
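<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
To make the difference concrete, here is a small Python model of the two forms. It is a simplified sketch, not Lustre's actual parser. It assumes that a mask whose first flag has no + or - prefix replaces the current mask, while prefixed flags are added to or removed from it. The default flag list is also an assumption, inferred from the '+SATTR' output quoted in this thread (that output minus SATTR). The function name apply_mask() is hypothetical.</div>
<pre>
```python
# Simplified model of how a changelog_mask string combines with the current
# mask. Assumption: it only mirrors the behaviour described in this thread;
# it is NOT Lustre's real parser.

# Assumed default mask: the '+SATTR' output from this thread, minus SATTR.
DEFAULT_MASK = frozenset(
    "MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE "
    "LYOUT TRUNC XATTR HSM MTIME CTIME MIGRT FLRW RESYNC".split()
)


def apply_mask(current, spec):
    """Return the mask that results from writing `spec` over `current`."""
    tokens = spec.split()
    if not tokens:
        return frozenset(current)
    # Relative if the first flag carries a + or - prefix, absolute otherwise.
    relative = tokens[0][0] in "+-"
    result = set(current) if relative else set()
    for tok in tokens:
        if tok[0] == "-":
            result.discard(tok[1:])  # remove a flag
        elif tok[0] == "+":
            result.add(tok[1:])  # add a flag
        else:
            result.add(tok)  # plain flag name
    return frozenset(result)


if __name__ == "__main__":
    print(sorted(apply_mask(DEFAULT_MASK, "MARK MTIME CTIME")))  # ['CTIME', 'MARK', 'MTIME']
    print(len(apply_mask(DEFAULT_MASK, "+SATTR")))  # 21
    print(len(apply_mask(DEFAULT_MASK, "-MARK -MTIME -CTIME")))  # 17
```
</pre>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
Under the same assumptions, the permanent '-MARK -MTIME -CTIME' setting from the original report is also relative, so it keeps the other 17 assumed default flags rather than leaving only three.</div>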
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
AFAIK, both commands should work, so this looks like a bug. It seems that some flag missing in the first case is causing the errors, whereas in your second case almost all flags are enabled. You could try bisecting the flags to find a smaller set that still works, and report it on jira.whamcloud.com.</div>
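<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
One way to organise that bisection, sketched in Python under stated assumptions: the hypothetical minimize_flags() below starts from the working '+SATTR' flag set and drops chunks of flags, halving the chunk size each round, as long as a caller-supplied check still passes. It assumes the check is monotonic (adding flags to a working mask never breaks it). In practice each check would presumably mean setting the mask with lctl set_param -P, remounting the MDT and running touch on a client, as in Philippe's reproduction; that part is not implemented here, and the toy check is only a placeholder.</div>
<pre>
```python
# Hypothetical helper for bisecting changelog_mask flags: shrink a working
# flag set while a caller-supplied check still passes.
# Assumption: the check is monotonic (adding flags never breaks a working mask).

# Flags reported by lctl get_param after changelog_mask='+SATTR' in this thread.
FULL_MASK = (
    "MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE "
    "LYOUT TRUNC SATTR XATTR HSM MTIME CTIME MIGRT FLRW RESYNC"
).split()


def minimize_flags(flags, works):
    """Return a 1-minimal subset of `flags` for which works(mask) is True.

    Drops large chunks first (bisection style), then smaller ones, and
    finishes with single flags.
    """
    current = list(flags)
    if not works(frozenset(current)):
        raise ValueError("the starting flag set must already work")
    chunk = max(1, len(current) // 2)
    while True:
        i = 0
        while len(current) > i:
            candidate = current[:i] + current[i + chunk:]
            if works(frozenset(candidate)):
                current = candidate  # this chunk was not needed
            else:
                i += chunk  # keep this chunk and try the next one
        if chunk == 1:
            return frozenset(current)
        chunk = max(1, chunk // 2)


if __name__ == "__main__":
    # Stand-in check for demonstration only; it says nothing about the
    # real cause. A real check would set the mask, remount and run touch.
    def toy_works(mask):
        return "CREAT" in mask

    print(sorted(minimize_flags(FULL_MASK, toy_works)))  # ['CREAT']
```
</pre>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
Dropping large chunks first keeps the number of remounts well below one per flag when only a few flags matter; the final single-flag pass guarantees that no remaining flag can be removed on its own.</div>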
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
Aurélien</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 11pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Philippe Dos Santos <philippe.dos-santos@ipsl.fr><br>
<b>Sent:</b> Thursday, November 21, 2024 11:18<br>
<b>To:</b> Aurelien Degremont <adegremont@nvidia.com><br>
<b>Cc:</b> lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org>; Philippe Weill <Philippe.Weill@latmos.ipsl.fr><br>
<b>Subject:</b> Re: [lustre-discuss] Report Strange Problem on 2.15.5 with changelog_mask</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">
Hello Aurelien,<br>
<br>
I'm working with Philippe WEILL and I'm Philippe too ;o)<br>
<br>
We first encountered the problem a few months ago,<br>
and it happened again yesterday after the maintenance window.<br>
In production, all servers and clients now run Lustre 2.15.5.<br>
<br>
We reproduced the problem with 3 RockyLinux 8.10 VMs running Lustre 2.15.5 (1x mds-mgs, 2x oss and 1x client).<br>
Could it be related to a misuse of the changelog mask ('MARK MTIME CTIME' vs. '+MTIME +CTIME')?<br>
<br>
## Reproducing the problem:<br>
<br>
[root@test-mds-mgs ~]# lctl set_param -P mdd.lustre-MDT0000.changelog_mask='MARK MTIME CTIME'<br>
[root@test-mds-mgs ~]# reboot<br>
[root@test-mds-mgs ~]# mount -t lustre /dev/sdb /mnt/mgt/<br>
[root@test-mds-mgs ~]# mount -t lustre /dev/sdc /mnt/mdt/<br>
[root@test-mds-mgs ~]# lctl get_param mdd.lustre-MDT0000.changelog_mask<br>
mdd.lustre-MDT0000.changelog_mask=MARK MTIME CTIME<br>
<br>
[root@test-rbh-cl-215 lustre]# LANG=C touch aeffacer<br>
touch: setting times of 'aeffacer': Input/output error<br>
<br>
[root@test-mds-mgs ~]# LANG=C dmesg -T<br>
...<br>
[Thu Nov 21 10:54:24 2024] Lustre: Lustre: Build Version: 2.15.5<br>
[Thu Nov 21 10:54:24 2024] LNet: Added LNI 172.20.240.172@tcp [8/256/0/180]<br>
[Thu Nov 21 10:54:24 2024] LNet: Accept secure, port 988<br>
[Thu Nov 21 10:54:24 2024] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc<br>
[Thu Nov 21 10:54:35 2024] LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc<br>
[Thu Nov 21 10:54:35 2024] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 172.20.240.171@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.<br>
[Thu Nov 21 10:54:35 2024] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 300-900<br>
[Thu Nov 21 10:54:35 2024] Lustre: lustre-MDD0000: changelog on<br>
[Thu Nov 21 10:55:26 2024] Lustre: lustre-MDT0000: Will be in recovery for at least 5:00, or until 1 client reconnects<br>
[Thu Nov 21 10:55:26 2024] Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.<br>
[Thu Nov 21 10:55:26 2024] LustreError: 1907:0:(llog_cat.c:543:llog_cat_current_log()) lustre-MDD0000: next log does not exist!<br>
...<br>
<br>
## "Solving" the problem:<br>
<br>
[root@test-mds-mgs ~]# lctl set_param -P mdd.lustre-MDT0000.changelog_mask='+SATTR'<br>
[root@test-mds-mgs ~]# reboot<br>
[root@test-mds-mgs ~]# mount -t lustre /dev/sdb /mnt/mgt/<br>
[root@test-mds-mgs ~]# mount -t lustre /dev/sdc /mnt/mdt/<br>
[root@test-mds-mgs ~]# lctl get_param mdd.lustre-MDT0000.changelog_mask<br>
mdd.lustre-MDT0000.changelog_mask=<br>
MARK CREAT MKDIR HLINK SLINK MKNOD UNLNK RMDIR RENME RNMTO CLOSE LYOUT TRUNC SATTR XATTR HSM MTIME CTIME MIGRT FLRW RESYNC<br>
<br>
[root@test-rbh-cl-215 lustre]# touch aeffacer<br>
[root@test-rbh-cl-215 lustre]# ll aeffacer<br>
-rw-r--r-- 1 root root 0 21 nov. 11:03 aeffacer<br>
<br>
[root@test-mds-mgs ~]# LANG=C dmesg -T<br>
...<br>
[Thu Nov 21 11:02:52 2024] Lustre: Lustre: Build Version: 2.15.5<br>
[Thu Nov 21 11:02:52 2024] LNet: Added LNI 172.20.240.172@tcp [8/256/0/180]<br>
[Thu Nov 21 11:02:52 2024] LNet: Accept secure, port 988<br>
[Thu Nov 21 11:02:53 2024] LDISKFS-fs (sdb): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc<br>
[Thu Nov 21 11:02:57 2024] LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc<br>
[Thu Nov 21 11:02:57 2024] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 300-900<br>
[Thu Nov 21 11:02:57 2024] Lustre: lustre-MDD0000: changelog on<br>
[Thu Nov 21 11:03:27 2024] Lustre: lustre-MDT0000: Will be in recovery for at least 5:00, or until 1 client reconnects<br>
[Thu Nov 21 11:03:27 2024] Lustre: lustre-MDT0000: Recovery over after 0:01, of 1 clients 1 recovered and 0 were evicted.<br>
<br>
Philippe<br>
<br>
<br>
----- Original Message -----<br>
From: "Philippe Weill" <Philippe.Weill@latmos.ipsl.fr><br>
To: "Aurelien Degremont" <adegremont@nvidia.com>, lustre-discuss@lists.lustre.org<br>
Sent: Wednesday, November 20, 2024 17:44:16<br>
Subject: Re: [lustre-discuss] Report Strange Problem on 2.15.5 with changelog_mask<br>
<br>
On 20/11/2024 16:24, Aurelien Degremont wrote:<br>
> Hello Philippe,<br>
><br>
> I do not see why changing the changelog mask would cause I/O errors, especially as this seems transient.<br>
> Did you happen to see any errors on your client or MDS hosts at the time of your testing? (see dmesg)<br>
<br>
<br>
Hello,<br>
<br>
No, we did not see any, and we have reproduced the problem with 3 Rocky 8.10 VMs running a fresh 2.15.5 install (1 MDS, 1 OSS, 1 client).<br>
<br>
<br>
><br>
><br>
> Aurélien<br>
> ------------------------------------------------------------------------------------------------------------------------------------<br>
> *From:* lustre-discuss <lustre-discuss-bounces@lists.lustre.org> on behalf of Philippe Weill <Philippe.Weill@latmos.ipsl.fr><br>
> *Sent:* Wednesday, November 20, 2024 07:11<br>
> *To:* lustre-discuss@lists.lustre.org <lustre-discuss@lists.lustre.org><br>
> *Subject:* [lustre-discuss] Report Strange Problem on 2.15.5 with changelog_mask<br>
><br>
><br>
> Hello,<br>
><br>
> After running the following command on our Lustre MDS<br>
><br>
> lctl set_param -P mdd.*-MDT0000.changelog_mask='MARK MTIME CTIME'<br>
><br>
> and unmounting and remounting the MDT on the MDS,<br>
><br>
> we got errors from touch, chmod and chgrp on existing files:<br>
><br>
> root@host:~# echo foobar > /scratch/root/foobar<br>
> root@host:~# cat /scratch/root/foobar<br>
> foobar<br>
> root@host:~# echo foobar2 >> /scratch/root/foobar<br>
> root@host:~# cat /scratch/root/foobar<br>
> foobar<br>
> foobar2<br>
> root@host:~# touch /scratch/root/foobar<br>
> touch: setting times of '/scratch/root/foobar': Input/output error<br>
> root@host:~# chgrp group /scratch/root/foobar<br>
> chgrp: changing group of '/scratch/root/foobar': Input/output error<br>
> root@host:~# chmod 666 /scratch/root/foobar<br>
> chmod: changing permissions of '/scratch/root/foobar': Input/output error<br>
><br>
><br>
> After running the following command<br>
><br>
> lctl set_param -P mdd.*-MDT0000.changelog_mask='-MARK -MTIME -CTIME'<br>
><br>
><br>
> and setting the mask only non-permanently for our Robinhood<br>
><br>
> lctl set_param mdd.*-MDT0000.changelog_mask='MARK MTIME CTIME'<br>
><br>
><br>
> [root@mds ~]# lctl get_param mdd.scratch-MDT0000.changelog_mask<br>
> mdd.scratch-MDT0000.changelog_mask=MARK MTIME CTIME<br>
><br>
><br>
> everything started working again.<br>
><br>
> Is this a bug, or are we using it wrong?<br>
> _______________________________________________<br>
> lustre-discuss mailing list<br>
> lustre-discuss@lists.lustre.org<br>
> <a href="http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org">http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org</a><br>
<br>
--<br>
Weill Philippe - Systems and Network Administrator<br>
CNRS/UPMC/IPSL LATMOS (UMR 8190)<br>
Tour 45/46 3e Etage B302|4 Place Jussieu|75252 Paris Cedex 05 - FRANCE<br>
Email:philippe.weill@latmos.ipsl.fr | tel:+33 0144274759<br>
</div>
</span></font></div>
</body>
</html>