<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt">Here the failover is designed in such a way that the IP address moves (fails over) with the OST and becomes active on the other server.<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
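<p class="MsoNormal"><span style="font-size:11.0pt">This is probably the source of your problem: the floating address is a local NID on whichever OSS currently hosts the OST, so a client mounted on that same node most likely resolves it to the loopback NID 0@lo. I would suggest assigning a unique IP address to each OSS.<o:p></o:p></span></p>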
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Chris Horn<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div id="mail-editor-reference-message-container">
<div>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="color:black">From:
</span></b><span style="color:black">lustre-discuss &lt;lustre-discuss-bounces@lists.lustre.org&gt; on behalf of Backer &lt;backer.kolo@gmail.com&gt;<br>
<b>Date: </b>Tuesday, November 5, 2024 at 10:19</span><span style="font-family:&quot;Arial&quot;,sans-serif;color:black"> </span><span style="color:black">PM<br>
<b>To: </b>Backer via lustre-discuss &lt;lustre-discuss@lists.lustre.org&gt;, lustre-devel@lists.lustre.org &lt;lustre-devel@lists.lustre.org&gt;<br>
<b>Subject: </b>Re: [lustre-discuss] Lustre switching to loop back lnet interface when it is not desired<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal">Any ideas on how to avoid using 0@lo as failover_nids? Please see below. <o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On Tue, 5 Nov 2024 at 12:34, Backer &lt;<a href="mailto:backer.kolo@gmail.com">backer.kolo@gmail.com</a>&gt; wrote:<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<div>
<p class="MsoNormal">Hi,<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal">I am mounting the Lustre file system as a client on the OSS. Some of the OSTs are locally attached to the OSS.
<br>
<br>
The failover IP on the OST is "10.99.100.152". It is a local LNet interface on the OSS. However, when the client mounts the file system, the import automatically changes to 0@lo. This is undesirable because when this OST fails over to another server, the client keeps trying to connect to 0@lo even though the OST is no longer on the same host. This makes the client mount hang forever. <br>
<br>
Here the failover is designed in such a way that the IP address moves (fails over) with the OST and becomes active on the other server.
<br>
<br>
How can I make the import point to the real IP address instead of the loopback, so that failover works?<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal"><span style="font-family:&quot;Courier New&quot;">[oss000 ~]$ lfs df<br>
UUID 1K-blocks Used Available Use% Mounted on<br>
fs-MDT0000_UUID 29068444 25692 26422344 1% /mnt/fs[MDT:0]<br>
fs-OST0000_UUID 50541812 30160292 17743696 63% /mnt/fs[OST:0]<br>
fs-OST0001_UUID 50541812 29301740 18602248 62% /mnt/fs[OST:1]<br>
fs-OST0002_UUID 50541812 29356508 18547480 62% /mnt/fs[OST:2]<br>
fs-OST0003_UUID 50541812 8822980 39081008 19% /mnt/fs[OST:3]<br>
<br>
filesystem_summary: 202167248 97641520 93974432 51% /mnt/fs<br>
<br>
[oss000 ~]$ df -h<br>
Filesystem Size Used Avail Use% Mounted on<br>
devtmpfs 30G 0 30G 0% /dev<br>
tmpfs 30G 8.1M 30G 1% /dev/shm<br>
tmpfs 30G 25M 30G 1% /run<br>
tmpfs 30G 0 30G 0% /sys/fs/cgroup<br>
/dev/mapper/ocivolume-root 36G 17G 19G 48% /<br>
/dev/sdc2 1014M 637M 378M 63% /boot<br>
/dev/mapper/ocivolume-oled 10G 2.5G 7.6G 25% /var/oled<br>
/dev/sdc1 100M 5.1M 95M 6% /boot/efi<br>
tmpfs 5.9G 0 5.9G 0% /run/user/987<br>
tmpfs 5.9G 0 5.9G 0% /run/user/0<br>
/dev/sdb 49G 28G 18G 62% /fs-OST0001<br>
/dev/sda 49G 29G 17G 63% /fs-OST0000<br>
tmpfs 5.9G 0 5.9G 0% /run/user/1000<br>
10.99.100.221@tcp1:/fs 193G 94G 90G 51% /mnt/fs<br>
<br>
[oss000 ~]$ sudo tunefs.lustre --dryrun /dev/sda<br>
checking for existing Lustre data: found<br>
<br>
Read previous values:<br>
Target: fs-OST0000<br>
Index: 0<br>
Lustre FS: fs<br>
Mount type: ldiskfs<br>
Flags: 0x1002<br>
(OST no_primnode )<br>
Persistent mount opts: ,errors=remount-ro<br>
Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1<br>
<br>
<br>
Permanent disk data:<br>
Target: fs-OST0000<br>
Index: 0<br>
Lustre FS: fs<br>
Mount type: ldiskfs<br>
Flags: 0x1002<br>
(OST no_primnode )<br>
Persistent mount opts: ,errors=remount-ro<br>
Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1<br>
<br>
exiting before disk write.<br>
<br>
<br>
[oss000 proc]# cat /proc/fs/lustre/osc/fs-OST0000-osc-ffff89c57672e000/import<br>
import:<br>
name: fs-OST0000-osc-ffff89c57672e000<br>
target: fs-OST0000_UUID<br>
state: IDLE<br>
connect_flags: [ write_grant, server_lock, version, request_portal, max_byte_per_rpc, early_lock_cancel, adaptive_timeouts, lru_resize, alt_checksum_algorithm, fid_is_enabled, version_recovery, grant_shrink, full20, layout_lock, 64bithash, object_max_bytes,
jobstats, einprogress, grant_param, lvb_type, short_io, lfsck, bulk_mbits, second_flags, lockaheadv2, increasing_xid, client_encryption, lseek, reply_mbits ]<br>
connect_data:<br>
flags: 0xa0425af2e3440078<br>
instance: 39<br>
target_version: 2.15.3.0<br>
initial_grant: 8437760<br>
max_brw_size: 4194304<br>
grant_block_size: 4096<br>
grant_inode_size: 32<br>
grant_max_extent_size: 67108864<br>
grant_extent_tax: 24576<br>
cksum_types: 0xf7<br>
max_object_bytes: 17592186040320<br>
import_flags: [ replayable, pingable, connect_tried ]<br>
connection:<br>
failover_nids: [ 0@lo, 0@lo ]<br>
current_connection: 0@lo<br>
connection_attempts: 1<br>
generation: 1<br>
in-progress_invalidations: 0<br>
idle: 36 sec<br>
rpcs:<br>
inflight: 0<br>
unregistering: 0<br>
timeouts: 0<br>
avg_waittime: 2627 usec<br>
service_estimates:<br>
services: 1 sec<br>
network: 1 sec<br>
transactions:<br>
last_replay: 0<br>
peer_committed: 0<br>
last_checked: 0</span><o:p></o:p></p>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</body>
</html>