[lustre-discuss] [EXTERNAL] Re: What is the meaning of these messages?

Mohr, Rick mohrrf at ornl.gov
Fri Dec 8 08:31:12 PST 2023


It could mean that there are network issues with that one particular client. If the client loses connectivity to an OST for some reason (even if the problem is on the client side), requests would time out and the client would assume the target OST is unavailable. The client would then try to reconnect to the target on the failover node, but since the target is not available on the failover node (because no failover occurred), I believe that node would log a message like the one you have seen. The fact that you see errors on multiple servers from the same client makes me think the problem is on the client. Maybe the network connection is flapping up and down?
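
One way to check the "same client, multiple servers" pattern is to tally the 137-5 messages from all of the OSS logs by client NID. Below is a rough Python sketch, not a tested tool. It assumes the syslog layout matches the line quoted further down in this thread. The sample lines, the address 192.0.2.10, the host oss011 and the target fs-OST00b4 are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the "137-5 ... (no target)" message in the form quoted in this thread.
# Assumption: the hostname is the token immediately before "kernel:".
PATTERN = re.compile(
    r"(?P<server>\S+)\s+kernel:\s+LustreError:\s+137-5:\s+"
    r"(?P<target>\S+?)_UUID:\s+not available for connect from\s+"
    r"(?P<nid>.+?)\s+\(no target\)"
)


def tally_no_target(lines):
    """Return {client_nid: {server: {targets}}} for every matching line."""
    seen = defaultdict(lambda: defaultdict(set))
    for line in lines:
        m = PATTERN.search(line)
        if m:
            seen[m.group("nid")][m.group("server")].add(m.group("target"))
    return seen


def report(seen):
    """Print one block per client NID, listing the servers that logged it."""
    for nid, servers in sorted(seen.items()):
        print(f"{nid}: seen on {len(servers)} server(s)")
        for server, targets in sorted(servers.items()):
            print(f"  {server}: {', '.join(sorted(targets))}")


# Hypothetical sample lines (illustrative only, not real log data).
SAMPLE = [
    "Dec 4 18:05:27 oss010 kernel: LustreError: 137-5: fs-OST00b0_UUID: "
    "not available for connect from 192.0.2.10@tcp1 (no target). If you are "
    "running an HA pair check that the target is mounted on the other server.",
    "Dec 4 18:05:31 oss011 kernel: LustreError: 137-5: fs-OST00b4_UUID: "
    "not available for connect from 192.0.2.10@tcp1 (no target). If you are "
    "running an HA pair check that the target is mounted on the other server.",
    "Dec 4 18:06:00 oss010 kernel: Lustre: some unrelated message",
]

report(tally_no_target(SAMPLE))
```

To use it on real data, pass the concatenated lines of the OSS syslog files to tally_no_target() instead of SAMPLE. If one NID shows up against many servers while other clients never appear, that would be consistent with a client-side problem.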

In the example you gave, is oss010 the failover node for target fs-OST00b0?

--Rick


On 12/8/23, 9:39 AM, "lustre-discuss on behalf of Backer via lustre-discuss" <lustre-discuss-bounces at lists.lustre.org on behalf of lustre-discuss at lists.lustre.org> wrote:


Hi All,


Just sending this again. 




On Tue, 5 Dec 2023 at 15:03, Backer <backer.kolo at gmail.com> wrote:


Hi All,


From time to time, I see the following messages on multiple OSSes about a particular client IP. What does it mean? All the OSSes and OSTs are online and have been online in the past.




Dec 4 18:05:27 oss010 kernel: LustreError: 137-5: fs-OST00b0_UUID: not available for connect from <client ip>@tcp1 (no target). If you are running an HA pair check that the target is mounted on the other server.