[lustre-discuss] Problem with some clients that won't remount the filesystem (server on 2.5.3)

Philippe Weill Philippe.Weill at latmos.ipsl.fr
Fri Jun 12 03:15:20 PDT 2015


Hello,

We have a problem on our Lustre 2.5.3 infrastructure, a small cluster of 24 nodes.

The MDS rebooted on its own, and now some clients refuse to remount the filesystem.

The clients are running 1.8.9wc1 because we are in the middle of a migration.

All clients could mount the 1.8 version, but 6 clients can no longer mount the 2.5.3 filesystem.
Log from one of the failing clients:

Jun 12 09:52:52 ciclad11 kernel: Lustre: Server MGS version (2.5.3.0) is much newer than client version (1.8.9)
Jun 12 09:52:52 ciclad11 kernel: Lustre: MGC172.20.3.74@o2ib: Reactivating import
Jun 12 09:52:52 ciclad11 kernel: Lustre: MGC172.20.3.74@o2ib: Connection restored to service MGS using nid 172.20.3.74@o2ib.
Jun 12 09:52:52 ciclad11 kernel: Lustre: client etherfs-client(ffff88040ef2f800) umount complete
Jun 12 09:52:52 ciclad11 kernel: LustreError: 4754:0:(obd_mount.c:2067:lustre_fill_super()) Unable to mount  (-4)

Log from the MDS:

Jun 12 09:52:52 mds2-ipsl kernel: Lustre: MGS: Client e26b1313-d901-a410-7c8b-6c6148b6bd92 (at 172.20.3.243@o2ib) reconnecting
Jun 12 09:53:45 mds2-ipsl kernel: Lustre: MGS: Client a3cd5035-35d2-4f23-e337-73d0e7192047 (at 172.20.3.243@o2ib) reconnecting
Jun 12 09:54:38 mds2-ipsl kernel: Lustre: MGS: Client b7896a48-b23d-1651-4b3f-fa5c90cceab7 (at 172.20.3.243@o2ib) reconnecting
Jun 12 09:55:41 mds2-ipsl kernel: Lustre: MGS: haven't heard from client b25d7b77-22b5-c391-b883-7ae8f2044d09 (at 172.20.3.243@o2ib) in 228 seconds. I think it's dead, and I am evicting it. exp ffff8811c86af400, cur 1434095741 expire 1434095591 last 1434095513
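For what it's worth, the timestamps in the MDS eviction message are self-consistent. Below is a minimal Python sketch that checks them, assuming `cur` is the current time, `expire` the eviction cutoff and `last` the time of the client's last request, all in epoch seconds (my reading of the message, not checked against the Lustre source). The regex and the `eviction_gaps` helper are hypothetical, written only for this check:

```python
import re

# Matches the MGS ping-evictor message quoted above. Reading cur/expire/last
# as "now", "eviction cutoff" and "last request from the client" (all in
# seconds since the epoch) is an assumption based on the log text.
EVICT_RE = re.compile(
    r"haven't heard from client (?P<uuid>\S+) \(at (?P<nid>[^)]+)\)\s+"
    r"in (?P<secs>\d+) seconds\..*?"
    r"cur (?P<cur>\d+) expire (?P<expire>\d+) last (?P<last>\d+)"
)


def eviction_gaps(line):
    """Hypothetical helper: derive the time gaps from one eviction line."""
    m = EVICT_RE.search(line)
    if m is None:
        raise ValueError("not a ping-evictor line")
    cur, expire, last = (int(m[k]) for k in ("cur", "expire", "last"))
    return {
        "nid": m["nid"],
        "reported": int(m["secs"]),        # seconds quoted in the message
        "cur_minus_last": cur - last,      # silence since the last request
        "cur_minus_expire": cur - expire,  # width of the eviction window
        "evicted": last < expire,          # last request older than the cutoff
    }


line = (
    "Lustre: MGS: haven't heard from client "
    "b25d7b77-22b5-c391-b883-7ae8f2044d09 (at 172.20.3.243@o2ib) in 228 seconds. "
    "I think it's dead, and I am evicting it. "
    "exp ffff8811c86af400, cur 1434095741 expire 1434095591 last 1434095513"
)
print(eviction_gaps(line))
```

For this line, `cur - last` is 228 seconds, matching the message, and `cur - expire` is 150 seconds. The second eviction further down (client 172.20.3.235) gives 230 and 150, so both clients were silent well past the same cutoff.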


I tried changing the client version on one of the non-working clients:

Jun 12 12:01:35 ciclad19 kernel: Lustre: 13789:0:(client.c:1918:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434103288/real 1434103288]  req@ffff88301856b800 x1503749299241316/t0(0) o503->MGC172.20.3.74@o2ib@172.20.3.74@o2ib:26/25 lens 272/8416 e 0 to 1 dl 1434103295 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Jun 12 12:01:35 ciclad19 kernel: LustreError: 166-1: MGC172.20.3.74@o2ib: Connection to MGS (at 172.20.3.74@o2ib) was lost; in progress operations using this service will fail
Jun 12 12:01:41 ciclad19 kernel: Lustre: 3851:0:(client.c:1918:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1434103295/real 1434103295]  req@ffff88381ba7fc00 x1503749299241320/t0(0) o250->MGC172.20.3.74@o2ib@172.20.3.74@o2ib:26/25 lens 400/544 e 0 to 1 dl 1434103301 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Jun 12 12:01:48 ciclad19 kernel: LustreError: 15c-8: MGC172.20.3.74@o2ib: The configuration from log 'etherfs-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Jun 12 12:01:48 ciclad19 kernel: LustreError: 13789:0:(llite_lib.c:1046:ll_fill_super()) Unable to process log: -5
Jun 12 12:01:48 ciclad19 kernel: Lustre: Unmounted etherfs-client
Jun 12 12:01:48 ciclad19 kernel: LustreError: 13789:0:(obd_mount.c:1325:lustre_fill_super()) Unable to mount  (-5)
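The two client-side timeouts on ciclad19 can also be checked numerically: the gap between `sent` and `dl` in each `ptlrpc_expire_one_request()` line is how long the client waited for the MGS to answer. A minimal Python sketch, assuming `dl` is an absolute deadline in epoch seconds (an interpretation of the log format, not verified against the ptlrpc source); `TIMEOUT_RE` and `rpc_deadline` are hypothetical helpers:

```python
import re
from typing import Tuple

# Matches the "@@@ Request sent has timed out" lines quoted above.
# Treating "sent" as the send time and "dl" as the reply deadline
# (both in seconds since the epoch) is an assumption based on the log text.
TIMEOUT_RE = re.compile(
    r"\[sent (?P<sent>\d+)/real \d+\].*?"
    r"\bo(?P<opc>\d+)->.*?\bdl (?P<dl>\d+)\b"
)


def rpc_deadline(line: str) -> Tuple[int, int]:
    """Return (opcode, seconds between send and deadline) for one timeout line."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        raise ValueError("not a ptlrpc timeout line")
    return int(m["opc"]), int(m["dl"]) - int(m["sent"])


LINES = [
    "[sent 1434103288/real 1434103288]  req@ffff88301856b800 x1503749299241316/t0(0) "
    "o503->MGC172.20.3.74@o2ib@172.20.3.74@o2ib:26/25 lens 272/8416 e 0 to 1 "
    "dl 1434103295 ref 2 fl Rpc:X/0/ffffffff rc 0/-1",
    "[sent 1434103295/real 1434103295]  req@ffff88381ba7fc00 x1503749299241320/t0(0) "
    "o250->MGC172.20.3.74@o2ib@172.20.3.74@o2ib:26/25 lens 400/544 e 0 to 1 "
    "dl 1434103301 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1",
]

for entry in LINES:
    print(rpc_deadline(entry))
```

On these two lines it returns (503, 7) and (250, 6): both requests to the MGS were declared timed out 6 to 7 seconds after being sent.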

Log from the MDS:
Jun 12 12:01:35 mds2-ipsl kernel: Lustre: MGS: Client 5c623fa9-1cae-6b75-5e15-acb8add53042 (at 172.20.3.235@o2ib) reconnecting
Jun 12 12:05:25 mds2-ipsl kernel: Lustre: MGS: haven't heard from client 5c623fa9-1cae-6b75-5e15-acb8add53042 (at 172.20.3.235@o2ib) in 230 seconds. I think it's dead, and I am evicting it. exp ffff88203ddda000, cur 1434103525 expire 1434103375 last 1434103295

Any ideas?


-- 
Weill Philippe -  System and Network Administrator
CNRS/UPMC/IPSL   LATMOS (UMR 8190)

