[Lustre-discuss] NFS Stale Handling with Lustre on RHEL 4 U7 x86_64

Alex Lyashkov Alexey.Lyashkov at Sun.COM
Thu Dec 4 05:53:09 PST 2008


On Thu, 2008-12-04 at 13:18 +0530, anil kumar wrote:
> Alex,
>  
> We are evaluating Lustre scalability so that we can adopt it in our
> production infrastructure. Below are the details of our setup, the
> tests conducted, and the issues faced so far.
> Setup details :
> --------------------
> 
> Hardware Used - HP DL360 
> MDT/MGS - 1 
> OST - 13 (13 HP DL360 servers used, 1 OSS = 1 OST, 700 GB x 13)
> 
> Issue1
> ---------
> Test Environment: 
> 
> Operating System - Red Hat EL4 Update 7, x86_64
> Lustre Version - 1.6.5.1 
> Lustre Kernel -
> kernel-lustre-smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64
I assume this is the server kernel?

> Lustre Client - Xen Virtual Machines with 2.6.9-78.0.0.0.1.ELxenU
> kernel( patchless ) 
Running a patchless client on a 2.6.9 kernel is dangerous: some problems
cannot be fixed because of internal kernel limitations. I suggest applying
the vfs_intent and dcache patches.



> 
> Test Conducted: Performed heavy read/write ops from 190 Lustre
> clients. Each client tries to read and write 14000 files in parallel.
> 
> Errors noticed: Multiple clients were evicted while writing a huge
> number of files. The Lustre mount is not accessible on the evicted
> clients. We need to umount and mount again to make Lustre accessible
> on the affected clients.
> 
> server side errors noticed 
> -----------------------------------------
> Nov 26 01:03:48 kernel: LustreError:
> 29774:0:(handler.c:1515:mds_handle()) operation 41 on unconnected MDS
> from 12345-[CLIENT IP HERE]@tcp

> Nov 26 01:07:46 kernel: Lustre: farmres-MDT0000: haven't heard from
> client 2379a0f4-f298-9c78-fce6-3d8db74f912b (at [CLIENT IP HERE]@tcp)
> in 227 seconds. I think it's dead, and I am evicting it.
> Nov 26 01:43:58 kernel: Lustre: MGS: haven't heard from client
> 0c239c47-e1f7-47de-0b43-19d5819081e1 (at [CLIENT IP HERE]@tcp) in 227
> seconds. I think it's dead, and I am evicting it.
Both the MDS and the MGS are evicting the client. Is the network link OK?


> Nov 26 01:54:37 kernel: LustreError:
> 29766:0:(handler.c:1515:mds_handle()) operation 101 on unconnected MDS
> from 12345-[CLIENT IP HERE]@tcp
> Nov 26 02:09:49 kernel: LustreError:
> 29760:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error
> (-107) req at 000001080ba29400 x260230/t0 o101-><?>@<?>:0/0 lens 440/0 e
> 0 to 0 dl 1227665489 ref 1 fl Interpret:/0/0 rc -107/0
> Nov 27 01:06:07 kernel: LustreError:
> 30478:0:(mgs_handler.c:538:mgs_handle()) lustre_mgs: operation 101 on
> unconnected MGS
> Nov 27 02:21:39 kernel: Lustre:
> 18420:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000:
> 180cf598-1e43-3ea4-6cf6-0ee40e5a2d5e reconnecting
> Nov 27 02:22:16 kernel: Lustre: Request x2282604 sent from
> farmres-MDT0000 to NID [CLIENT IP HERE]@tcp 6s ago has timed out
> (limit 6s).

> Nov 27 02:22:16 kernel: LustreError: 138-a: farmres-MDT0000: A client
> on nid [CLIENT IP HERE]@tcp was evicted due to a lock blocking
> callback to [CLIENT IP HERE]@tcp timed out: rc -107


> Nov 27 08:58:46 kernel: LustreError:
> 29755:0:(upcall_cache.c:325:upcall_cache_get_entry()) acquire timeout
> exceeded for key 0
> Nov 27 08:59:11 kernel: LustreError:
> 18473:0:(upcall_cache.c:325:upcall_cache_get_entry()) acquire timeout
> exceeded for key 0
Hm... as far as I know, this comes from a problem in the FS configuration.
Can you reset mdt.group_upcall to 'NONE'?


> Nov 27 13:23:25 kernel: Lustre:
> 29752:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000:
> 3d5efff1-1652-6669-94de-c93ee73a4bc7 reconnecting
> Nov 27 02:17:16 kernel: nfs_statfs: statfs error = 116
> ------------------------
> 
> client errors 
> ------------------------
> 
> cp: cannot stat
> `/master/jdk16/sample/jnlp/webpad/src/version1/JLFAbstractAction.java': Cannot send after transport endpoint shutdown
> -------------------------
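For reference, the numeric codes in the logs above map to standard Linux errno values. The small lookup below is only a sketch: the table is hand-written for the three codes seen in this thread (Linux numbering), the function name `describe_rc` is made up for illustration, and the message strings are the classic glibc ones (newer glibc prints "Stale file handle" for ESTALE).

```python
# Linux errno values that show up in the logs quoted in this thread.
# The table is deliberately hard-coded so it does not depend on the
# platform the script runs on.
LINUX_ERRNO = {
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
    116: ("ESTALE", "Stale NFS file handle"),
}


def describe_rc(rc):
    """Return 'NAME: message' for a return code.

    Kernel logs print errors as negative numbers (rc -107), so the
    sign is ignored before the lookup.
    """
    name, msg = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return "%s: %s" % (name, msg)


if __name__ == "__main__":
    # rc -107 from the MDS, statfs error 116, and the client-side cp error.
    for rc in (-107, 116, 108):
        print(rc, describe_rc(rc))
```

So "rc -107" on the server is ENOTCONN, "statfs error = 116" is ESTALE, and the cp message on the client is the text for ESHUTDOWN.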
> 
> Does Lustre support Xen kernel 2.6.9-78.0.0.0.1.ELxenU as patchless?
With some limitations. I suggest using 2.6.15 or later for a patchless client.
For 2.6.16 I know of one limitation: the FMODE_EXEC patch is absent.

What is in the clients' /var/log/messages at the same time?
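
To line up the client logs with the server side, it may help to pull the eviction timestamps out of the server syslog first. The following is a rough sketch, not a Lustre tool: the regex and the name `find_evictions` are my own, and it assumes each syslog record sits on a single line (the mail quoting above wrapped them) with an optional hostname before "kernel:".

```python
import re

# Matches the "haven't heard from client" eviction messages quoted above.
EVICT_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+[\d:]{8})\s+(?:\S+\s+)?kernel: Lustre: "
    r"(?P<target>\S+): haven't heard from client (?P<uuid>\S+) "
    r"\(at (?P<nid>.+?)\) in (?P<secs>\d+) seconds"
)


def find_evictions(lines):
    """Return (timestamp, target, client uuid, nid, seconds) per eviction."""
    events = []
    for line in lines:
        m = EVICT_RE.match(line)
        if m:
            events.append((m.group("when"), m.group("target"),
                           m.group("uuid"), m.group("nid"),
                           int(m.group("secs"))))
    return events


# Sample records taken from the log excerpt in this thread, unwrapped.
SAMPLE = [
    "Nov 26 01:07:46 kernel: Lustre: farmres-MDT0000: haven't heard from "
    "client 2379a0f4-f298-9c78-fce6-3d8db74f912b (at [CLIENT IP HERE]@tcp) "
    "in 227 seconds. I think it's dead, and I am evicting it.",
    "Nov 26 01:43:58 kernel: Lustre: MGS: haven't heard from client "
    "0c239c47-e1f7-47de-0b43-19d5819081e1 (at [CLIENT IP HERE]@tcp) in 227 "
    "seconds. I think it's dead, and I am evicting it.",
    "Nov 27 02:17:16 kernel: nfs_statfs: statfs error = 116",
]

if __name__ == "__main__":
    for when, target, uuid, nid, secs in find_evictions(SAMPLE):
        print(when, target, uuid, nid, secs)
```

With the timestamps and NIDs in hand, the matching window in each client's /var/log/messages can be checked for network or reconnect errors.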
> 



