[Lustre-discuss] Help debugging a client

Michael Barnes Michael.Barnes at jlab.org
Fri Mar 11 11:17:44 PST 2011


David,

What kernel are you running on the file server? I've heard on the list
that the stock Red Hat kernels are built with too small a kernel stack,
and that running NFS and Lustre on the same node does not work well
with them. This configuration needs a stack size of at least 8k.
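One way to check this on a given server is to look at the kernel's build configuration. The sketch below assumes that the relevant switch is CONFIG_4KSTACKS (the 2.6-era i386 option for 4 KiB kernel stacks) and that the distribution installs the config as /boot/config-<release>, as Red Hat kernel packages do. Both are assumptions, and the function names are hypothetical.

```python
import os


def uses_4k_stacks(config_text: str) -> bool:
    """Return True if a kernel .config text enables CONFIG_4KSTACKS.

    Assumption: CONFIG_4KSTACKS=y selects 4 KiB kernel stacks; a line such as
    "# CONFIG_4KSTACKS is not set", or no entry at all, means the larger
    default stack is in use.
    """
    for line in config_text.splitlines():
        if line.strip() == "CONFIG_4KSTACKS=y":
            return True
    return False


def running_kernel_uses_4k_stacks() -> bool:
    # Assumption: the build config lives at /boot/config-<uname -r>.
    path = f"/boot/config-{os.uname().release}"
    with open(path) as f:
        return uses_4k_stacks(f.read())


if __name__ == "__main__":
    try:
        if running_kernel_uses_4k_stacks():
            print("kernel built with 4k stacks")
        else:
            print("kernel not built with 4k stacks")
    except OSError:
        # Missing or unreadable config file: nothing can be concluded.
        print("kernel config not found or not readable")
```

Reading the text and parsing it are kept separate so the parser can be checked against a sample config without access to /boot.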

-mb

On Mar 11, 2011, at 12:37 PM, David Noriega wrote:

> We've been running Lustre happily for a few months now, but we have
> one client that can be troublesome at times, and it happens to be the
> most important client. It's our "file server" client, as it runs NFS
> and Samba. I'm not sure where to start. I've seen this client
> disconnect from the Lustre nodes, then recover and reconnect. There are
> hundreds of messages in dmesg about a few inodes. The big problem
> happened a few weeks ago, when this client was booted and could never
> reconnect: the client and the Lustre nodes simply kept saying HELLO to
> each other.
> 
> Anyway, as of right now, this is what I see in dmesg:
> 
> nfsd: non-standard errno: -108
> LustreError: 30558:0:(mdc_locks.c:646:mdc_enqueue()) ldlm_cli_enqueue: -108
> LustreError: 30558:0:(mdc_locks.c:646:mdc_enqueue()) Skipped 2114 previous similar messages
> LustreError: 30558:0:(file.c:3280:ll_inode_revalidate_fini()) failure -108 inode 561619132
> LustreError: 30558:0:(file.c:3280:ll_inode_revalidate_fini()) Skipped 777 previous similar messages
> LustreError: 29282:0:(file.c:116:ll_close_inode_openhandle()) inode 18382976 mdc close failed: rc = -108
> nfsd: non-standard errno: -108
> LustreError: 29282:0:(file.c:116:ll_close_inode_openhandle()) Skipped 17238 previous similar messages
> nfsd: non-standard errno: -108
> nfsd: non-standard errno: -108
> nfsd: non-standard errno: -108
> nfsd: non-standard errno: -108
> nfsd: non-standard errno: -108
> LustreError: 29282:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff81032da81800 x1360479978792199/t0 o35->lustre-MDT0000_UUID@192.168.5.104@tcp:23/10 lens 408/1128 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
> LustreError: 29282:0:(client.c:858:ptlrpc_import_delay_req()) Skipped 19011 previous similar messages
> nfsd: non-standard errno: -108
> 
> LustreError: 11-0: an error occurred while communicating with 192.168.5.104@tcp. The mds_close operation failed with -116
> LustreError: 520:0:(file.c:116:ll_close_inode_openhandle()) inode 12094041 mdc close failed: rc = -116
> LustreError: 30271:0:(llite_nfs.c:96:search_inode_for_lustre()) failure -2 inode 560111661
> 
> 
> Any ideas?
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
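The negative numbers in the quoted log are Linux errno values passed back by the kernel. A minimal sketch for decoding them, assuming it runs on Linux (errno numbering differs between platforms); describe is a hypothetical helper name:

```python
import errno
import os


def describe(rc: int) -> str:
    """Turn a kernel return code such as -108 into 'rc = NAME: message'."""
    code = -rc if rc < 0 else rc
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc} = {name}: {os.strerror(code)}"


if __name__ == "__main__":
    # Return codes that appear in the dmesg output above.
    for rc in (-108, -116, -2):
        print(describe(rc))
```

On Linux this maps -108 to ESHUTDOWN, -116 to ESTALE and -2 to ENOENT. ESHUTDOWN is consistent with the IMP_INVALID lines, which indicate that the client's connection (import) to lustre-MDT0000 is no longer valid; nfsd reports it as "non-standard" because it has no NFS status code for that errno.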

--
+-----------------------------------------------
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| Scientific Computing Group
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+-----------------------------------------------

More information about the lustre-discuss mailing list