[Lustre-discuss] Lustre client lockups
Andreas Dilger
adilger at sun.com
Thu Nov 6 09:23:20 PST 2008
On Nov 04, 2008 09:06 -0800, Kurt Dillen wrote:
> We have a serious problem with Lustre. For the past few days we have
> had lockups on the client side. Not all clients are affected.
>
> We are running kernel 2.6.16-54-0.2.5_lustre.1.6.4.3smp.
>
> Statahead has been disabled on the systems.
>
> Some more information about the environment:
>
> - The Lustre clients are all VMware virtual systems
> - The Lustre farm is all VMware virtual systems
>
> The errors I see are the following:
>
> LustreError: 3420:0:(events.c:134:client_bulk_callback()) event type
> 0, status -5, desc ffff8100e5dca000
> LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) @@@
> timeout (sent at 1225816920, 100s ago) req@ffff8100e7e2ba00 x17940/t0
> o4->lustre-OST0005_UUID@172.16.0.29@tcp:28 lens 384/352 ref 2 fl Rpc:/
> 0/0 rc 0/-22
> Lustre: lustre-OST0005-osc-ffff8100e8551800: Connection to service
> lustre-OST0005 via nid 172.16.0.29@tcp was lost; in progress
> operations using this service will wait for recovery to complete.
These all look like network problems. Running production Lustre servers
inside VMware doesn't make much sense. We don't test clients inside
VMware, but I don't think that is nearly as bad as running the servers
in a virtual environment.
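
For anyone reading the quoted errors, here is a minimal sketch of how the
numeric fields can be decoded. It assumes the "status" value is a negated
Linux errno (the usual kernel convention) and that "sent at" is a Unix
timestamp in seconds. The helper names decode_status and decode_sent_time
are hypothetical, made up for this illustration; they are not part of
Lustre or any Lustre tool.

```python
import errno
import re
from datetime import datetime, timezone

# Excerpts from the quoted report, joined onto single lines.
# The second line is trimmed after "ago)" since only that part is parsed.
LOG_LINES = [
    "LustreError: 3420:0:(events.c:134:client_bulk_callback()) "
    "event type 0, status -5, desc ffff8100e5dca000",
    "LustreError: 3428:0:(client.c:975:ptlrpc_expire_one_request()) "
    "@@@ timeout (sent at 1225816920, 100s ago)",
]


def decode_status(line):
    """Return the errno name for a 'status -N' field, or None if absent.

    Assumption: the value is a negated errno, as is conventional in
    Linux kernel code.
    """
    match = re.search(r"status (-?\d+)", line)
    if match is None:
        return None
    code = abs(int(match.group(1)))
    return errno.errorcode.get(code, "UNKNOWN")


def decode_sent_time(line):
    """Return (UTC send time, seconds elapsed) from a timeout message."""
    match = re.search(r"sent at (\d+), (\d+)s ago", line)
    if match is None:
        return None
    sent = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return sent, int(match.group(2))


if __name__ == "__main__":
    print(decode_status(LOG_LINES[0]))
    sent, elapsed = decode_sent_time(LOG_LINES[1])
    print(sent.isoformat(), elapsed)
```

Under those assumptions, the bulk callback status decodes to an I/O error
and the request to OST0005 went 100 seconds without a reply, which fits a
network problem between the client and the OSS rather than a fault on the
client itself.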
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.