[Lustre-discuss] unzip hangs

Brian Behlendorf behlendorf1 at llnl.gov
Wed Nov 14 08:58:06 PST 2007


Hans, on the surface this sounds a lot like the following bug, which we have
Sun looking into as well.  If you have a good 1.6.3 reproducer, could you
please attach it to the bug?  We've been chasing something like this for a
while and it has been tricky to reproduce.  I'll certainly give your test
case a spin and look into this.

https://bugzilla.lustre.org/show_bug.cgi?id=11332

Thanks,
Brian

> Hi,
>
> I have made some tests with Lustre 1.6.3 (Kernel
> 2.6.18-8.1.14.el5_lustre.1.6.3smp) and came across the
> following problem: an unzip of a large zip archive on a
> Lustre filesystem hangs (virtually forever) after about 30000 files
> have been extracted.
> strace shows that the chmod call on the client does not return.
> The problem is reproducible.
>
> The messages file on the client says (several times):
> Nov 14 16:54:19 linuxwcc07 kernel: LustreError:
> 11872:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1195055558, 100s ago)  req@ffff810201c61a00 x491921/t0
> o36->lustre-MDT0000_UUID@137.226.71.155@tcp:12 lens 5864/296 ref 1 fl
> Rpc:/0/0 rc 0/-22
> Nov 14 16:54:19 linuxwcc07 kernel: LustreError:
> 11872:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at
> 1195055558, 100s ago)  req@ffff810201c61a00 x491921/t0
> o36->lustre-MDT0000_UUID@137.226.71.155@tcp:12 lens 5864/296 ref 1 fl
> Rpc:/0/0 rc 0/-22
> Nov 14 16:54:19 linuxwcc07 kernel: Lustre:
> lustre-MDT0000-mdc-ffff81021adedc00: Connection to service
> lustre-MDT0000 via nid 137.226.71.155@tcp was lost; in progress
> operations using this service will wait for recovery to complete.
> Nov 14 16:54:19 linuxwcc07 kernel: Lustre:
> lustre-MDT0000-mdc-ffff81021adedc00: Connection to service
> lustre-MDT0000 via nid 137.226.71.155@tcp was lost; in progress
> operations using this service will wait for recovery to complete.
> Nov 14 16:54:19 linuxwcc07 kernel: Lustre:
> lustre-MDT0000-mdc-ffff81021adedc00: Connection restored to service
> lustre-MDT0000 using nid 137.226.71.155@tcp.
>
> The corresponding messages on the MDS:
> Nov 14 16:52:38 linuxwcc05 kernel: LustreError:
> 7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
> 12345-137.226.71.157@tcp, match 491921 length 5864 too big: 7416 left,
> 5120 allowed
> Nov 14 16:52:38 linuxwcc05 kernel: LustreError:
> 7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from
> 12345-137.226.71.157@tcp, match 491921 length 5864 too big: 7416 left,
> 5120 allowed
> Nov 14 16:54:19 linuxwcc05 kernel: Lustre:
> 7606:0:(ldlm_lib.c:514:target_handle_reconnect()) lustre-MDT0000:
> ec82c01d-f203-81b7-ed36-e0f0cf3b3f32 reconnecting
> Nov 14 16:54:19 linuxwcc05 kernel: Lustre:
> 7606:0:(ldlm_lib.c:514:target_handle_reconnect()) lustre-MDT0000:
> ec82c01d-f203-81b7-ed36-e0f0cf3b3f32 reconnecting
>
> Is this a known issue?
>
> Regards,
> Hans Schnitzer
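
For anyone trying to confirm the same pattern in their own logs: in the
messages quoted above, the request that times out on the client (x491921,
lens 5864/296) carries the same number and the same length as the packet the
MDS rejects (match 491921, length 5864, 5120 allowed). The following is a
minimal Python sketch, not an official Lustre tool, that pairs up such
entries. The function name correlate() and both regular expressions are
illustrative assumptions derived only from the message text shown in this
thread; other Lustre versions may format these messages differently.

```python
import re

# Client side: the timed-out request shows its xid as "x<number>/t<number>"
# and its request/reply buffer sizes as "lens <req>/<rep>".
CLIENT_RE = re.compile(
    r"x(?P<xid>\d+)/t\d+.*?lens\s+(?P<reqlen>\d+)/(?P<replen>\d+)", re.S
)

# MDS side: LNet reports "match <bits> length <n> too big: <left> left, <allowed> allowed".
SERVER_RE = re.compile(
    r"match (?P<match>\d+) length (?P<length>\d+) too big:\s+"
    r"(?P<left>\d+) left,\s+(?P<allowed>\d+) allowed"
)


def correlate(client_log, server_log):
    """Return (xid, length, allowed) for each request that timed out on the
    client and was also rejected as too big on the server (hypothetical helper)."""
    rejected = {}
    for m in SERVER_RE.finditer(server_log):
        rejected[int(m["match"])] = (int(m["length"]), int(m["allowed"]))

    hits = []
    seen = set()
    for m in CLIENT_RE.finditer(client_log):
        xid = int(m["xid"])
        # Report each xid once, even if the log line is repeated.
        if xid in rejected and xid not in seen:
            seen.add(xid)
            length, allowed = rejected[xid]
            hits.append((xid, length, allowed))
    return hits


# Abbreviated fragments of the messages quoted in this thread.
client = "timeout (sent at 1195055558, 100s ago) x491921/t0 lens 5864/296 ref 1"
server = "match 491921 length 5864 too big: 7416 left, 5120 allowed"
print(correlate(client, server))  # [(491921, 5864, 5120)]
```

A non-empty result only says that the same request appears on both sides; it
does not by itself establish that this is the bug referenced above.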