[Lustre-discuss] unzip hangs

Hans-Juergen Schnitzer schnitzer at rz.RWTH-Aachen.DE
Wed Nov 14 08:37:06 PST 2007


Hi,

I ran some tests with Lustre 1.6.3 (kernel
2.6.18-8.1.14.el5_lustre.1.6.3smp) and came across the
following problem: unzipping a large zip archive on a
Lustre filesystem hangs (virtually forever) after about 30000 files
have been extracted.
strace shows that the chmod call on the client does not return.
The problem is reproducible.

The messages file on the client says (several times):
Nov 14 16:54:19 linuxwcc07 kernel: LustreError: 
11872:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at 
1195055558, 100s ago)  req@ffff810201c61a00 x491921/t0 
o36->lustre-MDT0000_UUID@137.226.71.155@tcp:12 lens 5864/296 ref 1 fl 
Rpc:/0/0 rc 0/-22
Nov 14 16:54:19 linuxwcc07 kernel: LustreError: 
11872:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at 
1195055558, 100s ago)  req@ffff810201c61a00 x491921/t0 
o36->lustre-MDT0000_UUID@137.226.71.155@tcp:12 lens 5864/296 ref 1 fl 
Rpc:/0/0 rc 0/-22
Nov 14 16:54:19 linuxwcc07 kernel: Lustre: 
lustre-MDT0000-mdc-ffff81021adedc00: Connection to service 
lustre-MDT0000 via nid 137.226.71.155@tcp was lost; in progress 
operations using this service will wait for recovery to complete.
Nov 14 16:54:19 linuxwcc07 kernel: Lustre: 
lustre-MDT0000-mdc-ffff81021adedc00: Connection to service 
lustre-MDT0000 via nid 137.226.71.155@tcp was lost; in progress 
operations using this service will wait for recovery to complete.
Nov 14 16:54:19 linuxwcc07 kernel: Lustre: 
lustre-MDT0000-mdc-ffff81021adedc00: Connection restored to service 
lustre-MDT0000 using nid 137.226.71.155@tcp.

The corresponding messages on the MDS:
Nov 14 16:52:38 linuxwcc05 kernel: LustreError: 
7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from 
12345-137.226.71.157@tcp, match 491921 length 5864 too big: 7416 left, 
5120 allowed
Nov 14 16:52:38 linuxwcc05 kernel: LustreError: 
7483:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from 
12345-137.226.71.157@tcp, match 491921 length 5864 too big: 7416 left, 
5120 allowed
Nov 14 16:54:19 linuxwcc05 kernel: Lustre: 
7606:0:(ldlm_lib.c:514:target_handle_reconnect()) lustre-MDT0000: 
ec82c01d-f203-81b7-ed36-e0f0cf3b3f32 reconnecting
Nov 14 16:54:19 linuxwcc05 kernel: Lustre: 
7606:0:(ldlm_lib.c:514:target_handle_reconnect()) lustre-MDT0000: 
ec82c01d-f203-81b7-ed36-e0f0cf3b3f32 reconnecting
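
To cross-check the two logs, here is a minimal Python sketch. The helper
names are made up for illustration and are not part of any Lustre tool.
It pulls the numbers out of one client entry and one MDS entry and
compares them. Treating the first "lens" value as the request size is an
assumption based only on the numbers matching.

```python
import re

# Entries copied from the logs above, with mail line wrapping removed and
# "@" written literally in the NIDs.
CLIENT_LINE = (
    "LustreError: 11872:0:(client.c:969:ptlrpc_expire_one_request()) "
    "@@@ timeout (sent at 1195055558, 100s ago)  req@ffff810201c61a00 "
    "x491921/t0 o36->lustre-MDT0000_UUID@137.226.71.155@tcp:12 "
    "lens 5864/296 ref 1 fl Rpc:/0/0 rc 0/-22"
)
MDS_LINE = (
    "LustreError: 7483:0:(lib-move.c:95:lnet_try_match_md()) Matching "
    "packet from 12345-137.226.71.157@tcp, match 491921 length 5864 "
    "too big: 7416 left, 5120 allowed"
)


def parse_client_timeout(line):
    """Return the xid and the first 'lens' value of a client timeout entry."""
    m = re.search(r"\bx(\d+)/t\d+ .*\blens (\d+)/\d+", line)
    if m is None:
        raise ValueError("not a ptlrpc timeout entry")
    return {"xid": int(m.group(1)), "lens": int(m.group(2))}


def parse_mds_too_big(line):
    """Return the match bits, packet length and limits of an MDS entry."""
    m = re.search(
        r"match (\d+) length (\d+) too big: (\d+) left, (\d+) allowed", line
    )
    if m is None:
        raise ValueError("not a 'too big' entry")
    keys = ("match", "length", "left", "allowed")
    return dict(zip(keys, (int(g) for g in m.groups())))


client = parse_client_timeout(CLIENT_LINE)
mds = parse_mds_too_big(MDS_LINE)

# Do both entries describe the same request, and by how much is it too big?
same_request = client["xid"] == mds["match"] and client["lens"] == mds["length"]
excess = mds["length"] - mds["allowed"]

print("same request:", same_request)
print("bytes over the allowed size:", excess)
```

With the entries above, the request that times out on the client
(x491921, lens 5864) appears to be the same packet the MDS rejects
(match 491921, length 5864), which is 744 bytes over the 5120 allowed.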

Is this a known issue?

Regards,
Hans Schnitzer

-- 
Hans-Juergen Schnitzer
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum
Seffenter Weg 23, 52074 Aachen (Germany)
Tel.: + 49(0)241/80-28719 - Fax: + 49(0)241/80-628719
schnitzer at rz.rwth-aachen.de
http://www.rz.rwth-aachen.de