[Lustre-discuss] lustre automount mounting problem with listing dir content deeper than mountpoint

Heiko Schröter schroete at iup.physik.uni-bremen.de
Tue Nov 10 01:42:25 PST 2009


Hello,
after fixing the broken hardware, one problem remains.
We have been using Lustre with automount for over a year without problems.
Since the hardware failure a few days ago (a Gigabit switch and the SATA backplane in one MDS), the following happens.

The client is quadcore1. The Lustre file system is mounted under '/misc/data' (the mountpoint) via automount.
Output of 'mount':
mds1@tcp0:mds2@tcp0:/scia on /misc/data type lustre (rw)

Now running:
'umount /misc/data'
Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(ldlm_request.c:986:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(ldlm_request.c:1575:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(connection.c:144:ptlrpc_put_connection()) NULL connection
Nov 10 10:14:24 quadcore1 LustreError: 8751:0:(connection.c:144:ptlrpc_put_connection()) Skipped 13 previous similar messages
Nov 10 10:14:24 quadcore1 Lustre: client ffff81001606dc00 umount complete

Triggering the automount by listing a directory one or more levels below the mountpoint (the client console hangs after this command):
'ls -la /misc/data/OneDirDeeper'
Nov 10 10:14:33 quadcore1 automount[2797]: attempting to mount entry /misc/data
Nov 10 10:14:34 quadcore1 Lustre: Client scia-client has started
Nov 10 10:14:34 quadcore1 automount[2797]: mount(generic): mounted mds1@tcp0:mds2@tcp0:/scia type lustre on /misc/data
Nov 10 10:14:34 quadcore1 automount[2797]: mounted /misc/data
Nov 10 10:14:34 quadcore1 LustreError: 3115:0:(lib-move.c:95:lnet_try_match_md()) Matching packet from 12345-192.168.16.122@tcp, match 86814 length 1336 too big: 1272 left, 1272 allowed
Nov 10 10:14:34 quadcore1 Lustre: 3115:0:(lib-move.c:1647:lnet_parse_put()) Dropping PUT from 12345-192.168.16.122@tcp portal 10 match 86814 offset 128 length 1336: 2
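
A note on the numbers in the lnet_try_match_md() line: the incoming message is 1336 bytes long, while only 1272 bytes are left in the buffer it was matched against, so it is 64 bytes too large and gets dropped. The following is a minimal Python sketch for extracting that difference from such a line. The regular expression and the function name md_overflow are illustrative assumptions, not part of Lustre, and the message wording may differ in other Lustre versions.

```python
import re

# Pattern for the "too big" message as it appears in the log above.
# Assumption: other Lustre versions may word this message differently.
TOO_BIG = re.compile(r"length (\d+) too big: (\d+) left, (\d+) allowed")


def md_overflow(line):
    """Return by how many bytes the message exceeds the space left,
    or None if the line does not contain the "too big" message."""
    m = TOO_BIG.search(line)
    if m is None:
        return None
    length, left, _allowed = (int(g) for g in m.groups())
    return length - left


# Tail of the lnet_try_match_md() line from the log above.
sample = "match 86814 length 1336 too big: 1272 left, 1272 allowed"
print(md_overflow(sample))  # 64
```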

In a new console (this releases the frozen console above):
'umount /misc/data':
Nov 10 10:15:38 quadcore1 Lustre: setting import scia-MDT0000_UUID INACTIVE by administrator request
Nov 10 10:15:38 quadcore1 Lustre: Skipped 13 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 3122:0:(mdc_locks.c:841:mdc_intent_getattr_async_interpret()) ldlm_cli_enqueue_fini: -4
Nov 10 10:15:38 quadcore1 LustreError: 3122:0:(mdc_locks.c:841:mdc_intent_getattr_async_interpret()) Skipped 2 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(client.c:716:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff8102244c6c00 x86895/t0 o101->scia-MDT0000_UUID@192.168.16.122@tcp:12/10 lens 440/1400 e 0 to 100 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(client.c:716:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(mdc_locks.c:586:mdc_enqueue()) ldlm_cli_enqueue: -108
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(mdc_locks.c:586:mdc_enqueue()) Skipped 76 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:261:ll_get_dir_page()) lock enqueue: rc: -108
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:261:ll_get_dir_page()) Skipped 2 previous similar messages
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:415:ll_readdir()) error reading dir 4167519/1275738219 page 6: rc -108
Nov 10 10:15:38 quadcore1 LustreError: 8754:0:(dir.c:415:ll_readdir()) Skipped 2 previous similar messages
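
For reading the logs: the rc values are negated Linux errno codes. On Linux, 108 is ESHUTDOWN and 4 is EINTR. The sketch below only decodes the two numbers seen here; the table and the function name decode_rc are illustrative, and it makes no claim about why Lustre returns these codes.

```python
# Linux errno values for the two codes seen in the logs above.
# Hard-coded so the result does not depend on the machine running the script.
LINUX_ERRNO = {
    4: ("EINTR", "Interrupted system call"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}


def decode_rc(rc):
    """Map a negative kernel return code such as -108 to (name, description)."""
    return LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this small table"))


for rc in (-108, -4):
    name, text = decode_rc(rc)
    print(f"rc {rc}: {name} ({text})")
```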

There are no related messages on the MDS or the OSTs.
Running 'ls -la /misc/data' works fine, and the Lustre file system gets mounted properly on /misc/data.
The above scenario is reproducible on all clients.
The system works fine when the Lustre file system is mounted statically, or once the automount has completed properly.
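
To check the behaviour on several clients without freezing a console each time, the listing can be run with a timeout. This is a rough Python sketch under a few assumptions: the function name listing_hangs and the 30 second timeout are arbitrary choices, 'OneDirDeeper' is the placeholder used above, and the script should be run after 'umount /misc/data' so that this access is the one that triggers the automount.

```python
import subprocess
import sys


def listing_hangs(path, timeout=30.0):
    """Return True if 'ls -la path' does not finish within timeout seconds."""
    proc = subprocess.Popen(
        ["ls", "-la", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout)
        return False
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait for the child: a process blocked
        # inside the file system may not exit until the mount is released.
        proc.kill()
        return True


if __name__ == "__main__":
    # Default path is the placeholder from the report above.
    path = sys.argv[1] if len(sys.argv) > 1 else "/misc/data/OneDirDeeper"
    print(path, "timed out" if listing_hangs(path) else "returned")
```

A result of "returned" only means that ls finished in time, not that it succeeded. A timed-out ls may stay around until 'umount /misc/data' is run from another console, as described above.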

lustre-1.6.6
vanilla-2.6.22.19

Thanks and Regards
Heiko


