[Lustre-discuss] lnet_try_match_md(), Matching packet from length too big

Heiko Schröter schroete at iup.physik.uni-bremen.de
Thu Nov 5 01:26:51 PST 2009


Hello,

For a few days now, the messages below have been popping up on a client, and the Lustre mount gets blocked. After a forced unmount and a remount of Lustre, everything seems to work fine for a few minutes, but then the error appears again. Our system has approx. 140 TB on 9 OSTs.
We use Lustre via automount, and everything was fine until these errors occurred.
On the MDS there are no errors in the logs.
I can spot no network errors using 'ping -s 20000 ...' et al., and 'lctl ping' shows up OK.
We had an LDAP crash a few days ago, and the Lustre system was down because of the UID requests of the MDS.
Besides this, the data on some OSTs has been moved, and those OSTs were taken out of the system, reformatted and put back (with a different RAID level, that is).
I rebooted the MDS, but that did not cure the problem.

What can cause such hangups?
Are fscks needed on the OSTs?
Should we upgrade to 1.6.7 or 1.8.1.1?

Thanks and regards,
Heiko


Lustre: 1.6.6
Kernel (vanilla): 2.6.22.19

On the CLIENT:
Nov  5 10:01:42 dras2 Lustre: scia-MDT0000-mdc-ffff8101ee1c7000: Connection restored to service scia-MDT0000 using nid 192.168.16.122 at tcp.
Nov  5 10:01:42 dras2 Lustre: Skipped 6 previous similar messages
Nov  5 10:05:02 dras2 Lustre: Request x3223875 sent from scia-MDT0000-mdc-ffff8101ee1c7000 to NID 192.168.16.122 at tcp 100s ago has timed out (limit 100s).
Nov  5 10:05:02 dras2 Lustre: Skipped 6 previous similar messages
Nov  5 10:05:02 dras2 Lustre: scia-MDT0000-mdc-ffff8101ee1c7000: Connection to service scia-MDT0000 via nid 192.168.16.122 at tcp was lost; in progress operations using this service will wait for recovery to complete.
Nov  5 10:05:02 dras2 Lustre: Skipped 6 previous similar messages
Nov  5 10:08:22 dras2 LustreError: 5027:0:(lib-move.c:111:lnet_try_match_md()) Matching packet from 12345-192.168.16.122 at tcp, match 3223875 length 1336 too big: 1272 left, 1272 allowed
Nov  5 10:08:22 dras2 LustreError: 5027:0:(lib-move.c:111:lnet_try_match_md()) Skipped 6 previous similar messages
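
To make the numbers in the log above easier to read, here is a minimal Python sketch that pulls the fields out of the "too big" message and checks whether the match value coincides with a request id that timed out earlier. The regular expressions are written against the two message formats shown above only; the function name summarize and the field names in the result are hypothetical and not part of Lustre. That the match value corresponds to the request id is an observation from this log (both are 3223875), not something the sketch proves.

```python
import re

# Format of the lnet_try_match_md() message as it appears in the log above.
TOO_BIG = re.compile(
    r"match (?P<match>\d+) length (?P<length>\d+) too big: "
    r"(?P<left>\d+) left, (?P<allowed>\d+) allowed"
)
# Format of the client-side timeout message ("Request x3223875 sent from ...").
TIMED_OUT = re.compile(r"Request x(?P<xid>\d+) sent from")


def summarize(lines):
    """Return one dict per 'too big' message (hypothetical helper)."""
    timed_out = set()
    results = []
    for line in lines:
        m = TIMED_OUT.search(line)
        if m:
            timed_out.add(int(m.group("xid")))
            continue
        m = TOO_BIG.search(line)
        if m:
            match = int(m.group("match"))
            length = int(m.group("length"))
            allowed = int(m.group("allowed"))
            results.append({
                "match": match,
                "length": length,
                "allowed": allowed,
                # Bytes by which the incoming packet exceeds the allowed size.
                "excess": length - allowed,
                # True if a request with the same number timed out earlier.
                "request_timed_out": match in timed_out,
            })
    return results


sample = [
    "Nov  5 10:05:02 dras2 Lustre: Request x3223875 sent from "
    "scia-MDT0000-mdc-ffff8101ee1c7000 to NID 192.168.16.122 at tcp "
    "100s ago has timed out (limit 100s).",
    "Nov  5 10:08:22 dras2 LustreError: 5027:0:(lib-move.c:111:"
    "lnet_try_match_md()) Matching packet from 12345-192.168.16.122 at tcp, "
    "match 3223875 length 1336 too big: 1272 left, 1272 allowed",
]
report = summarize(sample)
print(report)
```

For the lines above, the incoming packet is 64 bytes larger than the 1272 bytes allowed, and its match value equals the id of the request that timed out at 10:05:02.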



