[Lustre-discuss] migrate lustre filesystem from 1.6.5.1 to 1.8.3

Philippe Weill philippe.Weill at latmos.ipsl.fr
Mon May 10 00:39:21 PDT 2010


Hi,

Are there any special considerations before migrating from 1.6.5.1 to 1.8.3?
Our setup: 1 MGS, 2 filesystems, 3 OSSs, 12 OSTs, 80T in total (we now need 16T OSTs).

We migrated just one client as a test to see how it behaves,
and I am seeing two strange issues.

1. On this client, users can no longer access their quota
---------------------------------------------------------

lfs quota -v -u weill /home
Disk quotas for user weill (uid 1001):
      Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
           /home     [0]     [0]     [0]             [0]     [0]     [0]
quotactl failed: Operation not permitted
homefs-OST0000_UUID quotactl failed: Operation not permitted
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data in "[]" is inaccurate.

As root, the same query works:

lfs quota -u weill /home
Disk quotas for user weill (uid 1001):
      Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
           /home 1646536  5000000 5100000           16628       0       0
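
To compare the two outputs above field by field, the data row can be split into named values. The following is only a rough sketch: the function and field names are hypothetical, and it assumes the grace columns are empty, as they are in both outputs shown here.

```python
def parse_quota_row(row):
    """Split one data row of `lfs quota` output into named integer fields.

    Assumes the two grace columns are empty, as in the outputs above.
    """
    fields = row.split()
    filesystem, values = fields[0], fields[1:]
    # lfs prints values in "[]" when it could not get reliable data.
    unreliable = any(v.startswith("[") for v in values)
    numbers = [int(v.strip("[]")) for v in values]
    names = ["kbytes", "block_quota", "block_limit",
             "files", "inode_quota", "inode_limit"]
    return filesystem, dict(zip(names, numbers)), unreliable


def percent_of_soft_quota(used, quota):
    """Return usage as a percentage of the soft quota, or None if no quota is set."""
    return None if quota == 0 else round(100.0 * used / quota, 1)


if __name__ == "__main__":
    as_root = "/home 1646536  5000000 5100000           16628       0       0"
    as_user = "/home     [0]     [0]     [0]             [0]     [0]     [0]"
    for label, row in (("root", as_root), ("user", as_user)):
        fs, quota, unreliable = parse_quota_row(row)
        pct = percent_of_soft_quota(quota["kbytes"], quota["block_quota"])
        print(label, fs, quota, "unreliable" if unreliable else "ok", pct)
```

With the row returned for root, block usage comes out at about a third of the soft quota, while the row returned for the user is flagged as unreliable.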

2. Recurring -16 errors, only on the migrated node
--------------------------------------------------

May 10 07:15:34 ciclad12 kernel: Lustre: 4202:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1333645653270608 sent from datafs-OST000a-osc-ffff810c1b6a0c00 to NID 172.20.176.131@tcp 7s ago has timed out (7s prior to deadline).
May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:525:target_handle_reconnect()) datafs-OST000a: 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360 reconnecting
May 10 07:15:34 ciclad12 kernel: Lustre: datafs-OST000a-osc-ffff810c1b6a0c00: Connection to service datafs-OST000a via nid 172.20.176.131@tcp was lost; in progress operations using this service will wait for recovery to complete.
May 10 07:15:34 ciclad12 kernel: Lustre: 4202:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1333645653270610 sent from datafs-OST000a-osc-ffff810c1b6a0c00 to NID 172.20.176.131@tcp 7s ago has timed out (7s prior to deadline).
May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 6776 previous similar messages
May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:760:target_handle_connect()) datafs-OST000a: refuse reconnection from 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@172.20.176.242@tcp to 0xffff8101da55a000; still busy with 5 active RPCs
May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:760:target_handle_connect()) Skipped 6775 previous similar messages
May 10 07:15:34 ciclad-io2 kernel: LustreError: 7178:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16)  req@ffff8101bab43400 x1333645653270613/t0 o8->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 368/200 e 0 to 0 dl 1273468734 ref 1 fl Interpret:/0/0 rc -16/0
May 10 07:15:34 ciclad-io2 kernel: LustreError: 7178:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 6775 previous similar messages
May 10 07:15:34 ciclad12 kernel: LustreError: 11-0: an error occurred while communicating with 172.20.176.131@tcp. The ost_connect operation failed with -16
May 10 07:15:34 ciclad12 kernel: LustreError: Skipped 778 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7253:0:(service.c:1064:ptlrpc_server_handle_request()) @@@ Request x1333645653270608 took longer than estimated (6+2s); client may timeout. req@ffff81022f851000 x1333645653270608/t54502752 o4->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 448/352 e 0 to 0 dl 1273468533 ref 1 fl Complete:/0/0 rc 0/0
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7253:0:(service.c:1064:ptlrpc_server_handle_request()) Skipped 1 previous similar message
May 10 07:15:35 ciclad12 kernel: LustreError: 11-0: an error occurred while communicating with 172.20.176.131@tcp. The ost_connect operation failed with -16
May 10 07:15:35 ciclad12 kernel: LustreError: Skipped 596 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7172:0:(ldlm_lib.c:525:target_handle_reconnect()) datafs-OST000a: 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360 reconnecting
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7172:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 1364 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7172:0:(ldlm_lib.c:760:target_handle_connect()) datafs-OST000a: refuse reconnection from 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@172.20.176.242@tcp to 0xffff8101da55a000; still busy with 4 active RPCs
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7172:0:(ldlm_lib.c:760:target_handle_connect()) Skipped 1364 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: LustreError: 7172:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16)  req@ffff81022f851200 x1333645653271978/t0 o8->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 368/200 e 0 to 0 dl 1273468735 ref 1 fl Interpret:/0/0 rc -16/0
May 10 07:15:35 ciclad-io2 kernel: LustreError: 7172:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1364 previous similar messages
May 10 07:15:35 ciclad12 kernel: LustreError: 11-0: an error occurred while communicating with 172.20.176.131@tcp. The ost_connect operation failed with -16
May 10 07:15:35 ciclad12 kernel: LustreError: Skipped 2918 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7160:0:(ldlm_lib.c:525:target_handle_reconnect()) datafs-OST000a: 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360 reconnecting
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7160:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 2607 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7160:0:(ldlm_lib.c:760:target_handle_connect()) datafs-OST000a: refuse reconnection from 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@172.20.176.242@tcp to 0xffff8101da55a000; still busy with 4 active RPCs
May 10 07:15:35 ciclad-io2 kernel: Lustre: 7160:0:(ldlm_lib.c:760:target_handle_connect()) Skipped 2607 previous similar messages
May 10 07:15:35 ciclad-io2 kernel: LustreError: 7160:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16)  req@ffff81022e05c400 x1333645653274586/t0 o8->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 368/200 e 0 to 0 dl 1273468735 ref 1 fl Interpret:/0/0 rc -16/0
May 10 07:15:35 ciclad-io2 kernel: LustreError: 7160:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 2607 previous similar messages
May 10 07:15:36 ciclad-io2 kernel: Lustre: 7231:0:(service.c:1064:ptlrpc_server_handle_request()) @@@ Request x1333645653270612 took longer than estimated (6+3s); client may timeout. req@ffff810139315600 x1333645653270612/t54502757 o4->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 448/352 e 0 to 0 dl 1273468533 ref 1 fl Complete:/0/0 rc 0/0
May 10 07:15:36 ciclad12 kernel: LustreError: 11-0: an error occurred while communicating with 172.20.176.131@tcp. The ost_connect operation failed with -16
May 10 07:15:36 ciclad12 kernel: LustreError: Skipped 4443 previous similar messages
May 10 07:15:36 ciclad-io2 kernel: Lustre: 7180:0:(ldlm_lib.c:525:target_handle_reconnect()) datafs-OST000a: 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360 reconnecting
May 10 07:15:36 ciclad-io2 kernel: Lustre: 7180:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 4627 previous similar messages
May 10 07:15:36 ciclad-io2 kernel: Lustre: 7180:0:(ldlm_lib.c:760:target_handle_connect()) datafs-OST000a: refuse reconnection from 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@172.20.176.242@tcp to 0xffff8101da55a000; still busy with 2 active RPCs
May 10 07:15:36 ciclad-io2 kernel: Lustre: 7180:0:(ldlm_lib.c:760:target_handle_connect()) Skipped 4627 previous similar messages
May 10 07:15:36 ciclad-io2 kernel: LustreError: 7180:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16)  req@ffff810068c24a00 x1333645653279214/t0 o8->15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@NET_0x20000869db0f2_UUID:0/0 lens 368/200 e 0 to 0 dl 1273468736 ref 1 fl Interpret:/0/0 rc -16/0
May 10 07:15:36 ciclad-io2 kernel: LustreError: 7180:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 4627 previous similar messages
May 10 07:15:36 ciclad12 kernel: Lustre: datafs-OST000a-osc-ffff810c1b6a0c00: Connection restored to service datafs-OST000a using nid 172.20.176.131@tcp.
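
Since most of the volume is hidden behind the "Skipped N previous similar messages" lines, a small script can help gauge how often the reconnect is refused. The following is only a sketch: the function names and regular expressions are illustrative, not part of any Lustre tool, and it assumes each log entry has been joined onto a single line. The return code -16 corresponds to errno 16, which is EBUSY on Linux, consistent with the "still busy with N active RPCs" messages on the OSS.

```python
import errno
import re
from collections import Counter

# Patterns written against the message texts quoted above (illustrative only).
BUSY_RE = re.compile(
    r"refuse reconnection from (.+?) to 0x[0-9a-f]+; "
    r"still busy with (\d+) active RPCs"
)
SKIPPED_RE = re.compile(r"Skipped (\d+) previous similar messages?")
FAILED_RE = re.compile(r"operation failed with (-\d+)")


def errno_name(rc):
    """Map a negative kernel return code such as -16 to its errno symbol."""
    return errno.errorcode.get(-rc, "unknown")


def summarize(lines):
    """Collect active-RPC counts, suppressed-message totals and failure codes."""
    active_rpcs = []
    skipped = 0
    failures = Counter()
    for line in lines:
        m = BUSY_RE.search(line)
        if m:
            active_rpcs.append(int(m.group(2)))
        m = SKIPPED_RE.search(line)
        if m:
            skipped += int(m.group(1))
        m = FAILED_RE.search(line)
        if m:
            failures[int(m.group(1))] += 1
    return active_rpcs, skipped, failures


if __name__ == "__main__":
    sample = [
        "May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:760:target_handle_connect()) "
        "datafs-OST000a: refuse reconnection from 15f8d8bb-7b73-bcb3-c3bc-2b03195a9360@172.20.176.242@tcp "
        "to 0xffff8101da55a000; still busy with 5 active RPCs",
        "May 10 07:15:34 ciclad-io2 kernel: Lustre: 7178:0:(ldlm_lib.c:760:target_handle_connect()) "
        "Skipped 6775 previous similar messages",
        "May 10 07:15:34 ciclad12 kernel: LustreError: 11-0: an error occurred while communicating with "
        "172.20.176.131@tcp. The ost_connect operation failed with -16",
    ]
    active, skipped, failures = summarize(sample)
    print("active RPC counts:", active)
    print("suppressed messages:", skipped)
    for rc, count in failures.items():
        print(f"rc {rc} ({errno_name(rc)}): {count} logged")
```

Feeding it the full log from both nodes (for example, lines read from /var/log/messages) would show whether the active-RPC count drains to zero before each "Connection restored" message.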



-- 
  Weill Philippe -  System and Network Administrator
  CNRS/UPMC/IPSL   LATMOS (UMR 8190)


