[Lustre-discuss] Errors on mounting clients

Patrice Hamelin patrice.hamelin at ec.gc.ca
Thu Dec 22 08:03:18 PST 2011


Hi,

   Me again!  :-)

I am getting errors when mounting clients; the mount only succeeds after 
one or more retries.

On the o2ib network, I tried 3 or 4 clients and they all behave the same: 
the first mount attempt fails and the second one succeeds.

ib3-bc3e41-be02:~# mount -t lustre ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata /mnt/sata
mount.lustre: mount ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata at /mnt/sata failed: Cannot send after transport endpoint shutdown
ib3-bc3e41-be02:~# mount -t lustre ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata /mnt/sata

This generates a network error in the logs:

Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370607] Lustre: 2671:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388634652055195 sent from MGC10.10.135.115@o2ib3 to NID 10.10.135.115@o2ib3 0s ago has failed due to network error (5s prior to deadline).
Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370611]   req@ffff8804917c4c00 x1388634652055195/t0 o250->MGS@MGC10.10.135.115@o2ib3_0:26/25 lens 368/584 e 0 to 1 dl 1324569206 ref 1 fl Rpc:N/0/0 rc 0/0
Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370617] Lustre: 2671:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 108 previous similar messages

   On my TCP clients, it is a little bit different:

ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: Cannot send after transport endpoint shutdown
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
.
.
.
   It then finally mounts after several tries.

The log files show a network error once again:

Dec 22 15:59:43 ib4-bc1f82-be13 kernel: [172481.077865] Lustre: MGC10.10.132.115@tcp: Reactivating import
Dec 22 15:59:43 ib4-bc1f82-be13 kernel: [172481.087738] Lustre: 3057:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388731500687365 sent from sata-OST0004-osc-ffff88096cd98000 to NID 10.10.132.111@tcp 0s ago has failed due to network error (5s prior to deadline).

I know it is related to the network, but my network itself works just 
fine.  Could the problem be in LNet?  How can I explain or eliminate it?
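
One way to separate an LNet problem from an IP-level one would be to ping 
the server NIDs over LNet from a client. The following is only a minimal 
sketch, assuming lctl is in the path and the lnet module is loaded on the 
client; the lnet_check helper name is hypothetical, and the NIDs in the 
usage comments are the ones that appear in the log excerpts above:

```shell
#!/bin/sh
# Sketch only: ping server NIDs over LNet from a client.
# lnet_check is a hypothetical helper name, not part of Lustre.

# Ping every NID given as an argument and print one OK/FAIL line per NID.
# Returns non-zero if at least one ping failed.
lnet_check() {
    failed=0
    for nid in "$@"; do
        if lctl ping "$nid" >/dev/null 2>&1; then
            echo "OK   $nid"
        else
            echo "FAIL $nid"
            failed=1
        fi
    done
    return "$failed"
}

# Example usage (as root on a client):
#   lctl list_nids                                    # NIDs this client advertises
#   lnet_check 10.10.135.115@o2ib3                    # MGS NID from the o2ib log
#   lnet_check 10.10.132.115@tcp 10.10.132.111@tcp    # NIDs from the TCP log
```

If the pings fail or only succeed intermittently while ordinary IP 
connectivity is fine, that would point at the LNet configuration rather 
than at the physical network.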


Thanks!
Greetings!



-- 
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada



