[Lustre-discuss] Errors on mounting clients
Patrice Hamelin
patrice.hamelin at ec.gc.ca
Thu Dec 22 08:03:18 PST 2011
Hi,
Me again! :-)
I am getting errors before being able to mount clients.
On the o2ib network, I tried 3 or 4 clients and they all behave the same:
the first mount attempt fails with an error and the second attempt succeeds.
ib3-bc3e41-be02:~# mount -t lustre ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata /mnt/sata
mount.lustre: mount ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata at /mnt/sata failed: Cannot send after transport endpoint shutdown
ib3-bc3e41-be02:~# mount -t lustre ib3-st01s@o2ib3:ib3-st02s@o2ib3:/sata /mnt/sata
This generates a network error in the logs:
Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370607] Lustre:
2671:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request
x1388634652055195 sent from MGC10.10.135.115@o2ib3 to NID
10.10.135.115@o2ib3 0s ago has failed due to network error (5s prior to
deadline).
Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370611]
req@ffff8804917c4c00 x1388634652055195/t0
o250->MGS@MGC10.10.135.115@o2ib3_0:26/25 lens 368/584 e 0 to 1 dl
1324569206 ref 1 fl Rpc:N/0/0 rc 0/0
Dec 22 15:53:21 ib3-bc3e41-be03 kernel: [263682.370617] Lustre:
2671:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 108 previous
similar messages
On my TCP clients, it is a little bit different:
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: Cannot send after transport endpoint shutdown
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
ib4-bc1f82-be13:~# mount -t lustre ib3-st01e@tcp:ib3-st02e@tcp:/sata /mnt/sata
mount.lustre: mount ib3-st01e@tcp:ib3-st02e@tcp:/sata at /mnt/sata failed: File exists
.
.
.
It then finally mounts after several tries.
The log files show a network error once again:
Dec 22 15:59:43 ib4-bc1f82-be13 kernel: [172481.077865] Lustre:
MGC10.10.132.115@tcp: Reactivating import
Dec 22 15:59:43 ib4-bc1f82-be13 kernel: [172481.087738] Lustre:
3057:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request
x1388731500687365 sent from sata-OST0004-osc-ffff88096cd98000 to NID
10.10.132.111@tcp 0s ago has failed due to network error (5s prior to
deadline).
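To see which peers the failed requests were addressed to, the
"has failed due to network error" messages can be tallied by target NID.
The following is only a minimal sketch: the function name failed_nids is
hypothetical, and it assumes each kernel message occupies a single line
in the real log file (the excerpts in this message are wrapped).

```shell
#!/bin/sh
# Tally the target NIDs of requests that failed with a network error.
# Assumption: every kernel message is on one line in the log being read.
failed_nids() {
    grep 'has failed due to network error' \
        | sed -n 's/.* to NID \([^ ]*\) .*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{print $1, $2}'   # normalise to "<count> <nid>"
}

# Usage (the log path is an assumption; adjust for the local syslog setup):
#   failed_nids < /var/log/messages
```

A NID that dominates the tally points at one server or one interface
rather than at the fabric as a whole.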
I know it is related to the network, but the network itself works just fine.
Could LNet be the cause? How can I explain or eliminate this problem?
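One way to separate an LNet problem from an IP-level one is to ping the
server NIDs at the LNet layer with "lctl ping" before attempting the
mount. The following is a rough sketch, not a verified procedure: the
function name lnet_check and the LCTL override variable are hypothetical,
and it assumes the lctl utility is installed on the client and the Lustre
modules are loaded.

```shell
#!/bin/sh
# Ping each given NID at the LNet level and report which ones fail.
# LCTL may be overridden; by default the lctl binary on PATH is used.
LCTL=${LCTL:-lctl}

lnet_check() {
    status=0
    for nid in "$@"; do
        if "$LCTL" ping "$nid" >/dev/null 2>&1; then
            echo "ok   $nid"
        else
            echo "FAIL $nid"
            status=1
        fi
    done
    return "$status"
}

# Usage, with NIDs taken from the log excerpts in this message:
#   lnet_check 10.10.135.115@o2ib3 10.10.132.115@tcp 10.10.132.111@tcp
```

If an ordinary IP ping to a server works but the LNet ping to the same
server fails, the LNet configuration on the client or the server (for
example the networks each side is configured for) is the more likely
place to look.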
Thanks!
Greetings!
--
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada