[Lustre-discuss] Large directories optimization

Lukas Hejtmanek xhejtman at ics.muni.cz
Thu Sep 17 04:28:59 PDT 2009


Hello,

is it possible to optimize Lustre so that it supports really large directories
(with 30k small files in each)? We have 8 physical clients which process JPEG
files stored on a Lustre volume, and sooner or later the clients freeze: ls in
the Lustre directory waits forever. Is there something I could do to improve
performance?

The lustre server is Build Version:
1.8.0-19700101010000-PRISTINE-.usr.src.lustre-prod.linux-2.6.22.19-2.6.22.19

The lustre client is Build Version:
1.6.7.1-19700101010000-PRISTINE-.scratch.xhejtman.suse-2.6.22.17-0.1-2.6.22.17-0.1-xen-lustre

I got the following messages on the client:
Lustre: stable-MDT0000-mdc-ffff8802855b7800: Connection to service
stable-MDT0000 via nid x.x.x.x at tcp was lost; in progress operations using
this service will wait for recovery to complete.
Lustre: Skipped 2 previous similar messages
LustreError: 1445:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11
from cancel RPC: canceling anyway
LustreError: 1445:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped 37
previous similar messages
LustreError: 1445:0:(ldlm_request.c:1622:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -11
LustreError: 1445:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) Skipped 37
previous similar messages
Lustre: 3170:0:(import.c:507:import_select_connection())
stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency
to 8s
Lustre: 3170:0:(import.c:507:import_select_connection())
stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency
to 13s
Lustre: 3170:0:(import.c:507:import_select_connection())
stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency
to 18s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous
similar message
LustreError: 11-0: an error occurred while communicating with x.x.x.x at tcp.
The mds_connect operation failed with -16
Lustre: Request x112815827 sent from stable-OST0001-osc-ffff8802855b7800 to
NID x.x.x.x at tcp 100s ago has timed out (limit 100s).
Lustre: Skipped 9 previous similar messages
Lustre: stable-OST0001-osc-ffff8802855b7800: Connection to service
stable-OST0001 via nid x.x.x.x at tcp was lost; in progress operations using
this service will wait for recovery to complete.
Lustre: Skipped 1 previous similar message
LustreError: 128:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11 from
cancel RPC: canceling anyway
LustreError: 128:0:(ldlm_request.c:1622:ldlm_cli_cancel_list())
ldlm_cli_cancel_list: -11
Lustre: stable-OST0001-osc-ffff8802855b7800: Connection restored to service
stable-OST0001 using nid x.x.x.x at tcp.
Lustre: 3170:0:(import.c:507:import_select_connection())
stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency
to 23s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous
similar message
LustreError: 166-1: MGCx.x.x.x at tcp: Connection to service MGS via nid
x.x.x.x at tcp was lost; in progress operations using this service will fail.
Lustre: MGCx.x.x.x at tcp: Reactivating import
Lustre: 3170:0:(import.c:507:import_select_connection())
stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency
to 28s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous
similar message
Lustre: 3170:0:(import.c:507:import_select_connection())
stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency
to 33s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous
similar message
LustreError: 11-0: an error occurred while communicating with x.x.x.x at tcp.
The mds_connect operation failed with -16
LustreError: Skipped 5 previous similar messages
Lustre: 3170:0:(import.c:507:import_select_connection())
stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency
to 38s
Lustre: 3170:0:(import.c:507:import_select_connection()) Skipped 1 previous
similar message
LustreError: 3158:0:(events.c:66:request_out_callback()) @@@ type 4, status -5
req at ffff8801002dd800 x112816084/t0
o103->stable-OST0001_UUID at 10.0.0.1@o2ib:17/18 lens 648/256 e 0 to 1 dl
1253177767 ref 2 fl Rpc:N/0/0 rc 0/0
LustreError: 1470:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11
from cancel RPC: canceling anyway
LustreError: 1470:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Skipped 194
previous similar messages
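
The growing reconnect latency in the messages above (8s, 13s, 18s, ...) can be
pulled out of a saved client log with a short script. This is only a minimal
sketch, assuming the log has been saved as plain text with the same hard line
wrapping as shown here; the function name latency_steps is hypothetical and not
part of any Lustre tool.

```python
import re


def latency_steps(log_text):
    """Return the 'increasing latency to Ns' values, in order of appearance."""
    # The log is hard-wrapped, so a message may span several lines.
    # Collapse every run of whitespace into a single space before matching.
    flat = " ".join(log_text.split())
    return [int(s) for s in re.findall(r"increasing latency to (\d+)s", flat)]


# Two messages copied from the log above, wrapping preserved.
sample = """Lustre: 3170:0:(import.c:507:import_select_connection())
stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency
to 8s
Lustre: 3170:0:(import.c:507:import_select_connection())
stable-MDT0000-mdc-ffff8802855b7800: tried all connections, increasing latency
to 13s"""

print(latency_steps(sample))
```

Because some of these messages are suppressed ("Skipped 1 previous similar
message"), the extracted values are a lower bound on the number of reconnect
attempts, not an exact count.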


-- 
Lukáš Hejtmánek


