[Lustre-discuss] persistent client re-connect failure

Samuel Aparicio saparicio at bccrc.ca
Sun Mar 20 20:41:00 PDT 2011


I am stuck with the following issue on a client attached to a Lustre system.
We are running Lustre 1.8.5.
Connectivity to the OST somehow failed at some point and the mount hung.
After unmounting and re-mounting, the client attempts to reconnect.
lctl ping shows the client to be connected, and a normal ping to the OSS/MGS servers shows connectivity.

Remounting the filesystem results in only some files being visible.
The kernel messages are as follows:
---------
Lustre: setting import lustre-OST0003_UUID INACTIVE by administrator request
Lustre: lustre-OST0003-osc-ffff8110238c7400.osc: set parameter active=0
Lustre: Skipped 3 previous similar messages
LustreError: 14114:0:(lov_obd.c:315:lov_connect_obd()) not connecting OSC ^\; administratively disabled
Lustre: Client lustre-client has started
LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
LustreError: 14207:0:(file.c:995:ll_glimpse_size()) Skipped 1 previous similar message
LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
LustreError: 14686:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1363662012007464 sent from lustre-OST0000-osc-ffff8110238c7400 to NID 10.9.89.21@tcp 16s ago has timed out (16s prior to deadline).
  req@ffff810459ce4c00 x1363662012007464/t0 o8->lustre-OST0000_UUID@10.9.89.21@tcp:28/4 lens 368/584 e 0 to 1 dl 1300678232 ref 1 fl Rpc:N/0/0 rc 0/0
Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 182 previous similar messages
Lustre: 22219:0:(import.c:517:import_select_connection()) lustre-OST0000-osc-ffff8110238c7400: tried all connections, increasing latency to 18s
Lustre: 22219:0:(import.c:517:import_select_connection()) Skipped 203 previous similar messages
------------
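
As a reading aid for the messages above, here is a minimal Python sketch that pulls out the target names and decodes the numeric return code. It assumes the message layout shown in the excerpt; the regular expressions and the function name `summarize` are illustrative only and are not part of Lustre or its tools.

```python
import errno
import re

# Three representative lines copied from the kernel messages above.
LOG = """\
Lustre: setting import lustre-OST0003_UUID INACTIVE by administrator request
LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
Lustre: 22219:0:(import.c:517:import_select_connection()) lustre-OST0000-osc-ffff8110238c7400: tried all connections, increasing latency to 18s
"""


def summarize(log):
    """Return (deactivated targets, targets that exhausted connections, errno names)."""
    # Targets whose import was set INACTIVE.
    deactivated = set(re.findall(r"setting import (\S+?)_UUID INACTIVE", log))
    # Targets for which the client reports "tried all connections".
    unreachable = set(re.findall(r"\) (\S+?)-osc-\S+: tried all connections", log))
    # Map "rc -N" to its symbolic errno name.
    codes = {
        errno.errorcode.get(int(n), "unknown")
        for n in re.findall(r"returned rc -(\d+)", log)
    }
    return deactivated, unreachable, codes


deactivated, unreachable, codes = summarize(LOG)
print("set INACTIVE:", sorted(deactivated))
print("tried all connections:", sorted(unreachable))
print("return codes:", sorted(codes))
```

On this excerpt the sketch separates the target reported as INACTIVE (lustre-OST0003) from the one whose connection requests keep timing out (lustre-OST0000), and confirms that rc -5 is EIO.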

An ls of the filesystem shows:

drwxr-xr-x 4 amcpherson users 4096 Mar 19 10:38 amcpherson
?--------- ? ?          ?        ?            ? compute-2-0-testwrite
?--------- ? ?          ?        ?            ? hello

----------

Other clients on the system are able to mount and see the files perfectly well.

Can anyone help with what the errors above imply?

A simple network connectivity issue does not seem to be the cause here,
yet the client's attempts to reconnect to the OST fail.







Professor Samuel Aparicio BM BCh PhD FRCPath
Nan and Lorraine Robertson Chair UBC/BC Cancer Agency
675 West 10th, Vancouver V5Z 1L3, Canada.
office: +1 604 675 8200 lab website http://molonc.bccrc.ca




