[Lustre-discuss] persistent client re-connect failure

Samuel Aparicio saparicio at bccrc.ca
Mon Mar 21 10:39:46 PDT 2011


follow up: rebooting the client fixed this issue. I could not remove the kernel modules (lustre_rmmod) and restart lnet even though the filesystem was unmounted, presumably because some transaction was still waiting to be replayed.
is there a better way to do this, short of a reboot?

sam a.


On Mar 20, 2011, at 8:41 PM, Samuel Aparicio wrote:

> I am stuck with the following issue on a client attached to a lustre system.
> we are running lustre 1.8.5
> somehow connectivity to the OST failed at some point and the mount hung.
> after unmounting and re-mounting, the client attempts to reconnect.
> lctl ping shows the client to be connected and normal ping to the OSS/MGS servers shows connectivity.
> 
> remounting the filesystem results in only some files being visible.
> the kernel messages are as follows:
> ---------
> Lustre: setting import lustre-OST0003_UUID INACTIVE by administrator request
> Lustre: lustre-OST0003-osc-ffff8110238c7400.osc: set parameter active=0
> Lustre: Skipped 3 previous similar messages
> LustreError: 14114:0:(lov_obd.c:315:lov_connect_obd()) not connecting OSC ^\; administratively disabled
> Lustre: Client lustre-client has started
> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) Skipped 1 previous similar message
> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
> LustreError: 14686:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
> Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1363662012007464 sent from lustre-OST0000-osc-ffff8110238c7400 to NID 10.9.89.21@tcp 16s ago has timed out (16s prior to deadline).
>   req@ffff810459ce4c00 x1363662012007464/t0 o8->lustre-OST0000_UUID@10.9.89.21@tcp:28/4 lens 368/584 e 0 to 1 dl 1300678232 ref 1 fl Rpc:N/0/0 rc 0/0
> Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 182 previous similar messages
> Lustre: 22219:0:(import.c:517:import_select_connection()) lustre-OST0000-osc-ffff8110238c7400: tried all connections, increasing latency to 18s
> Lustre: 22219:0:(import.c:517:import_select_connection()) Skipped 203 previous similar messages
> ------------
> 
> an ls -l of the filesystem shows
> 
> drwxr-xr-x 4 amcpherson users 4096 Mar 19 10:38 amcpherson
> ?--------- ? ?          ?        ?            ? compute-2-0-testwrite
> ?--------- ? ?          ?        ?            ? hello
> 
> ----------
> 
> other clients on the system are able to mount and see the files perfectly well.
> 
> can anyone help with what the errors above imply?
> 
> a simple network connectivity issue does not seem to be the cause here,
> yet the client's attempts to re-connect to the OST fail.
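
To make the pattern in the kernel messages easier to see, here is a minimal sketch that groups the symptoms by OST. It is only an illustration: the function name `summarize` and the regular expressions are assumptions written against the three message formats quoted above, not a general Lustre log parser, and the NID is written in its usual `10.9.89.21@tcp` form.

```python
import re
from collections import defaultdict

# Three lines taken from the kernel messages quoted above.
# The NID is written as 10.9.89.21@tcp (the list archive renders "@" as " at ").
LOG = """\
Lustre: setting import lustre-OST0003_UUID INACTIVE by administrator request
Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1363662012007464 sent from lustre-OST0000-osc-ffff8110238c7400 to NID 10.9.89.21@tcp 16s ago has timed out (16s prior to deadline).
Lustre: 22219:0:(import.c:517:import_select_connection()) lustre-OST0000-osc-ffff8110238c7400: tried all connections, increasing latency to 18s
"""

# Hypothetical patterns, one per message type seen in the excerpt.
INACTIVE_RE = re.compile(r"setting import (\S+?)_UUID INACTIVE")
TIMEOUT_RE = re.compile(r"sent from (\S+?)-osc-\S+ to NID (\S+) .*has timed out")
RETRY_RE = re.compile(r"\) (\S+?)-osc-\S+: tried all connections")


def summarize(log_text):
    """Return {target name: sorted list of symptoms} for the given log text."""
    summary = defaultdict(set)
    for line in log_text.splitlines():
        m = INACTIVE_RE.search(line)
        if m:
            summary[m.group(1)].add("inactive")
        m = TIMEOUT_RE.search(line)
        if m:
            summary[m.group(1)].add("timeout to " + m.group(2))
        m = RETRY_RE.search(line)
        if m:
            summary[m.group(1)].add("all connections tried")
    return {target: sorted(symptoms) for target, symptoms in summary.items()}


if __name__ == "__main__":
    for target, symptoms in sorted(summarize(LOG).items()):
        print(target, symptoms)
```

On this excerpt the summary separates two different conditions: lustre-OST0003 is marked inactive on the client, while lustre-OST0000 is repeatedly timing out on connect requests to 10.9.89.21@tcp.
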
> 
> Professor Samuel Aparicio BM BCh PhD FRCPath
> Nan and Lorraine Robertson Chair UBC/BC Cancer Agency
> 675 West 10th, Vancouver V5Z 1L3, Canada.
> office: +1 604 675 8200 lab website http://molonc.bccrc.ca
