[Lustre-discuss] persistent client re-connect failure

Andreas Dilger adilger at whamcloud.com
Mon Mar 21 10:59:02 PDT 2011


On 2011-03-21, at 6:37 PM, Samuel Aparicio wrote:
> It was permanently deactivated on the MDS ... strange that it should show up at all. Is the OST list persistent through the history of additions/deletions?

Yes, the "removed" OST will continue to be listed (for good or bad).  It is really just "permanent deactivation".
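
To check this from a client, one option is to parse the `lctl device_list` output and list which OSTs each mount still has an osc device for. The following is a minimal sketch, not a Lustre tool: the function name `oscs_by_mount` and the sample text are hypothetical, and it assumes each device is printed on a single line in the order index, status, type, name, uuid, refcount (as in the listing quoted below, with the mail line wrapping undone).

```python
import re
from collections import defaultdict

# Matches one osc device line, for example:
#   6 UP osc lustre-OST0003-osc-ffff8100459a9c00 6775de4c-... 4
OSC_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+osc\s+(\S+)-(OST[0-9a-fA-F]{4})-osc-(\S+)\s+(\S+)\s+(\d+)\s*$"
)


def oscs_by_mount(device_list_text):
    """Map each client mount instance to the (OST name, status) pairs it lists."""
    mounts = defaultdict(list)
    for line in device_list_text.splitlines():
        m = OSC_LINE.match(line)
        if m:
            _index, status, _fsname, ost, instance, _uuid, _refs = m.groups()
            mounts[instance].append((ost, status))
    return dict(mounts)


# Hypothetical sample: four lines taken from the first mount in the listing.
SAMPLE = """\
  2 UP mdc lustre-MDT0000-mdc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
  5 UP osc lustre-OST0002-osc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
  6 UP osc lustre-OST0003-osc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 4
  7 UP osc lustre-OST0004-osc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
"""

if __name__ == "__main__":
    for instance, oscs in oscs_by_mount(SAMPLE).items():
        print(instance, oscs)
```

On the sample, OST0003 is still reported with an osc device in state UP alongside the active OSTs, which matches the listing in this thread.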


> On Mar 21, 2011, at 3:29 AM, Larry wrote:
> 
>> If you *only* deactivate it on the MDS, then you can still see the OST on
>> the client; you just cannot write to it anymore.
>> 
>> On Mon, Mar 21, 2011 at 11:49 AM, Samuel Aparicio <saparicio at bccrc.ca> wrote:
>>> Follow up to this posting. I notice on the client that lctl device_list
>>> reports the following:
>>> 
>>>   0 UP mgc MGC10.9.89.51@tcp 5a76b5b6-82bf-2053-8c17-e68ffe552edc 5
>>>   1 UP lov lustre-clilov-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 4
>>>   2 UP mdc lustre-MDT0000-mdc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
>>>   3 UP osc lustre-OST0000-osc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
>>>   4 UP osc lustre-OST0001-osc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
>>>   5 UP osc lustre-OST0002-osc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
>>>   6 UP osc lustre-OST0003-osc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 4
>>>   7 UP osc lustre-OST0004-osc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
>>>   8 UP osc lustre-OST0005-osc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
>>>   9 UP osc lustre-OST0006-osc-ffff8100459a9c00 6775de4c-6c29-9316-a715-3472233477d1 5
>>>  10 UP lov lustre-clilov-ffff810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 4
>>>  11 UP mdc lustre-MDT0000-mdc-ffff810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
>>>  12 UP osc lustre-OST0000-osc-ffff810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
>>>  13 UP osc lustre-OST0001-osc-ffff810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
>>>  14 UP osc lustre-OST0002-osc-ffff810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
>>>  15 UP osc lustre-OST0003-osc-ffff810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 4
>>>  16 UP osc lustre-OST0004-osc-ffff810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
>>>  17 UP osc lustre-OST0005-osc-ffff810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
>>>  18 UP osc lustre-OST0006-osc-ffff810c92f2b800 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
>>>  19 UP lov lustre-clilov-ffff81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 4
>>>  20 UP mdc lustre-MDT0000-mdc-ffff81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
>>>  21 UP osc lustre-OST0000-osc-ffff81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
>>>  22 UP osc lustre-OST0001-osc-ffff81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
>>>  23 UP osc lustre-OST0002-osc-ffff81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
>>>  24 UP osc lustre-OST0003-osc-ffff81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 4
>>>  25 UP osc lustre-OST0004-osc-ffff81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
>>>  26 UP osc lustre-OST0005-osc-ffff81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
>>>  27 UP osc lustre-OST0006-osc-ffff81047a45c000 6a3d5815-4851-31b0-9400-c8892e11dae4 5
>>> 
>>> However, OST0003 no longer exists; it was deactivated on the MDS. Why would
>>> the clients think it exists?
>>> 
>>> Professor Samuel Aparicio BM BCh PhD FRCPath
>>> Nan and Lorraine Robertson Chair UBC/BC Cancer Agency
>>> 675 West 10th, Vancouver V5Z 1L3, Canada.
>>> office: +1 604 675 8200 lab website http://molonc.bccrc.ca
>>> 
>>> On Mar 20, 2011, at 8:41 PM, Samuel Aparicio wrote:
>>> 
>>> I am stuck with the following issue on a client attached to a Lustre system.
>>> We are running Lustre 1.8.5.
>>> Connectivity to the OST failed at some point and the mount hung.
>>> After unmounting and re-mounting, the client attempts to reconnect.
>>> lctl ping shows the client to be connected, and a normal ping to the OSS/MGS
>>> servers shows connectivity.
>>> Remounting the filesystem results in only some files being visible.
>>> The kernel messages are as follows:
>>> ---------
>>> Lustre: setting import lustre-OST0003_UUID INACTIVE by administrator request
>>> Lustre: lustre-OST0003-osc-ffff8110238c7400.osc: set parameter active=0
>>> Lustre: Skipped 3 previous similar messages
>>> LustreError: 14114:0:(lov_obd.c:315:lov_connect_obd()) not connecting OSC
>>> ^\; administratively disabled
>>> Lustre: Client lustre-client has started
>>> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc
>>> -5, returning -EIO
>>> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) Skipped 1 previous
>>> similar message
>>> LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc
>>> -5, returning -EIO
>>> LustreError: 14686:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc
>>> -5, returning -EIO
>>> Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
>>> x1363662012007464 sent from lustre-OST0000-osc-ffff8110238c7400 to NID
>>> 10.9.89.21@tcp 16s ago has timed out (16s prior to deadline).
>>>   req@ffff810459ce4c00 x1363662012007464/t0
>>> o8->lustre-OST0000_UUID@10.9.89.21@tcp:28/4 lens 368/584 e 0 to 1 dl
>>> 1300678232 ref 1 fl Rpc:N/0/0 rc 0/0
>>> Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 182
>>> previous similar messages
>>> Lustre: 22219:0:(import.c:517:import_select_connection())
>>> lustre-OST0000-osc-ffff8110238c7400: tried all connections, increasing
>>> latency to 18s
>>> Lustre: 22219:0:(import.c:517:import_select_connection()) Skipped 203
>>> previous similar messages
>>> ------------
>>> An ls of the filesystem shows:
>>> drwxr-xr-x 4 amcpherson users 4096 Mar 19 10:38 amcpherson
>>> ?--------- ? ?          ?        ?            ? compute-2-0-testwrite
>>> ?--------- ? ?          ?        ?            ? hello
>>> ----------
>>> Other clients on the system are able to mount and see the files perfectly
>>> well.
>>> Can anyone help with what the errors above imply?
>>> A simple network connectivity issue does not seem to be the case here,
>>> yet the client's attempts to reconnect to the OST fail.
>>> 
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss

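When reading kernel messages like the ones quoted above, a small script can pull out which imports were set inactive and translate the numeric return codes. This is only a sketch: the function name `summarize` and the sample text are made up for illustration, the regular expressions assume the message wording shown in the quoted log, and the errno lookup assumes a Linux client.

```python
import errno
import re

# Patterns assume the wording seen in the quoted log (an assumption, not a stable API).
INACTIVE = re.compile(r"setting import (\S+) INACTIVE by administrator request")
RETURN_CODE = re.compile(r"returned rc\s+(-\d+)")


def summarize(log_text):
    """Return (inactive import names, [(errno number, errno name), ...])."""
    # Collapse whitespace so messages wrapped across lines by mail still match.
    flat = " ".join(log_text.split())
    inactive = sorted(set(INACTIVE.findall(flat)))
    codes = sorted({-int(rc) for rc in RETURN_CODE.findall(flat)})
    return inactive, [(code, errno.errorcode.get(code, "?")) for code in codes]


# Hypothetical sample: two messages copied from the quoted log, quote marks removed.
LOG = """\
Lustre: setting import lustre-OST0003_UUID INACTIVE by administrator request
LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc
-5, returning -EIO
"""

if __name__ == "__main__":
    print(summarize(LOG))
```

On the sample it reports lustre-OST0003_UUID as inactive and maps rc -5 to EIO, which agrees with the "returning -EIO" text in the log itself.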

Cheers, Andreas
--
Andreas Dilger 
Principal Engineer
Whamcloud, Inc.





