[Lustre-discuss] Trying to mount lustre on a client when one or more OST is disabled

Bob Ball ball at umich.edu
Tue Dec 14 12:05:52 PST 2010


Well, you are absolutely right, it is a timeout talking to what it 
THINKS is the MDT.  The thing is, it is NOT!

We were set up for HA for the MDT, with 10.10.1.48 and 10.10.1.49 
watching and talking to one another.  The RedHat service was 
problematic, so right now 10.10.1.48 is the MDT, and has /mnt/mdt 
mounted, and 10.10.1.49 is being used to do backups, and has 
/mnt/mdt_snapshot mounted.  The actual volume is an iSCSI location.

So, somehow, the client node has found and is talking to the wrong 
host!  Not good.  Scary.  Got to do something about this.....
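
To double-check which host the client is really addressing, the opcode, target and NID can be pulled out of the failing request line in dmesg. This is only a rough sketch: it assumes the line has the "oNN->target@nid:..." shape seen in the log quoted below, and "rpc_target" is a made-up helper name, not a Lustre tool. It also undoes the " at " that the list archive substitutes for "@".

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): read log text on stdin and print
# "opcode target nid" for each ptlrpc request line of the form
#   o38->umt3-MDT0000_UUID@10.10.1.49@tcp:12/10 ...
rpc_target() {
    # First undo the archive's " at " rewriting, then extract the three fields.
    sed -n -e 's/ at /@/g' \
           -e 's/.*o\([0-9][0-9]*\)->\([^@]*\)@\([^:]*\):.*/\1 \2 \3/p'
}

# Sample line in the form quoted in this thread.
sample='o38->umt3-MDT0000_UUID at 10.10.1.49@tcp:12/10 lens 368/584 e 0 to 1 dl'
printf '%s\n' "$sample" | rpc_target
```

On a live client the same function could be fed from "dmesg | rpc_target". Here the NID field comes out as 10.10.1.49, the backup host, rather than 10.10.1.48.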

Suggestions appreciated....

bob

On 12/14/2010 11:57 AM, Andreas Dilger wrote:
> The error message shows a timeout connecting to umt3-MDT0000 and not the OST.  The operation 38 is MDS_CONNECT, AFAIK.
>
> Cheers, Andreas
>
> On 2010-12-14, at 9:19, Bob Ball<ball at umich.edu>  wrote:
>
>> I am trying to get a lustre client to mount the service, but with one or
>> more OST disabled.  This does not appear to be working.  Lustre version
>> is 1.8.4.
>>
>>   mount -o localflock,exclude=umt3-OST0019 -t lustre
>> 10.10.1.140@tcp0:/umt3 /lustre/umt3
>>
>> dmesg on this client shows the following during the umount/mount sequence:
>>
>> Lustre: client ffff810c25c03800 umount complete
>> Lustre: Skipped 1 previous similar message
>> Lustre: MGC10.10.1.140@tcp: Reactivating import
>> Lustre: 450250:0:(obd_mount.c:1786:lustre_check_exclusion()) Excluding
>> umt3-OST0019 (on exclusion list)
>> Lustre: 450250:0:(obd_mount.c:1786:lustre_check_exclusion()) Skipped 1
>> previous similar message
>> Lustre: 5942:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
>> x1354682302740498 sent from umt3-MDT0000-mdc-ffff810628209000 to NID
>> 10.10.1.49@tcp 0s ago has failed due to network error (5s prior to
>> deadline).
>>    req@ffff810620e66400 x1354682302740498/t0
>> o38->umt3-MDT0000_UUID@10.10.1.49@tcp:12/10 lens 368/584 e 0 to 1 dl
>> 1292342239 ref 1 fl Rpc:N/0/0 rc 0/0
>> Lustre: 5942:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 1
>> previous similar message
>> Lustre: Client umt3-client has started
>>
>> When I check following the mount, using "lctl dl", I see the following,
>> and it is clear that the OST is active as far as this client is concerned.
>>
>>   19 UP osc umt3-OST0018-osc-ffff810628209000
>> 05b29472-d125-c36e-c023-e0eb76aaf353 5
>>   20 UP osc umt3-OST0019-osc-ffff810628209000
>> 05b29472-d125-c36e-c023-e0eb76aaf353 5
>>   21 UP osc umt3-OST001a-osc-ffff810628209000
>> 05b29472-d125-c36e-c023-e0eb76aaf353 5
>>
>> Two questions here.  The first, obviously, is what is wrong with this
>> picture?  Why can't I exclude this OST from activity on this client?  Is
>> it because the OSS serving that OST still has the OST active?  If the
>> OST were deactivated or otherwise unavailable on the OSS, would the
>> client mount then succeed to exclude this OST?  (OK, more than one
>> question in the group....)
>>
>> Second group, what is the correct syntax for excluding more than one
>> OST?  Is it a comma-separated list of exclusions, or are separate
>> excludes required?
>>
>>   mount -o localflock,exclude=umt3-OST0019,umt3-OST0020 -t lustre
>> 10.10.1.140@tcp0:/umt3 /lustre/umt3
>>                 or
>>   mount -o localflock,exclude=umt3-OST0019,exclude=umt3-OST0020 -t
>> lustre 10.10.1.140@tcp0:/umt3 /lustre/umt3
>>
>> Thanks,
>> bob
>
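
For the "lctl dl" check quoted above, the state column of the osc device for a single OST can be picked out with a small filter. This is a sketch only: "osc_state" is a hypothetical helper name, and it assumes the column layout shown in the quoted listing (index, state, type, device name).

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): read "lctl dl"-style text on
# stdin and print the state column (e.g. "UP") of the osc device whose
# name starts with "<ost>-osc-".
osc_state() {
    awk -v ost="$1" '$3 == "osc" && index($4, ost "-osc-") == 1 { print $2 }'
}

# Sample lines copied from the listing in this thread (UUID column omitted).
sample='19 UP osc umt3-OST0018-osc-ffff810628209000
20 UP osc umt3-OST0019-osc-ffff810628209000'
printf '%s\n' "$sample" | osc_state umt3-OST0019
```

On a client this would be run as "lctl dl | osc_state umt3-OST0019". An OST with no osc device in the listing produces no output.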