[Lustre-devel] imperative recovery

Robert Read rread at sun.com
Fri Jan 9 20:44:36 PST 2009


On Jan 9, 2009, at 4:50 PM, Andreas Dilger wrote:

> On Jan 09, 2009  09:04 -0800, Robert Read wrote:
>> On Jan 9, 2009, at 07:27 , Nicholas Henke wrote:
>>> This would be a great enhancement for OSS failover or reboot, it is
>>> really the only way we'll get to recovery times under ~2.5 x  
>>> obd_timeout.
>>>
>>> I do think this will miss a significant case: combo MGS+MDS. A
>>> majority of our customers are deploying with this configuration.
>>> Perhaps exposing this mechanism on the clients via a /proc file
>>> would be enough - that way a failover framework
>>> could manually trigger the timeout and/or nid switching.
>>
>> Yes, exactly what I was thinking. Exposing this feature via proc (or
>> lctl) on the clients is the first step. It has minimal impact,
>> requires no changes to the server, and should integrate well with
>> existing failover frameworks.  We also need to get the server to end
>> recovery sooner (without waiting for all the stale exports), but VBR
>> should help with that.
>
> Hey, wouldn't (essentially) "lctl --device $foo recover" do the trick
> today?

The main difference is that we need to specify the NID to connect to.
Also, since lctl isn't always available, we should do this with a /proc
file (and set_param), so something like this:

echo $new_ost_nid > /proc/fs/lustre/osc/OSC_FOO_01/target_nid

or

lctl set_param osc.OSC_FOO_01.target_nid $new_ost_nid
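
As a rough sketch of how an existing failover framework might use this,
something like the following could fan the new NID out to the clients.
(Host names, the NID, and the device name here are hypothetical, and it
assumes the proposed target_nid tunable exists on the clients; with
DRY_RUN=1 it just prints the commands.)

```shell
# Sketch: push the failover OST's NID to each client's OSC device.
# Hypothetical device name and NID; assumes the proposed target_nid
# tunable is present. DRY_RUN=1 (the default) prints the commands
# instead of running them.
push_target_nid() {
    new_nid=$1; device=$2; shift 2
    for host in "$@"; do
        cmd="ssh $host lctl set_param ${device}.target_nid=${new_nid}"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

# Example (dry run): show the commands for two clients.
DRY_RUN=1
push_target_nid "192.168.1.12@tcp" osc.OSC_FOO_01 client1 client2
```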

robert

>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
