[lustre-devel] [PATCH 11/31] lustre: lnet: Fix route hops print

NeilBrown neilb at suse.com
Tue Jul 31 19:50:29 PDT 2018


Hi Amir,
 I think I'm happy to let this slide.  I don't like magic numbers, but
 this one isn't important enough to justify the problems that might be
 caused by changing it.

 To answer your broader question:
> The code bases are already diverging significantly between the
> upstream client and the master repo, which makes porting features from
> master to upstream a difficult task. Do we have a strategy on how to
> deal with this? 

 The long term strategy is just to get the work done so that the client
 code - and then all the kernel code - can be deleted from the master
 repo and can live solely in Linux.  I know that is still quite a way
 away.

 Shorter term, there are no magic answers.  Yes it is difficult but it
 is far from impossible.  My plan has always been to get the code that
 is already in drivers/staging into a reasonable state, then start
 forward-porting patches from master.  If other people do some of the
 forward-porting, that just makes me happier.
 If you think there is too much churn in my lustre tree, then just
 provide patches based on some old commit - I'm quite happy to receive
 patches based on fairly old code, and to do the final steps of
 forward-porting/conflict resolution myself (I have lots of practice).

NeilBrown


On Wed, Aug 01 2018, Amir Shehata wrote:

> Hi Neil,
>
> This issue actually came up because of LU-6060, which changed the behavior for LLNL. The behavior then was changed again by LU-6851: https://jira.whamcloud.com/browse/LU-6851 (if you'd like more background)
>
> As a result of LU-6851 we were printing the unsigned value of -1. That's why we ended up printing it as -1, which is more bearable than just printing a large unsigned value.
>
> I'm not disagreeing that it'll be better to print a clearer value, "unknown" does sound like it relays the correct meaning. However, I've sometimes run into issues where changing user facing interfaces caused  problems to user scripts. Would this be the case here, I'm not 100% sure. We can always make the change and then wait for tickets to be opened.
>
> However, I think of more concern to me is that if we make changes like this to the upstreamed client, it's probably a good idea to also make them to the whamcloud repo as well, so as not to diverge the client and server (LNet is common between them). This particular case, might not be very significant, but other issues might come up that are of more significance. 
>
> The code bases are already diverging significantly between the upstream client and the master repo, which makes porting features from master to upstream a difficult task. Do we have a strategy on how to deal with this?
>
> thanks
> amir
> ________________________________________
> From: NeilBrown [neilb at suse.com]
> Sent: Tuesday, July 31, 2018 5:32 PM
> To: Amir Shehata; James Simmons; Andreas Dilger; Oleg Drokin
> Cc: Lustre Development List
> Subject: RE: [PATCH 11/31] lustre: lnet: Fix route hops print
>
> Hi Amir,
>  thanks for the background.
>
>  I had to chuckle at "0 being the highest", though I know that this
>  distortion is not something specific to lustre.
>
>  Your description seems to suggest that "-1" means "unknown" with an
>  implication that the number of hops might be high, and best not to take
>  the risk.
>
>  Your point about compatability with scripts has some validity, though
>  it is annoying to have to support such ugly interfaces indefinitely.
>  Are there really likely to be dependencies? lustre has only been
>  printing -1 since Feb last year when this patch went upstream.
>  That was presumably an abi change as it would have printed MAXINT-1
>  previously.  Did that cause any problems?
>
> Thanks,
> NeilBrown
>
>
> On Tue, Jul 31 2018, Amir Shehata wrote:
>
>> The way hop and priority work in the code is they serve to select the preferred route. If you have multiple gateways leading to the same destination, you select the one with the highest priority (0 being the highest), followed by selecting the one with the least number of hops. If you don't specify hops, then it's actually treated as the least favoured if there are other routes with hops specified. If hops and priority are equivalent between routes, then you select the one with the most credits available, if that's equivalent you select in round robin.
>>
>> In that sense hops and priority really serve the same purpose, select the preferred route. If it was up to me I would keep only one of them, but for historical reasons, both are kept.
>>
>> Therefore, I'm not sure if "unlimited" actually relays the correct interpretation of that value. Note there could be user scripts out there that are already parsing the output. So by changing the -1 you could break the scripts. Also changing that will create an inconsistency between the server and client.
>>
>> thanks
>> amir
>> ________________________________________
>> From: NeilBrown [neilb at suse.com]
>> Sent: Tuesday, July 31, 2018 3:38 PM
>> To: James Simmons; Andreas Dilger; Oleg Drokin
>> Cc: Lustre Development List; Amir Shehata; James Simmons
>> Subject: Re: [PATCH 11/31] lustre: lnet: Fix route hops print
>>
>> On Mon, Jul 30 2018, James Simmons wrote:
>>
>>> From: Amir Shehata <ashehata at whamcloud.com>
>>>
>>> The default number of hops for  a route is -1. This is
>>> currently being printed as %u. Change that to %d to
>>> make it print out properly.
>>
>> -1 hops???  I wish I could hop -1 times - it would be a good party
>> trick!!
>>
>> What does -1 mean?  Unlimited (just a guess).  If so, could we print
>> "unlimited"??
>>
>> I'm fine with having magic numbers in the code, but I don't like them to
>> leak out.
>>
>> NeilBrown
>>
>>>
>>> Signed-off-by: Amir Shehata <ashehata at whamcloud.com>
>>> WC-id: https://jira.whamcloud.com/browse/LU-9078
>>> Reviewed-on: https://review.whamcloud.com/25250
>>> Reviewed-by: Olaf Weber <olaf at sgi.com>
>>> Reviewed-by: Doug Oucharek <dougso at me.com>
>>> Reviewed-by: James Simmons <uja.ornl at yahoo.com>
>>> Reviewed-by: Oleg Drokin <green at whamcloud.com>
>>> Signed-off-by: James Simmons <jsimmons at infradead.org>
>>> ---
>>>  drivers/staging/lustre/lnet/lnet/router_proc.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/staging/lustre/lnet/lnet/router_proc.c b/drivers/staging/lustre/lnet/lnet/router_proc.c
>>> index 8856798..aa98ce5 100644
>>> --- a/drivers/staging/lustre/lnet/lnet/router_proc.c
>>> +++ b/drivers/staging/lustre/lnet/lnet/router_proc.c
>>> @@ -218,7 +218,7 @@ static int proc_lnet_routes(struct ctl_table *table, int write,
>>>                       int alive = lnet_is_route_alive(route);
>>>
>>>                       s += snprintf(s, tmpstr + tmpsiz - s,
>>> -                                   "%-8s %4u %8u %7s %s\n",
>>> +                                   "%-8s %4d %8u %7s %s\n",
>>>                                     libcfs_net2str(net), hops,
>>>                                     priority,
>>>                                     alive ? "up" : "down",
>>> --
>>> 1.8.3.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20180801/69c9df16/attachment.sig>


More information about the lustre-devel mailing list