[lustre-devel] [PATCH 30/73] staging/lustre: use 64-bit times for request times
Dilger, Andreas
andreas.dilger at intel.com
Mon Sep 28 15:09:36 PDT 2015
On 2015/09/28, 2:19 AM, "lustre-devel on behalf of Arnd Bergmann"
<lustre-devel-bounces at lists.lustre.org on behalf of arnd at arndb.de> wrote:
>On Monday 28 September 2015 01:09:18 Dilger, Andreas wrote:
>> On 2015/09/27, 10:45 PM, "green at linuxhacker.ru" <green at linuxhacker.ru>
>> wrote:
>> >diff --git a/drivers/staging/lustre/lustre/ptlrpc/events.c
>> >b/drivers/staging/lustre/lustre/ptlrpc/events.c
>> >index 53f6b62..afd869b 100644
>> >--- a/drivers/staging/lustre/lustre/ptlrpc/events.c
>> >+++ b/drivers/staging/lustre/lustre/ptlrpc/events.c
>> >@@ -246,7 +246,7 @@ static void ptlrpc_req_add_history(struct
>> >ptlrpc_service_part *svcpt,
>> > struct ptlrpc_request *req)
>> > {
>> > __u64 sec = req->rq_arrival_time.tv_sec;
>> >- __u32 usec = req->rq_arrival_time.tv_usec >> 4; /* usec / 16 */
>> >+ __u32 usec = req->rq_arrival_time.tv_nsec / NSEC_PER_USEC / 16; /* usec / 16 */
>>
>> This could just be written like:
>>
>> __u32 usec = req->rq_arrival_time.tv_nsec >> 14 /* nsec / 16384 */;
>>
>> since the main point of this calculation is to get a number that fits
>> into a 16-bit field to provide ordering for items in a trace log. It
>> doesn't have to be exactly "nsec / 16000", and it avoids the division.
>
>Ok, that wasn't clear from the original code, so I just moved the division
>from the do_gettimeofday() here to keep the data unchanged.
>
>With your change, the
>
>new_seq = (sec << REQS_SEC_SHIFT) |
> (usec << REQS_USEC_SHIFT) |
> (svcpt->scp_cpt < 0 ? 0 : svcpt->scp_cpt);
>
>calculation will get forward jumps once a second, but I guess that
>doesn't matter if it's only used for sequencing.
>
>The part that I had not noticed here is the y2106 overflow in the
>sequence number. If we change the logic, we should probably give
>a few more bits to the seconds, as well, or use monotonic time.
>
>> >diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c
>> >b/drivers/staging/lustre/lustre/ptlrpc/service.c
>> >index 40de622..28f57d7 100644
>> >--- a/drivers/staging/lustre/lustre/ptlrpc/service.c
>> >+++ b/drivers/staging/lustre/lustre/ptlrpc/service.c
>> >@@ -1191,7 +1191,7 @@ static int ptlrpc_at_add_timed(struct
>> >ptlrpc_request *req)
>> > spin_lock(&svcpt->scp_at_lock);
>> > LASSERT(list_empty(&req->rq_timed_list));
>> >
>> >- index = (unsigned long)req->rq_deadline % array->paa_size;
>> >+ div_u64_rem(req->rq_deadline, array->paa_size, &index);
>>
>> Since this is just a round-robin index that advances once per second,
>> it doesn't matter at all whether the calculation is computed on the
>> 64-bit seconds or on the 32-bit seconds, so there isn't any need for
>> the more expensive div_u64_rem() call here at all. It is fine to
>> just truncate the seconds and then do the modulus on the 32-bit value.
>>
>> >@@ -1421,7 +1421,7 @@ static int ptlrpc_at_check_timed(struct
>> >ptlrpc_service_part *svcpt)
>> > server will take. Send early replies to everyone expiring soon. */
>> > INIT_LIST_HEAD(&work_list);
>> > deadline = -1;
>> >- index = (unsigned long)array->paa_deadline % array->paa_size;
>> >+ div_u64_rem(array->paa_deadline, array->paa_size, &index);
>>
>> Same here.
>
>I went back and forth on these. Initially I did just what you suggest
>here and added a (u32) cast on the deadline fields, but I could not
>convince myself that the backwards jump in 2038 is harmless. As far as
>I can tell, array->paa_size is not normally a power-of-two number, so
>(0xffffffff % array->paa_size) and (0 % array->paa_size) are not
>neighboring indices. If you are sure that the index can be allowed to
>jump in 2038, we should do the simpler math and add a comment.
The jump should be largely harmless. This array is tracking the maximum
RPC service time over the past paa_size interval to avoid sending spurious
RPC retries if the server is already heavily loaded. If these values
are not expired in 100% accurate order, the worst that can happen is
that some clients resend their RPCs unnecessarily under heavy load.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division