[lustre-devel] [PATCH 30/73] staging/lustre: use 64-bit times for request times
Dilger, Andreas
andreas.dilger at intel.com
Mon Sep 28 15:09:36 PDT 2015
On 2015/09/28, 2:19 AM, "lustre-devel on behalf of Arnd Bergmann"
<lustre-devel-bounces at lists.lustre.org on behalf of arnd at arndb.de> wrote:
>On Monday 28 September 2015 01:09:18 Dilger, Andreas wrote:
>> On 2015/09/27, 10:45 PM, "green at linuxhacker.ru" <green at linuxhacker.ru>
>> wrote:
>> >diff --git a/drivers/staging/lustre/lustre/ptlrpc/events.c
>> >b/drivers/staging/lustre/lustre/ptlrpc/events.c
>> >index 53f6b62..afd869b 100644
>> >--- a/drivers/staging/lustre/lustre/ptlrpc/events.c
>> >+++ b/drivers/staging/lustre/lustre/ptlrpc/events.c
>> >@@ -246,7 +246,7 @@ static void ptlrpc_req_add_history(struct
>> >ptlrpc_service_part *svcpt,
>> > struct ptlrpc_request *req)
>> > {
>> > __u64 sec = req->rq_arrival_time.tv_sec;
>> >- __u32 usec = req->rq_arrival_time.tv_usec >> 4; /* usec / 16 */
>> >+ __u32 usec = req->rq_arrival_time.tv_nsec / NSEC_PER_USEC / 16; /* usec / 16 */
>>
>> This could just be written like:
>>
>> __u32 usec = req->rq_arrival_time.tv_nsec >> 14 /* nsec / 16384 */;
>>
>> since the main point of this calculation is to get a number that fits
>> into a 16-bit field to provide ordering for items in a trace log. It
>> doesn't have to be exactly "nsec / 16000", and it avoids the division.
>
>Ok, that wasn't clear from the original code, so I just moved the division
>from the do_gettimeofday() here to keep the data unchanged.
>
>With your change, the
>
>new_seq = (sec << REQS_SEC_SHIFT) |
> (usec << REQS_USEC_SHIFT) |
> (svcpt->scp_cpt < 0 ? 0 : svcpt->scp_cpt);
>
>calculation will get forward jumps once a second, but I guess that
>doesn't matter if it's only used for sequencing.
>
>The part that I had not noticed here is the y2106 overflow in the
>sequence number. If we change the logic, we should probably give
>a few more bits to the seconds, as well, or use monotonic time.
>
>> >diff --git a/drivers/staging/lustre/lustre/ptlrpc/service.c
>> >b/drivers/staging/lustre/lustre/ptlrpc/service.c
>> >index 40de622..28f57d7 100644
>> >--- a/drivers/staging/lustre/lustre/ptlrpc/service.c
>> >+++ b/drivers/staging/lustre/lustre/ptlrpc/service.c
>> >@@ -1191,7 +1191,7 @@ static int ptlrpc_at_add_timed(struct
>> >ptlrpc_request *req)
>> > spin_lock(&svcpt->scp_at_lock);
>> > LASSERT(list_empty(&req->rq_timed_list));
>> >
>> >- index = (unsigned long)req->rq_deadline % array->paa_size;
>> >+ div_u64_rem(req->rq_deadline, array->paa_size, &index);
>>
>> Since this is just a round-robin index that advances once per second,
>> it doesn't matter at all whether the calculation is computed on the
>> 64-bit seconds or on the 32-bit seconds, so there isn't any need for
>> the more expensive div_u64_rem() call here at all. It is fine to
>> just truncate the seconds and then do the modulus on the 32-bit value.
>>
>> >@@ -1421,7 +1421,7 @@ static int ptlrpc_at_check_timed(struct
>> >ptlrpc_service_part *svcpt)
>> > server will take. Send early replies to everyone expiring soon. */
>> > INIT_LIST_HEAD(&work_list);
>> > deadline = -1;
>> >- index = (unsigned long)array->paa_deadline % array->paa_size;
>> >+ div_u64_rem(array->paa_deadline, array->paa_size, &index);
>>
>> Same here.
>
>I went back and forth on these. Initially I did just what you suggest
>here and added a (u32) cast on the deadline fields, but I could not
>convince myself that the backwards jump in 2038 is harmless. As far as
>I can tell, array->paa_size is not normally a power-of-two number, so
>(0xffffffff % array->paa_size) and (0 % array->paa_size) are not
>neighboring indices. If you are sure that the index can be allowed to
>jump in 2038, we should do the simpler math and add a comment.
The jump should be largely harmless. This array is tracking the maximum
RPC service time over the past paa_size interval to avoid sending spurious
RPC retries if the server is already heavily loaded. If these values
are not expired in 100% accurate order, the worst that can happen is
that some clients resend their RPCs unnecessarily under heavy load.
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division