[Lustre-devel] Lustre RPC visualization

Michael Kluge Michael.Kluge at tu-dresden.de
Mon May 3 11:10:33 PDT 2010


Hi Andreas,

On 03.05.2010 15:20, Andreas Dilger wrote:
> On 2010-05-03, at 04:41, Michael Kluge wrote:
>> I have a small problem understanding the logical order of the events.
>> Here is one example from the logs where the client reports an RPC as
>> completed before the server says that it has finished the RPC handling.
>> The difference is about 1.5 ms.
>>
>> Is that due to the fact that the server waits until the client has
>> ack'ed the result of the RPC? Or are the clocks between the servers not
>> well synchronized? Or are the timestamps in the logfile sometimes not
>> correct (log message could not be flushed or whatever)?
>
> The timestamps on the log messages are only as accurate as the clocks on
> the nodes.  Lustre can run without synchronized clocks, using the client
> clock as the basis for the timestamps, but Lustre makes no effort to
> synchronize the client clocks.  For that you need to use NTP (easiest)
> or adjust the timestamps in the logs based on messages like this.

OK, so logically the "Completed RPC" on the client side is supposed to 
show up after the server has written its "Handled RPC" to its log (if 
the clocks are perfectly synchronized).
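
To check that ordering mechanically, here is a minimal sketch that pairs 
the client's "Completed RPC" lines with the server's "Handled RPC" lines 
by xid and reports pairs where the client appears to finish first. The 
file names and the exact field layout are assumptions based on the log 
excerpts quoted below:

    #!/usr/bin/env python
    # Pair client "Completed RPC" with server "Handled RPC" lines by xid
    # and report apparent negative skew (client completes before server).
    import re

    # the timestamp is the 4th colon-separated field of the log header
    HEADER = re.compile(r'^\d+:\d+:[\d.]+:(?P<ts>\d+\.\d+):')
    XID = re.compile(r'x?(?P<xid>\d{15,})')   # e.g. x1334380768266050

    def stamps(path, marker):
        """Map xid -> log timestamp for every line containing `marker`."""
        out = {}
        with open(path) as f:
            for line in f:
                if marker not in line:
                    continue
                h = HEADER.match(line)
                x = XID.search(line.split(marker, 1)[1])
                if h and x:
                    out[x.group('xid')] = float(h.group('ts'))
        return out

    client = stamps('rpc_client.debug', 'Completed RPC')
    server = stamps('rpc_oss.debug', 'Handled RPC')
    for xid in sorted(set(client) & set(server)):
        skew = client[xid] - server[xid]
        if skew < 0:
            print('xid %s: client completed %.6fs early' % (xid, -skew))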

One more question: RPC 1334380768266400 (in the log WangDi sent me) has 
on the client side only a "Sending RPC" message; the "Completed RPC" is 
missing. The server has all three (received, started work, finished 
work). Has this RPC vanished on the way back to the client? There is no 
further indication of what happened. The last timestamp in the client log is:
1272565368.228628
and the server says it finished processing the request at:
1272565281.379471
So the client log was recorded long enough to contain the "Completed 
RPC" message for this RPC, if it ever arrived ...


Regards, Michael


>> In the first case I don't have a chance to figure out when the server
>> has finished working on the RPC and is going to send out the result,
>> correct?
>>
>> ../debug_log/rpc_client.debug:00000100:00100000:1.0:1272565278.263430:0:3479:0:(client.c:1532:ptlrpc_check_set())
>> Completed RPC pname:cluuid:pid:xid:nid:opc
>> ptlrpcd-brw:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c:3479:1334380768266050:10.8.0.126 at tcp:4
>>
>> ../debug_log/rpc_oss.debug:00000100:00100000:1.0:1272565278.265039:0:7385:0:(service.c:1672:ptlrpc_server_handle_request())
>> Handled RPC pname:cluuid+ref:pid:xid:nid:opc
>> ll_ost_io_24:3f7723dc-06a3-0b5e-9e96-f5b86e8d1b1c+27:3479:x1334380768266050:12345-10.8.0.104 at tcp:4
>> Request procesed in 238391us (239698us total) trans 266 rc 0/0
>>
>>
>> For this RPC I see in the oss log:
>>
>> 1272565278.025337: incoming x1334380768266050
>> 1272565278.026649: Handling RPC x1334380768266050
>> 1272565278.265039: Handled RPC x1334380768266050:Request procesed in
>> 238391us (239698us total)
>>
>> So, by doing the math I get: 265039-026649=238390 (just 1 off target,
>> might be rounding) and 265039-025337=239702 (4 off target). So I guess
>> the times reported by the last message are calculated using a
>> different data source than the log timestamps?
>
> The log timestamps are generated at the time the message is processed by
> the log system.  The processing times are themselves generated by reading
> the clock directly and computing the elapsed time before the message is
> printed:
>
>          do_gettimeofday(&work_start);   /* read clock before handling */
>          :
>          {handle RPC}
>          :
>          do_gettimeofday(&work_end);     /* read clock after handling */
>          timediff = cfs_timeval_sub(&work_end, &work_start, NULL);
>          CDEBUG(D_RPCTRACE, "request x"LPU64" opc %u from %s processed in "
>                 "%ldus (%ldus total) trans "LPU64" rc %d/%d\n",
>
>
>
>> On 29.04.2010 20:27, di.wang wrote:
>>> Hello, Michael
>>>
>>> Here is a small debug log example you can use for your development.
>>>
>>> Thanks
>>> WangDi
>>> Michael Kluge wrote:
>>>> Hi WangDi,
>>>>
>>>> OK, thanks for input. I'll go ahead and try to write a converter.
>>>> Could you please collect a set test traces that belong together from a
>>>> couple servers and clients and put them somewhere so that I can
>>>> download them?
>>>>
>>>>
>>>> Thanks, Michael
>>>>
>>>> On 29.04.2010 03:25, di.wang wrote:
>>>>> Hello, Michael
>>>>>
>>>>> There is a logfile parser script in the attachment, which was written by
>>>>> Eric.
>>>>>
>>>>> This script is very simple, but it should help you understand how we
>>>>> retrieve timestamp information from the Lustre debug log. On the server
>>>>> side, if you enable the rpc_trace log, a corresponding record is written
>>>>> to the debug log whenever a request arrives, starts being processed, or
>>>>> finishes processing. Basically, you can get all the timestamp
>>>>> information from these records (actually only two of these records
>>>>> would be enough).
>>>>>
>>>>>
>>>>> a.
>>>>> 00000100:00100000:0.0:1272313858.472660:0:31581:0:(service.c:1625:ptlrpc_server_handle_request())
>>>>>
>>>>> Handling RPC pname:cluuid+ref:pid:xid:nid:opc
>>>>> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+10:4055:x1334115493386242:12345-0 at lo:400
>>>>>
>>>>>
>>>>>
>>>>> This record means the request is about to be handled, so you can get the
>>>>> start timestamp (1272313858.472660), operation type (opc 400, ping), xid
>>>>> (1334115493386242), client nid (12345-0 at lo) and so on.
>>>>>
>>>>> b.
>>>>> 00000100:00100000:0.0:1272313858.472687:0:31581:0:(service.c:1672:ptlrpc_server_handle_request())
>>>>>
>>>>> Handled RPC pname:cluuid+ref:pid:xid:nid:opc
>>>>> ll_mgs_00:7d4fb15c-1b1c-295f-e466-ea7d77089b52+9:4055:x1334115493386242:12345-0 at lo:400
>>>>>
>>>>> Request procesed in 45us (77us total) trans 0 rc 0/0
>>>>>
>>>>> This record means the request has finished being handled, so you can get
>>>>> the end timestamp (1272313858.472687), operation type (opc 400, ping),
>>>>> xid (1334115493386242), client nid (12345-0 at lo) and so on.
>>>>>
>>>>> Note: "(77us total)" means how long it took from the request arriving to
>>>>> the end of processing, so you can also recover the request arrival
>>>>> timestamp here: 1272313858.472687 - 0.000077 = 1272313858.472610.
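>>>>>
>>>>> A minimal sketch of that recovery, assuming the "Request procesed in
>>>>> NNus (MMus total)" suffix shown in record (b) above:
>>>>>
>>>>>     import re
>>>>>
>>>>>     TIMES = re.compile(r'in (\d+)us \((\d+)us total\)')
>>>>>
>>>>>     def arrival_and_start(end_ts, record):
>>>>>         """end_ts: log timestamp of the Handled line, in seconds."""
>>>>>         service_us, total_us = map(int, TIMES.search(record).groups())
>>>>>         return end_ts - total_us / 1e6, end_ts - service_us / 1e6
>>>>>
>>>>>     # for record (b): arrival 1272313858.472610, start 1272313858.472642
>>>>>     print(arrival_and_start(1272313858.472687,
>>>>>                             'Request procesed in 45us (77us total)'))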
>>>>>
>>>>>
>>>>> So with this information you can draw the graph Eric mentioned in his
>>>>> email. If you have any questions, please let me know.
>>>>>
>>>>> Thanks
>>>>> WangDi
>>>>>
>>>>>
>>>>> Eric Barton wrote:
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>> Subject:
>>>>>> RE: Visualising Lustre RPCs
>>>>>> To:
>>>>>> "Michael Kluge"<Michael.Kluge at tu-dresden.de>, "wangdi"
>>>>>> <Tom.Wang at Sun.COM>
>>>>>>
>>>>>> CC:
>>>>>> <lustre-devel at lists.lustre.org>, "Galen M. Shipman"<gshipman at ornl.gov>
>>>>>>
>>>>>>
>>>>>> Michael,
>>>>>>
>>>>>> The previous Lustre RPC visualisation effort I mentioned at the LUG
>>>>>> used the Lustre debug log entries of type D_RPCTRACE. We disabled all
>>>>>> but these log messages and then used the Lustre debug daemon to collect
>>>>>> them while we ran I/O tests. We then ran a simple logfile parser which
>>>>>> used just the log entries for request arrival, start of processing and
>>>>>> end of processing to graph request queue depth (arrival->end) and the
>>>>>> number of requests being serviced by type over time – e.g. the attached
>>>>>> image "read3d", which shows request queue depth (vertical) over time
>>>>>> (axis labelled 20-25) by server (axis labelled 0-80).
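>>>>>>
>>>>>> For the queue-depth part, a minimal sketch, assuming per-request
>>>>>> (arrival, end) intervals extracted from the logs as above:
>>>>>>
>>>>>>     def queue_depth(intervals):
>>>>>>         """intervals: (arrival_ts, end_ts) pairs; yields (ts, depth)."""
>>>>>>         events = [(a, +1) for a, e in intervals]
>>>>>>         events += [(e, -1) for a, e in intervals]
>>>>>>         depth = 0
>>>>>>         for ts, delta in sorted(events):
>>>>>>             depth += delta
>>>>>>             yield ts, depth
>>>>>>
>>>>>>     # e.g. three overlapping requests:
>>>>>>     for ts, d in queue_depth([(0.0, 2.5), (1.0, 3.0), (1.5, 2.0)]):
>>>>>>         print('%.1f -> %d' % (ts, d))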
>>>>>>
>>>>>> *From:* Michael Kluge [mailto:Michael.Kluge at tu-dresden.de]
>>>>>> *Sent:* 17 April 2010 6:26 AM
>>>>>> *To:* Galen M. Shipman; Eric Barton
>>>>>> *Subject:* Visualising Lustre RPCs
>>>>>>
>>>>>> Hi Galen, Eric,
>>>>>>
>>>>>> in order to get this little project started, I think what I need first
>>>>>> to write a prototype for a converter are the following things:
>>>>>>
>>>>>> A set of test traces collected on maybe a handful of clients and some
>>>>>> servers is probably a good point to start with. It would be even better
>>>>>> if we knew what is in these traces, so that we have an expectation of
>>>>>> what kind of things we want to see on the Vampir displays. The little
>>>>>> program that Eric mentioned that can read the trace file would be very
>>>>>> helpful as well. And, as the last idea I have right now, a technical
>>>>>> contact. I might come up with a couple of questions after I have taken
>>>>>> a first look at the original trace data and before I start writing
>>>>>> code.
>>>>>>
>>>>>> Regards, Michael
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Michael Kluge, M.Sc.
>>>>>>
>>>>>> Technische Universität Dresden
>>>>>> Center for Information Services and
>>>>>> High Performance Computing (ZIH)
>>>>>> D-01062 Dresden
>>>>>> Germany
>>>>>>
>>>>>> Contact:
>>>>>> Willersbau, Room WIL A 208
>>>>>> Phone: (+49) 351 463-34217
>>>>>> Fax: (+49) 351 463-37773
>>>>>> e-mail: michael.kluge at tu-dresden.de
>>>>>> <mailto:michael.kluge at tu-dresden.de>
>>>>>> WWW: http://www.tu-dresden.de/zih


