[Lustre-devel] Lustre RPC visualization

Michael Kluge Michael.Kluge at tu-dresden.de
Wed Jun 16 01:46:06 PDT 2010


Hi Robert,

Thanks for the trace files. Every single file contains log entries like:

line: 00000100:00100000:2.0F:1275433209.964416:0:7924:0:(events.c:285:request_in_callback())               incoming req@ffff8101b32f5400 x1337373369608379 msgsize 296

Does the tarball contain 684 server logs? Or am I wrong in assuming that lines with an "incoming req@" can only show up on the servers?
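
Just to make the question concrete, a small sketch (hypothetical, not part of any tool) of how one could flag the files that contain such lines; whether their presence really implies a server log is exactly the open question:

# Hypothetical helper: flag debug logs containing request_in_callback()
# "incoming req" lines. Whether such lines only appear on servers is the
# question asked above.
import sys

def contains_incoming_req(path):
    with open(path) as log:
        for line in log:
            if 'request_in_callback' in line and 'incoming req' in line:
                return True
    return False

if __name__ == '__main__':
    for path in sys.argv[1:]:
        kind = 'server?' if contains_incoming_req(path) else 'client?'
        print('%s: %s' % (path, kind))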


Michael

On 01.06.2010, at 21:39, Michael Kluge wrote:

>> On 2010-06-01, at 06:12, di.wang wrote:
>>> Michael, I do not think you need all the trace logs from the clients, right?
>> 
>> Actually, I think he does want trace logs from all of the clients.
> 
> Yes, 600 is a good number. Vampir can easily handle this. If possible, I'd like to have all server traces as well to include this information. Right now I am only putting those RPCs into the trace for which all 5 events (client send, server recv/start/done, client done) are present. I see "client send/done" events in the OSS log as well, probably because it is talking to the MDS or MGS. Incomplete events are ignored but counted in the "dumped events" counter. But that could be changed.
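
(To make the matching above concrete, a rough sketch; the event names are invented for illustration, the real input being the parsed RPCTRACE lines:)

# Sketch only: group events by RPC xid and keep an RPC only if all five
# events are present; incomplete ones are ignored but counted.
from collections import defaultdict

REQUIRED = {'client_send', 'server_recv', 'server_start', 'server_done',
            'client_done'}

def match_rpcs(events):
    """events: iterable of (xid, event_type, timestamp) tuples."""
    by_xid = defaultdict(dict)
    for xid, etype, ts in events:
        by_xid[xid][etype] = ts

    complete, dumped = [], 0
    for xid, seen in by_xid.items():
        if REQUIRED.issubset(seen):
            complete.append((xid, seen))
        else:
            dumped += 1  # incomplete RPC: not drawn, only counted
    return complete, dumped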
> 
> 
> Michael
> 
>>> I had thought the current target is to make sure Vampir can handle
>>> the traces from "big" enough clusters.
>>> 
>>> Actually, I am not sure whether Vampir (with this trace analysis) should
>>> work the way we do it now, i.e. run a job exclusively on the cluster, get the rpctrace log, then get a graph from Vampir?
>>> 
>>> Or am I missing something here? Michael, could you please explain a bit? What is
>>> your idea of how Vampir could help the end users? Or is the target here
>>> just to help the developers and sysadmins understand the system?
>> 
>> My thoughts on this in the past were that it should be possible to use ONLY the client logs to plot syscall latencies (using client VFSTRACE lines), RPC latencies (using client RPCTRACE lines), and locking (using DLMTRACE lines).  Some code rework would be needed to allow regular users to run "lctl dk", which would then return only the lines belonging to their own processes.  This would allow regular users to collect and analyze their application without assistance or coordination from the sysadmin.
>> 
>> In order to allow regular users to trace such information, the VFSTRACE calls should report the UID of the process doing the system call, which would immediately map to a PID.  The PID can be used to track the majority of the debug entries, but not always those performed by another thread (e.g. ptlrpcd).  The RPCTRACE messages also contain the PID, so that would be helpful, but it would mean that there would have to be a parser/filter in the kernel to ensure users cannot access trace information that is not their own.  That would be a significant undertaking, I think.
>> 
>> In the meantime, for testing purposes and initial usage (with sysadmin assistance) the full debug log can be extracted from the kernel and filtered in userspace as needed.
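
(The userspace filtering part should be straightforward. A quick sketch, assuming the PID is the sixth colon-separated field of each raw "lctl dk" line, as in the sample line at the top of this mail:)

# Sketch: keep only the debug log lines that belong to one PID.
# Assumes raw "lctl dk" output where the PID is the sixth ':'-separated field.
import sys

def filter_by_pid(lines, pid):
    pid = str(pid)
    for line in lines:
        fields = line.split(':', 7)
        if len(fields) > 6 and fields[5] == pid:
            yield line

if __name__ == '__main__':
    for line in filter_by_pid(sys.stdin, sys.argv[1]):
        sys.stdout.write(line)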
>> 
>>>>> -----Original Message-----
>>>>> From: di.wang [mailto:di.wang at oracle.com]
>>>>> Sent: 01 June 2010 12:50 PM
>>>>> To: Robert Read
>>>>> Cc: Michael Kluge; Eric Barton; Galen M. Shipman
>>>>> Subject: Re: [Lustre-devel] Lustre RPC visualization
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> IMHO, just run IOR with whatever parameters, and get the rpctrace
>>>>> log (probably only enable rpctrace) from 1 OST and some of the clients
>>>>> (probably 2 is enough).
>>>>> Note: please make sure these 2 clients did communicate with the OST during
>>>>> the IOR.
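
(For whoever ends up running this on Hyperion, a rough per-node sketch of the capture around the IOR run; the lctl invocations are the standard ones, everything else, paths and file names, is made up:)

# Sketch of a per-node rpctrace capture around the IOR run.
import socket
import subprocess

def lctl(*args):
    subprocess.check_call(['lctl'] + list(args))

def start_capture():
    lctl('set_param', 'debug=+rpctrace')   # enable RPC tracing
    lctl('clear')                          # empty the kernel debug buffer

def dump_capture(outdir='/tmp'):
    out = '%s/rpctrace-%s.log' % (outdir, socket.gethostname())
    lctl('dk', out)                        # dump (and clear) the debug buffer
    return out

# start_capture() before the IOR run, dump_capture() on each node afterwards.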
>>>>> 
>>>>> Michael, I do not think you need all the trace logs from the clients, right?
>>>>> 
>>>>> Robert
>>>>> 
>>>>> If there are available time slots for this test on Hyperion, who can
>>>>> help to get these logs?
>>>>> 
>>>>> Thanks
>>>>> Wangdi
>>>>> 
>>>>> Robert Read wrote:
>>>>> 
>>>>>> What should I run then? Do you have scripts to capture this?
>>>>>> 
>>>>>> robert
>>>>>> 
>>>>>> On May 31, 2010, at 2:39 , Michael Kluge wrote:
>>>>>> 
>>>>>> 
>>>>>>> Hi Robert,
>>>>>>> 
>>>>>>> 600 is a nice number. Plus the traces from the server and I am happy.
>>>>>>> 
>>>>>>> 
>>>>>>> Michael
>>>>>>> 
>>>>>>> Am 28.05.2010 um 17:53 schrieb Robert Read:
>>>>>>> 
>>>>>>> 
>>>>>>>> On May 28, 2010, at 4:09 , di.wang wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Hello, Michael
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> One piece of good news: the feature that lets Vampir show something like
>>>>>>>>>> a heat map (Eric asked about this) comes back with the release at ISC. It
>>>>>>>>>> is now called "performance radar". It can produce a heat map for a counter
>>>>>>>>>> and does some other things as well. I could send a picture around, but I
>>>>>>>>>> first need a bigger trace (more hosts generating traces in parallel).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> Right now I do not have big clusters available to generate the trace.
>>>>>>>>> I will see what I can do here.
>>>>>>>>> 
>>>>>>>> If ~600 clients is big enough, we could generate that on Hyperion.
>>>>>>>> 
>>>>>>>> robert
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> WangDi
>>>>>>>>> 
>> 
>> 
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>> 
>> 
> 
> 


-- 

Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany

Contact:
Willersbau, Room WIL A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih
