[Lustre-devel] Lustre RPC visualization

Michael Kluge Michael.Kluge at tu-dresden.de
Tue May 25 05:03:15 PDT 2010

Hi WangDi,

So, for the moment I am done with what I promised. What remains is
mainly debugging with more input data sets. Attached is a screenshot of
Vampir showing the derived counter values for the RPC processing/queue
times on the server and the client. The values are either times in
microseconds or unitless numbers.
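
To make clear what "derived" means here, a rough sketch in Python follows.
It is only an illustration, not the converter's actual code: the record
layout and the field names (sent, received, started, finished, completed)
are assumptions made for this example. Differences are taken only between
timestamps from the same host, since the timers are not yet synchronized.

```python
from dataclasses import dataclass
from typing import Optional

US_PER_S = 1_000_000  # log timestamps are in seconds, counters in microseconds


@dataclass
class RpcTimes:
    """Timestamps (seconds) of one RPC; all field names are hypothetical."""
    sent: Optional[float] = None       # client: "Sending RPC"
    received: Optional[float] = None   # server: request received
    started: Optional[float] = None    # server: start work
    finished: Optional[float] = None   # server: done work
    completed: Optional[float] = None  # client: "Completed RPC"


def _delta_us(start, end):
    """Difference in whole microseconds, or None if a record is missing."""
    if start is None or end is None:
        return None
    return round((end - start) * US_PER_S)


def derived_times_us(t: RpcTimes) -> dict:
    """Derive per-RPC times, each from timestamps of a single host."""
    return {
        "server_queue_us": _delta_us(t.received, t.started),
        "server_processing_us": _delta_us(t.started, t.finished),
        "client_round_trip_us": _delta_us(t.sent, t.completed),
    }
```

An RPC with a lost log record simply yields None for the affected value
instead of a made-up number.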

Regards, Michael

On Sunday, 16.05.2010, at 11:29 +0200, Michael Kluge wrote:
> Hi WangDi,
> the first version works. A screenshot is attached. I have implemented a
> couple of counters: RPCs in flight and RPCs completed in total on the
> client; RPCs enqueued, RPCs in processing and RPCs completed in total
> on the server. All these counters can be broken down by the type of RPC
> (op code). The picture does not yet have the lines that show each single
> RPC. I still have to do counters like "avg. time to complete an RPC over
> the last second", and there are some more TODOs, like the timer
> synchronization. (In the screenshot the first and the last counter show
> total values while the one in the middle shows a rate.)
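
As an illustration of how the client-side counters can be built from the
log events, here is a minimal Python sketch. The event tuple layout and
the kind strings "send" and "complete" are assumptions for this example,
and the op codes used with it are arbitrary integers, not real Lustre
values.

```python
from collections import defaultdict


def client_counters(events):
    """Build per-op-code counter samples from client events.

    events: iterable of (timestamp, kind, opcode) with kind being
            "send" or "complete" (hypothetical names).
    Returns a list of (timestamp, opcode, in_flight, completed_total).
    """
    in_flight = defaultdict(int)
    completed = defaultdict(int)
    samples = []
    # Sort by timestamp only; the sort is stable, so events with equal
    # timestamps keep their order from the log.
    for ts, kind, opcode in sorted(events, key=lambda e: e[0]):
        if kind == "send":
            in_flight[opcode] += 1
        elif kind == "complete":
            in_flight[opcode] -= 1
            completed[opcode] += 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        samples.append((ts, opcode, in_flight[opcode], completed[opcode]))
    return samples
```

The server-side counters (enqueued, in processing, completed) would follow
the same pattern with three event kinds instead of two.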
> What I would like to have is a complete set of traces from a small cluster
> (<100 nodes) including the servers. Would that be possible?
> Will one of you be in Hamburg from May 31 to June 3 for ISC'2010? I'll be
> there and would like to talk about what would be useful for the next steps.
> Regards, Michael
> On 03.05.2010 21:52, di.wang wrote:
> > Michael Kluge wrote:
> >>>> One more question: RPC 1334380768266400 (in the log WangDi sent me)
> >>>> has only a "Sending RPC" message on the client side, so the
> >>>> "Completed RPC" is missing. The server has all three (received, start
> >>>> work, done work). Has this RPC vanished on the way back to the client?
> >>>> There is no further indication of what happened. The last timestamp in
> >>>> the client log is:
> >>>> 1272565368.228628
> >>>> and the server says it finished the processing of the request at:
> >>>> 1272565281.379471
> >>>> So the client log has been recorded long enough to contain the
> >>>> "Completed RPC" message for this RPC if it ever arrived ...
> >>> Logically, yes. But in some cases debug logs might be dropped for
> >>> some reason (actually, this happens fairly often), and you probably need
> >>> to maintain an average time from the server's "Handled RPC" to the
> >>> client's "Completed RPC" and then estimate the client "Completed RPC"
> >>> time in this case.
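
A minimal Python sketch of that fallback, under the assumption that each
RPC is stored as a dict with the hypothetical keys "handled" (server time)
and "completed" (client time). Because the estimate is the server time plus
the average observed difference, a constant offset between the two clocks
would cancel out; clock drift would not.

```python
from statistics import mean


def estimate_missing_completions(rpcs):
    """Guess client completion times for RPCs whose record was lost.

    rpcs: dict mapping an RPC id to a dict with the keys "handled" and
          "completed" (hypothetical names); a missing record is None.
    Returns a dict of RPC id -> estimated client completion time.
    """
    delays = [
        r["completed"] - r["handled"]
        for r in rpcs.values()
        if r.get("completed") is not None and r.get("handled") is not None
    ]
    if not delays:
        return {}  # nothing to base an estimate on
    avg_delay = mean(delays)
    return {
        rpc_id: r["handled"] + avg_delay
        for rpc_id, r in rpcs.items()
        if r.get("completed") is None and r.get("handled") is not None
    }
```

The estimated values should be marked as such in the trace so that they
are not mistaken for measured completion times.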
> >>
> >> Oh my gosh ;) I don't want to start speculating about the helpfulness
> >> of incomplete debug logs. Anyway, what can get lost? Any kind of
> >> message on the servers and clients? I think I'd like to know what
> >> cases have to be handled while I try to track individual RPCs on
> >> their way.
> > Any record can get lost here. Unfortunately, there are no messages
> > indicating that records are missing. :(
> > (Usually, I would check the time stamps in the log, i.e. look for no
> > records for a "long" time, for example several seconds, but this is not
> > an accurate way.)
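
The timestamp check described above could look like this in Python. It is
a heuristic sketch only: the function name and the default threshold of two
seconds are arbitrary choices for this example, and a quiet period on an
idle node would be flagged just like a real loss of records.

```python
def find_gaps(timestamps, threshold=2.0):
    """Return (before, after) pairs of consecutive log timestamps that are
    more than `threshold` seconds apart."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > threshold]
```

RPCs that were in flight during a reported gap are candidates for having
lost one of their records.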
> >
> > I guess you can just ignore these incomplete records as a first
> > step? Let's see how these incomplete logs impact the profiling
> > result, then we will decide how to deal with this.
> >
> > Thanks
> > Wangdi
> >>
> >> Regards, Michael
> >> _______________________________________________
> >> Lustre-devel mailing list
> >> Lustre-devel at lists.lustre.org
> >> http://lists.lustre.org/mailman/listinfo/lustre-devel


Michael Kluge, M.Sc.

Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden

Willersbau, Room A 208
Phone:  (+49) 351 463-34217
Fax:    (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW:    http://www.tu-dresden.de/zih
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lustre_rpc_trace.png
Type: image/png
Size: 146793 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20100525/66fe7ab6/attachment.png>