[Lustre-devel] Lustre RPC visualization
Michael Kluge
Michael.Kluge at tu-dresden.de
Fri May 28 07:54:33 PDT 2010
Hi WangDi,
> Looks great! Just one question: as you said, "All these counters can be
> broken down by the type of RPC (op code)". You actually implemented
> that, but it is not shown in the attached picture?
Yes.
> And could you please also add "Server queued RPCs" over time ?
Already done.
One piece of good news: the Vampir feature that can show something like
a heat map (Eric asked about this) comes back with the release at ISC.
It is now called "performance radar". It can produce a heat map for a
counter and does some other things as well. I could send a picture
around, but I first need a bigger trace (more hosts generating traces in
parallel).
Regards, Michael
> Thanks
> WangDi
>
> Michael Kluge wrote:
> > Hi WangDi,
> >
> > so, for the moment I am done with what I promised. The work still to
> > be done is mainly debugging with more input data sets. A screenshot of
> > Vampir showing the derived counter values for the RPC processing/queue
> > times on the server and the client is attached. The values are either
> > in microseconds or plain counts.
> >
> >
> > Regards, Michael
> >
> > On Sunday, 16.05.2010, at 11:29 +0200, Michael Kluge wrote:
> >
> >> Hi WangDi,
> >>
> >> the first version works. A screenshot is attached. I have implemented
> >> a couple of counters: RPCs in flight and RPCs completed in total on the
> >> client; RPCs enqueued, RPCs in processing and RPCs completed in total
> >> on the server. All these counters can be broken down by the type of RPC
> >> (op code). The picture does not yet have the lines that show each
> >> single RPC, I still have to do counters like "avg. time to complete an
> >> RPC over the last second", and there are some more TODOs, like the
> >> timer synchronization. (In the screenshot the first and the last
> >> counter show total values while the one in the middle shows a rate.)
> >>
> >> What I would like to have is a complete set of traces from a small
> >> cluster (<100 nodes) including the servers. Would that be possible?
> >>
> >> Will either of you be in Hamburg from May 31 to June 3 for ISC'2010?
> >> I'll be there and would like to talk about what would be useful for
> >> the next steps.
> >>
> >>
> >> Regards, Michael
> >>
> >> On 03.05.2010 21:52, di.wang wrote:
> >>
> >>> Michael Kluge wrote:
> >>>
> >>>>>> One more question: RPC 1334380768266400 (in the log WangDi sent me)
> >>>>>> has on the client side only a "Sending RPC" message, thus missing the
> >>>>>> "Completed RPC". The server has all three (received, start work, done
> >>>>>> work). Has this RPC vanished on the way back to the client? There is
> >>>>>> no further indication of what happened. The last timestamp in the
> >>>>>> client log is:
> >>>>>> 1272565368.228628
> >>>>>> and the server says it finished the processing of the request at:
> >>>>>> 1272565281.379471
> >>>>>> So the client log has been recorded long enough to contain the
> >>>>>> "Completed RPC" message for this RPC if it ever arrived ...
> >>>>>>
> >>>>> Logically, yes. But in some cases, some debug log records might be
> >>>>> dropped for some reason (actually, this happens fairly often), so
> >>>>> you probably need to maintain an average time from the server
> >>>>> "Handled RPC" to the client "Completed RPC" and then just estimate
> >>>>> the client "Completed RPC" time in this case.
> >>>>>
> >>>> Oh my gosh ;) I don't want to start speculations about the helpfulness
> >>>> of incomplete debug logs. Anyway, what can get lost? Any kind of
> >>>> message on the servers and clients? I think I'd like to know what
> >>>> cases have to be handled while I try to track individual RPC's on
> >>>> their way.
> >>>>
> >>> Any records can get lost here. Unfortunately, there are no messages
> >>> indicating that records went missing. :(
> >>> (Usually, I would check the time stamps in the log, i.e. look for
> >>> stretches with no records for a "long" time, for example several
> >>> seconds, but this is not an accurate method.)
> >>>
> >>> I guess you can just ignore these incomplete records in your first
> >>> step? Let's see how these incomplete logs will impact the profiling
> >>> result, and then we will decide how to deal with this.
> >>>
> >>> Thanks
> >>> Wangdi
> >>>
> >>>> Regards, Michael
> >>>> _______________________________________________
> >>>> Lustre-devel mailing list
> >>>> Lustre-devel at lists.lustre.org
> >>>> http://lists.lustre.org/mailman/listinfo/lustre-devel
> >>>>
> >>>
> >
>
>
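For reference, the workaround WangDi suggests in the quoted thread
(estimating a lost client "Completed RPC" from the average delay between
the server "Handled RPC" and the client "Completed RPC") and the "RPCs in
flight" counter could be sketched roughly as follows. This is only a
sketch under assumptions, not the code of the actual trace converter: the
function names and the dictionaries keyed by RPC id are hypothetical, and
it assumes the log lines have already been parsed into timestamps in
seconds. Since the client and server clocks are not synchronized yet, the
averaged delay also contains the clock offset, which is harmless as long
as that offset stays constant.

```python
from statistics import mean


def estimate_missing_completions(server_handled, client_completed):
    """Fill in client "Completed RPC" times that are missing from the log.

    server_handled:   dict, RPC id -> server "Handled RPC" timestamp
    client_completed: dict, RPC id -> client "Completed RPC" timestamp
    Returns (completed, estimated_ids); the inputs are not modified.
    """
    # Delays observed for RPCs that show up in both logs.
    delays = [client_completed[rpc] - t
              for rpc, t in server_handled.items()
              if rpc in client_completed]
    completed = dict(client_completed)
    estimated = set()
    if not delays:
        # Nothing to average over, so nothing can be estimated.
        return completed, estimated
    avg_delay = mean(delays)
    for rpc, t in server_handled.items():
        if rpc not in completed:
            # Guess the completion time from the server-side timestamp.
            completed[rpc] = t + avg_delay
            estimated.add(rpc)
    return completed, estimated


def in_flight_series(sent, completed):
    """Return a list of (timestamp, RPCs in flight) on the client.

    sent:      dict, RPC id -> client "Sending RPC" timestamp
    completed: dict, RPC id -> client "Completed RPC" timestamp
    RPCs without a completion time stay in flight until the end.
    """
    events = [(t, +1) for t in sent.values()]
    events += [(completed[rpc], -1) for rpc in sent if rpc in completed]
    # Sorting tuples puts -1 before +1 when timestamps are equal.
    events.sort()
    series, count = [], 0
    for t, delta in events:
        count += delta
        series.append((t, count))
    return series
```

Keeping the set of estimated ids separate makes it possible to mark the
guessed completions in the visualization, or to ignore them in a first
step as suggested above.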
--
Michael Kluge, M.Sc.
Technische Universität Dresden
Center for Information Services and
High Performance Computing (ZIH)
D-01062 Dresden
Germany
Contact:
Willersbau, Room A 208
Phone: (+49) 351 463-34217
Fax: (+49) 351 463-37773
e-mail: michael.kluge at tu-dresden.de
WWW: http://www.tu-dresden.de/zih