[Lustre-devel] Lustre RPC visualization

Michael Kluge Michael.Kluge at tu-dresden.de
Sun May 16 22:53:22 PDT 2010


Hi Andrew,

Unfortunately not. We don't own a Cray :(


Regards, Michael


On Sunday, 16.05.2010, at 20:24 -0700, Andrew Uselton wrote:
> I think this work is very interesting.  Will anyone be at CUG 2010
> next week to discuss? 
> Cheers,
> Andrew
> 
> 
> 2010/5/16 Michael Kluge <Michael.Kluge at tu-dresden.de>
>         Hi WangDi,
>         
>         the first version works. A screenshot is attached. I have
>         implemented a couple of counters: RPCs in flight and RPCs
>         completed in total on the client, and RPCs enqueued, RPCs in
>         processing and RPCs completed in total on the server. All
>         these counters can be broken down by the type of RPC (opcode).
>         The picture does not yet show the lines for each single RPC. I
>         still have to implement counters like "avg. time to complete an
>         RPC over the last second", and there are some more TODOs, such
>         as the timer synchronization. (In the screenshot, the first and
>         the last counter show total values, while the one in the
>         middle shows a rate.)
>         
>         What I would like to have is a complete set of traces from a
>         small cluster (<100 nodes), including the servers. Would that
>         be possible?
>         
>         Will any of you be in Hamburg from May 31 to June 3 for
>         ISC'2010? I'll be there and would like to talk about what would
>         be useful for the next steps.
>         
>         
>         
>         Regards, Michael
>         
>         On 03.05.2010 21:52, di.wang wrote:
>         
>                 Michael Kluge wrote: 
>                 
>                 
>                                         One more question: RPC 1334380768266400 (in the
>                                         log WangDi sent me) has only a "Sending RPC"
>                                         message on the client side; the "Completed RPC"
>                                         message is missing. The server has all three
>                                         (received, start work, done work). Has this RPC
>                                         vanished on the way back to the client? There is
>                                         no further indication of what happened. The last
>                                         timestamp in the client log is 1272565368.228628,
>                                         and the server says it finished processing the
>                                         request at 1272565281.379471, about 87 seconds
>                                         earlier. So the client log was recorded long
>                                         enough to contain the "Completed RPC" message for
>                                         this RPC if it had ever arrived ...
>                                 Logically, yes. But in some cases, debug log
>                                 records can be dropped for various reasons
>                                 (actually, this happens fairly often), so you
>                                 probably need to maintain an average time from
>                                 the server's "Handled RPC" to the client's
>                                 "Completed RPC" and then estimate the client's
>                                 "Completed RPC" time in such cases.
>                         
>                         Oh my gosh ;) I don't want to start speculating about
>                         the usefulness of incomplete debug logs. Anyway, what can
>                         get lost? Any kind of message on the servers and
>                         clients? I'd like to know which cases have to be
>                         handled while I try to track individual RPCs on their
>                         way.
>                 Any record can get lost here. Unfortunately, there is no
>                 message indicating that records went missing. :(
>                 (Usually, I check the timestamps in the log, i.e. look for
>                 periods with no records for a "long" time, for example
>                 several seconds, but this is not an accurate method.)
>                 
>                 I guess you can just ignore these incomplete records in
>                 your first step? Let's see how the incomplete logs affect
>                 the profiling result, and then we will decide how to deal
>                 with them.
>                 
>                 Thanks
>                 Wangdi
>                         
>                         Regards, Michael
>                         _______________________________________________
>                         Lustre-devel mailing list
>                         Lustre-devel at lists.lustre.org
>                         http://lists.lustre.org/mailman/listinfo/lustre-devel
>                 
>                 
>                 
>         
>         
>         -- 
>         Michael Kluge, M.Sc.
>         
>         Technische Universität Dresden
>         Center for Information Services and
>         High Performance Computing (ZIH)
>         D-01062 Dresden
>         Germany
>         
>         Contact:
>         Willersbau, Room WIL A 208
>         Phone:  (+49) 351 463-34217
>         Fax:    (+49) 351 463-37773
>         e-mail: michael.kluge at tu-dresden.de
>         
>         
>         WWW:    http://www.tu-dresden.de/zih
>         
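
The thread describes several counters, each broken down by opcode. On the client these are RPCs in flight and RPCs completed in total. On the server they are RPCs enqueued, RPCs in processing and RPCs completed in total. All of them can be derived from per-RPC trace events. The Python sketch below assumes a hypothetical `Event` record and made-up event labels ("send", "complete", "recv", "start", "done"). These labels stand in for the "Sending RPC" and "Completed RPC" client messages and the server-side received/start work/done work messages. The sketch also assumes RPC ids are unique across the whole trace. It is not the implementation of the tool shown in the screenshot.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float   # timestamp in seconds, as in the debug logs
    rpc_id: int   # assumption: unique per RPC across the whole trace
    kind: str     # hypothetical labels: "send"/"complete" (client), "recv"/"start"/"done" (server)
    opcode: int   # RPC opcode, used to break the counters down by type


COUNTERS = ("in_flight", "client_completed", "enqueued", "processing", "server_completed")


def counters_at(events, t):
    """Return {counter name: Counter(opcode -> count)} for the state at time t."""
    # Keep only the events that have happened by time t.
    seen = {(e.rpc_id, e.kind): e.opcode for e in events if e.time <= t}
    result = {name: Counter() for name in COUNTERS}
    for (rpc, kind), op in seen.items():
        if kind == "send" and (rpc, "complete") not in seen:
            result["in_flight"][op] += 1          # sent, no completion yet
        elif kind == "complete":
            result["client_completed"][op] += 1
        elif kind == "recv" and (rpc, "start") not in seen:
            result["enqueued"][op] += 1           # received, not yet started
        elif kind == "start" and (rpc, "done") not in seen:
            result["processing"][op] += 1         # started, not yet done
        elif kind == "done":
            result["server_completed"][op] += 1
    return result


def avg_completion_time(events, t, window=1.0):
    """Mean client-side send-to-complete time for RPCs completed in (t - window, t]."""
    # Both timestamps come from the client log, so no cross-node clock sync is needed.
    sent = {e.rpc_id: e.time for e in events if e.kind == "send"}
    durations = [
        e.time - sent[e.rpc_id]
        for e in events
        if e.kind == "complete" and t - window < e.time <= t and e.rpc_id in sent
    ]
    return sum(durations) / len(durations) if durations else None


# Opcode values 3 and 4 are arbitrary placeholders for this example.
trace = [
    Event(0.0, 1, "send", 4), Event(0.1, 1, "recv", 4), Event(0.2, 1, "start", 4),
    Event(0.3, 1, "done", 4), Event(0.4, 1, "complete", 4),
    Event(0.5, 2, "send", 3), Event(0.6, 2, "recv", 3),
]
snapshot = counters_at(trace, 1.0)
print(snapshot["in_flight"], snapshot["enqueued"])   # Counter({3: 1}) Counter({3: 1})
print(avg_completion_time(trace, 1.0))               # 0.4
```

Sampling `counters_at` at regular intervals gives total-value curves. Differencing consecutive samples of a "completed" counter gives a rate like the middle counter in the screenshot.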

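Some RPCs lose their client-side "Completed RPC" record. For these, WangDi suggests keeping an average of the delay from the server's "Handled RPC" to the client's "Completed RPC" and using it to estimate the missing time. He also mentions spotting dropped records by looking for long gaps between timestamps. Both ideas could look roughly like the Python sketch below. The function names, the dict-of-timestamps inputs and the 5-second default threshold (standing in for "several seconds") are assumptions for illustration. The averaged delay mixes client and server timestamps, so it absorbs a constant clock offset between the nodes, but not clock drift. As noted in the thread, the gap check is only a heuristic.

```python
def estimate_missing_completions(server_done, client_complete):
    """Estimate client 'Completed RPC' times for RPCs whose record is missing.

    server_done, client_complete: dict rpc_id -> timestamp (seconds).
    Returns dict rpc_id -> estimated completion time (empty if no RPC has both records).
    """
    # Delay from server "Handled RPC" to client "Completed RPC" for RPCs seen on both sides.
    delays = [client_complete[r] - t for r, t in server_done.items() if r in client_complete]
    if not delays:
        return {}
    avg_delay = sum(delays) / len(delays)
    # Server finished the RPC but the client record is missing: guess server time + average delay.
    return {r: t + avg_delay for r, t in server_done.items() if r not in client_complete}


def find_gaps(timestamps, threshold=5.0):
    """Return (start, end) pairs where consecutive log records are more than threshold seconds apart."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > threshold]


server_done = {1: 10.0, 2: 20.0, 3: 30.0}
client_complete = {1: 10.5, 2: 20.25}
print(estimate_missing_completions(server_done, client_complete))  # {3: 30.375}
print(find_gaps([1.0, 2.0, 9.0, 10.0]))                             # [(2.0, 9.0)]
```

Ignoring the incomplete RPCs instead, as suggested for the first step, amounts to skipping the estimation and dropping RPCs without a client completion.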