[Lustre-devel] SeaStar message priority
Oleg.Drokin at Sun.COM
Wed Apr 1 19:46:08 PDT 2009
On Apr 1, 2009, at 4:17 PM, Oleg Drokin wrote:
>>>> the way scheduling occurs on two nodes is different. Any two nodes,
>>>> running the same app with barrier synchronization, perform things at
>>>> different times outside of the barriers; they desynchronize very
>>>> quickly in the presence of jitter.
>>> But since the only thing I have in my app inside the barriers is the
>>> write, there is not much way to desynchronize.
>> Modify your test to report the length of time each node spent in the
>> barrier (not just rank 0, as it is written now) immediately after the
>> write call, then? If you are correct, they will all be roughly the same.
>> If they have desynchronized, most will have very long wait times, but at
>> least one will be relatively short.
> That's a fair point. I just scheduled the run.
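The suggested measurement can be sketched as follows. This is a minimal illustration only: it uses Python threads and `threading.Barrier` as a stand-in for the real MPI job, and the names (`run_rank`, `NUM_RANKS`) are invented for the example, not taken from the actual test.

```python
# Sketch of the suggested change: every rank times its OWN barrier wait
# and reports it, instead of timing only at rank 0. threading.Barrier is
# used here as a stand-in for an MPI barrier; the real test would use
# MPI_Wtime() around MPI_Barrier().
import threading
import time
import random

NUM_RANKS = 4
barrier = threading.Barrier(NUM_RANKS)
wait_times = [0.0] * NUM_RANKS

def run_rank(rank):
    # Simulate jitter: each "rank" reaches the barrier at a different time.
    time.sleep(random.uniform(0.0, 0.2))
    start = time.monotonic()
    barrier.wait()                      # all ranks block here
    wait_times[rank] = time.monotonic() - start

threads = [threading.Thread(target=run_rank, args=(r,)) for r in range(NUM_RANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The last rank to arrive waits the least; a large spread between the
# longest and shortest wait indicates the ranks had desynchronized.
for rank, w in enumerate(wait_times):
    print(f"rank {rank}: waited {w:.3f}s in barrier")
print(f"spread: {max(wait_times) - min(wait_times):.3f}s")
```

In an MPI test the same idea applies: take a timestamp before and after the barrier call on every rank and print the per-rank delta, rather than only timing at rank 0.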
The results are in. I scheduled 2 runs. One at 4 threads/node and one
at 1 thread/node.
For the 4 threads/node case the 1st barrier took anywhere from 1.497 to
3.025 sec, with rank 0 reporting 1.627 sec.
The second barrier took 0.916 to 2.758 seconds with rank 0 reporting
For the 2nd barrier I can actually clearly observe that threads terminate in
groups of 4 with very close times, and the ranks suggest those nids are on
the same nodes. On the 1st barrier this trend is much less visible, though.
In the 1 thread/node case the fastest 1st barrier was 7.515 seconds and the
slowest was 10.176.
For the 2nd barrier the fastest was 0.085 and the slowest 2.756, which is
close to the difference between the fastest and slowest 1st barrier. Since
the amount of data written per node in this case is 4x smaller, I guess we
just flushed everything to disk before the 1st barrier finished, and the
difference in waiting was due to the differences in start times.
As you can see, the numbers tend to jump around, but there are still
relatively big delays due to something other than just threads getting out
of sync.