[Lustre-discuss] RDMA limitation?

Tue Apr 13 20:43:21 PDT 2010

It's a story like: "if you have to take dozens of global locks over
the lifetime of an RPC, then the code can't scale well on a large SMP
system, no matter what kind of network you are using", so the problem
is scattered everywhere.
Also, we are trying to reduce RPC bouncing between CPUs. In the current
code, a request can be received by CPU A, queued on CPU B, processed by
CPU C, and replied to by CPU D. That is very bad on a large SMP system
because of the data traffic between CPUs.
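
To make the two points above concrete, here is a minimal sketch in Python. It is not Lustre code: the names cpu_hops and PartitionedQueues are hypothetical, and the "one lock and one queue per CPU partition" design is only an assumption about what reducing global locks and CPU bouncing could look like.

```python
import threading
from collections import deque


def cpu_hops(stages):
    """Count how often a request moves to a different CPU between stages."""
    return sum(1 for prev, cur in zip(stages, stages[1:]) if prev != cur)


class PartitionedQueues:
    """Hypothetical service with one lock and one queue per CPU partition,
    instead of a single global lock shared by every CPU."""

    def __init__(self, nparts):
        self.locks = [threading.Lock() for _ in range(nparts)]
        self.queues = [deque() for _ in range(nparts)]

    def enqueue(self, cpu, request):
        # Queue the request on the partition of the CPU that received it.
        part = cpu % len(self.queues)
        with self.locks[part]:
            self.queues[part].append(request)
        return part

    def handle_next(self, part):
        # Process and reply from the same partition; only its lock is taken.
        with self.locks[part]:
            if not self.queues[part]:
                return None
            request = self.queues[part].popleft()
        return ("reply", request, part)


# The path described above: receive on A, queue on B, process on C, reply on D.
bouncing_hops = cpu_hops(["A", "B", "C", "D"])
# Assumed alternative: every stage stays on the receiving CPU.
pinned_hops = cpu_hops(["A", "A", "A", "A"])

svc = PartitionedQueues(nparts=4)
part = svc.enqueue(cpu=2, request="rpc-1")
reply = svc.handle_next(part)
```

In this model the bouncing path crosses CPUs three times per request and the pinned path never does, and handlers on different partitions never contend for the same lock.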

Regards
Liang

Jiahua wrote:
> You mean it is inherent in the code? Can you point me to the actual
> code if possible? I am just curious why. Any pointers or hints will be
> appreciated.
>
> Thanks,
> Jiahua
>
>
> On Tue, Apr 13, 2010 at 6:46 PM, Kevin Van Maren <Kevin.Vanmaren at sun.com> wrote:
>   
>> Yes, the RPC rate is limited by Lustre code locking to that rate, even with
>> rdma.
>>
>> Kevin
>>
>>
>> On Apr 13, 2010, at 5:08 PM, Jiahua <jiahua at gmail.com> wrote:
>>
>>     
>>> Hi all,
>>>
>>> This is kind of a followup question of the thread "One or two OSS, no
>>> difference?" last month. In that thread, Andreas stated:
>>>
>>> "There is work currently underway to improve the SMP scaling
>>> performance for the RPC handling layer in Lustre.  Currently that
>>> limits the delivered RPC rate to 10-15k/sec or so."
>>>
>>> My question is: is the limitation also applied to RDMA on IB? By SMP,
>>> I guess Andreas was talking about CPU, right? Since RDMA can bypass
>>> the host CPU, does it mean it can also bypass the limitation?
>>>
>>> Thanks,
>>> Jiahua
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>       