[Lustre-discuss] RDMA limitation?

Wed Apr 14 21:23:43 PDT 2010

Jiahua wrote:
> Sorry to send it again! Can anyone help?
>
> Jiahua
>
>
> On Tue, Apr 13, 2010 at 10:45 PM, Jiahua <jiahua at gmail.com> wrote:
>   
>> Thanks for your answers! More questions:
>>
>> * Do you only lock for writes? What if I only read? Do you still lock
>> even for simultaneous reads?
>>     
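
"Lock" here means an operating-system synchronization primitive inside the
Lustre code, not a DLM (distributed lock manager) lock, so it is taken
regardless of whether an RPC reads or writes.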

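As a rough illustration of that distinction (a toy model, not taken from the
Lustre source): the class below is a hypothetical in-memory request queue
guarded by one ordinary mutex, standing in for a kernel-level lock. The names
RequestQueue and acquisitions are made up for this sketch. Read RPCs pass
through the same lock as write RPCs, because the lock protects the queue, not
the file data.

```python
import threading
from collections import Counter, deque


class RequestQueue:
    """Hypothetical shared request queue (illustration only, not Lustre code)."""

    def __init__(self):
        self._lock = threading.Lock()   # stands in for an OS-level lock
        self._items = deque()
        self.acquisitions = Counter()   # lock acquisitions per RPC type

    def enqueue(self, rpc):
        with self._lock:                # taken for every RPC, read or write
            self.acquisitions[rpc["op"]] += 1
            self._items.append(rpc)

    def dequeue(self):
        with self._lock:                # taken again when a thread picks it up
            if not self._items:
                return None
            rpc = self._items.popleft()
            self.acquisitions[rpc["op"]] += 1
            return rpc


queue = RequestQueue()
for op in ["read", "read", "write", "read"]:
    queue.enqueue({"op": op})

handled = []
while (rpc := queue.dequeue()) is not None:
    handled.append(rpc["op"])

print(dict(queue.acquisitions))
```

In this model a read-only workload contends on the queue lock exactly as a
write workload does.
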
>> * Is the limitation system-wide or per server? That is, can I
>> improve the performance by adding more OSSes or OSTs?
>>     

The SMP improvements target the handling of small RPCs, so they mostly help
metadata performance, or I/O performance on NUMA systems. They are about
fully driving a single machine, not about the scalability of the whole cluster.
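
A back-of-the-envelope model may help show why this is a per-machine ceiling.
Assume, purely for illustration, that each RPC needs some CPU time that can run
in parallel plus some time spent holding global locks, which cannot. The
function name and the 400/80 microsecond figures below are invented for the
sketch, not measurements of Lustre.

```python
def rpc_rate_bound(ncpus, parallel_us, serialized_us):
    """Upper bound on RPCs/sec for one server in a simple bottleneck model.

    parallel_us:   per-RPC work that spreads across CPUs (hypothetical value)
    serialized_us: per-RPC time under global locks (hypothetical value)
    """
    # All CPUs busy: each RPC costs parallel_us + serialized_us of CPU time.
    cpu_bound = ncpus * 1_000_000 / (parallel_us + serialized_us)
    # Only one RPC at a time can be inside the globally locked sections.
    lock_bound = 1_000_000 / serialized_us
    return min(cpu_bound, lock_bound)


for ncpus in (1, 4, 8, 16):
    rate = rpc_rate_bound(ncpus, parallel_us=400, serialized_us=80)
    print(f"{ncpus:2d} CPUs: {rate:8.0f} RPCs/sec")
```

In this model the rate stops growing once the lock bound is reached, and a
faster network does not move that bound because it does not shorten the time
spent under the locks.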

>> * By RPC bouncing, are you talking about the Linux storage stack? It
>> is not inherent to Lustre, right?
>>     

It is about the Lustre stack.
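
To make the bouncing concrete, here is a small sketch that counts how often a
request changes CPU across the four stages mentioned in the quoted message
below (receive, queue, process, reply). The stage names and the cpu_handoffs
helper are hypothetical and only model the idea; they are not Lustre
identifiers.

```python
STAGES = ("receive", "queue", "process", "reply")


def cpu_handoffs(stage_to_cpu):
    """Count stage transitions where the request moves to a different CPU."""
    cpus = [stage_to_cpu[stage] for stage in STAGES]
    return sum(1 for prev, cur in zip(cpus, cpus[1:]) if prev != cur)


# The pattern described for the current code: a different CPU at every stage.
bouncing = {"receive": "A", "queue": "B", "process": "C", "reply": "D"}

# The goal of reducing bounce: keep the whole lifetime on one CPU.
affine = {stage: "A" for stage in STAGES}

print(cpu_handoffs(bouncing), cpu_handoffs(affine))
```

Each hand-off means the request's data has to travel between CPUs, which is the
traffic that hurts on a large SMP or NUMA machine.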

>> Thanks,
>> Jiahua
>>
>>
>> On Tue, Apr 13, 2010 at 8:43 PM, Liang Zhen <Zhen.Liang at sun.com> wrote:
>>     
>>> The story goes like this: "if you have to take dozens of global locks over
>>> the lifetime of an RPC, then the code can't scale well on a large SMP system,
>>> no matter what kind of network you are using", so the problem is scattered
>>> everywhere.
>>> Also, we are trying to reduce RPC bouncing between CPUs. In the current code,
>>> a request can be received by CPU A, queued on CPU B, processed by CPU C,
>>> and replied to by CPU D. That is very bad on a large SMP system because of
>>> the data traffic between CPUs.
>>>
>>> Regards
>>> Liang
>>>
>>> Jiahua wrote:
>>>       
>>>> You mean it is inherent in the code? Can you point me to the actual
>>>> code if possible? I am just curious why. Any pointers or hints will be
>>>> appreciated.
>>>>
>>>> Thanks,
>>>> Jiahua
>>>>
>>>>
>>>> On Tue, Apr 13, 2010 at 6:46 PM, Kevin Van Maren <Kevin.Vanmaren at sun.com>
>>>> wrote:
>>>>
>>>>         
>>>>> Yes, the RPC rate is limited to that rate by locking in the Lustre code,
>>>>> even with RDMA.
>>>>>
>>>>> Kevin
>>>>>
>>>>>
>>>>> On Apr 13, 2010, at 5:08 PM, Jiahua <jiahua at gmail.com> wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> Hi all,
>>>>>>
>>>>>> This is a follow-up question to the thread "One or two OSS, no
>>>>>> difference?" last month. In that thread, Andreas stated:
>>>>>>
>>>>>> "There is work currently underway to improve the SMP scaling
>>>>>> performance for the RPC handling layer in Lustre.  Currently that
>>>>>> limits the delivered RPC rate to 10-15k/sec or so."
>>>>>>
>>>>>> My question is: does the limitation also apply to RDMA over IB? By SMP,
>>>>>> I guess Andreas was talking about CPUs, right? Since RDMA can bypass
>>>>>> the host CPU, does that mean it can also bypass the limitation?
>>>>>>
>>>>>> Thanks,
>>>>>> Jiahua
>>>>>> _______________________________________________
>>>>>> Lustre-discuss mailing list
>>>>>> Lustre-discuss at lists.lustre.org
>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>>>
>>>>>>             