Hi Claudio,<br><br>When you say that you see high system CPU usage during Linpack, do you mean the CPU usage on the clients or on the servers?<br>Can you run, for example, the top command and see which processes take most of the CPU time?<br>
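<br>If a number is easier to compare than a top screenshot, the user/system split can also be computed from two samples of the aggregate "cpu" line in /proc/stat. This is only a minimal sketch, assuming a Linux node; the helper names (parse_cpu_line, cpu_shares, read_cpu_line) are made up for illustration and are not part of any Lustre or system tool.<br>

```python
# Field order of the aggregate "cpu" line in /proc/stat (Linux).
# Guest fields are left out on purpose: they are already counted in "user".
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a 'cpu  u n s i ...' line into a dict of cumulative tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer fields; pad the missing ones with zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_shares(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: second[name] - first[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta[name] / total for name in FIELDS}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as stat_file:
        return stat_file.readline()
```

To use it, call read_cpu_line() once, wait a few seconds while the benchmark runs, call it again, and pass both lines to cpu_shares(); the "system" entry is the figure under discussion in this thread.<br>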

<br>Cheers<br><br>Wojciech<br><br><div class="gmail_quote">On 15 January 2011 11:18, Claudio Baeza Retamal <span dir="ltr"><<a href="mailto:claudio@dim.uchile.cl">claudio@dim.uchile.cl</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

Hi,<br>

<br>

<br>

On 14-03-2011 22:05, Andreas Dilger wrote:<br>

<div class="im">> On 2011-01-14, at 3:57 PM, Claudio Baeza Retamal wrote:<br>

>> Last month I configured Lustre 1.8.5 over InfiniBand. Before that, I<br>

>> was using Gluster 3.1.2; performance was OK but reliability was poor:<br>

>> when 40 or more applications tried to open a file at the same time,<br>

>> the Gluster servers randomly bounced the active connections from<br>

>> clients. Lustre does not have this problem, but I see other issues. For<br>

>> example, NAMD runs with "system cpu" around 30%, and the HPL benchmark<br>

>> runs with 70%-80% "system cpu", which is far too high. With<br>

>> Gluster, system CPU never exceeded 5%. I think this is<br>

>> explained by Gluster using FUSE and running in user space, but I am not<br>

>> sure.<br>

> If Gluster is using FUSE, then all of the CPU usage would appear in "user" and not "system".  That doesn't mean that the CPU usage is gone, just accounted in a different place.<br>

><br>

><br>

>> I have some questions:<br>

>><br>

>> Why does Lustre use IPoIB? Before, with Gluster, I did not use IPoIB. I<br>

>> suspect that the IPoIB module hurts InfiniBand performance and<br>

>> interferes with the native InfiniBand module.<br>

> If you are using IPoIB for data then your LNET is configured incorrectly.  IPoIB is only needed for IB hostname resolution, and all LNET traffic can use native IB with very low CPU overhead.  Your /etc/modprobe.conf and mount lines should be using {addr}@o2ib0 instead of {addr} or {addr}@tcp0.<br>


><br>

<br>

</div>For the first two weeks, I was using 'options lnet networks="o2ib(ib0)"';<br>

now I am using 'options lnet networks="o2ib(ib0),tcp0(eth0)"' because I<br>

have a node without an HCA card. In both cases, the system CPU usage is the<br>

same. The compute node without InfiniBand is used only to run MATLAB.<br>

<br>
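For anyone checking a similar setup, the networks value quoted above can be split into its (network, interface) pairs to confirm that o2ib is declared alongside tcp0. This is only a rough sketch: parse_lnet_networks is a hypothetical helper written for this message, not a Lustre utility, and it handles just the simple "net(iface,...)" form shown here.<br>

```python
import re

# One entry of the form "net" or "net(iface1,iface2)", followed by a comma
# or the end of the string. This covers only the simple syntax used above.
_ENTRY_RE = re.compile(r"\s*([A-Za-z0-9]+)\s*(?:\(([^)]*)\))?\s*(?:,|$)")


def parse_lnet_networks(value):
    """Split a value such as 'o2ib(ib0),tcp0(eth0)' into (net, [ifaces]) pairs."""
    entries = []
    pos = 0
    while pos < len(value):
        match = _ENTRY_RE.match(value, pos)
        if match is None:
            raise ValueError("cannot parse networks value at offset %d" % pos)
        net = match.group(1)
        # Interfaces inside the parentheses may themselves be comma-separated.
        ifaces = [i.strip() for i in (match.group(2) or "").split(",") if i.strip()]
        entries.append((net, ifaces))
        pos = match.end()
    return entries
```

With the second configuration above, the helper returns one pair for o2ib on ib0 and one for tcp0 on eth0, which matches the intent of keeping InfiniBand for the cluster and TCP only for the node without an HCA.<br>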

In the HPL benchmark case, my question is: why is the system CPU usage so<br>

high? Is it possible that Lustre interferes with the mlx4 InfiniBand driver and<br>

causes problems with MPI? The HPL benchmark's I/O is mainly data transport<br>

over MPI. With GlusterFS, system CPU was around 5%; since<br>

Lustre was configured, system CPU is 70%-80%, and we use o2ib(ib0) for<br>

LNET in modprobe.conf.<br>

I have tried several options. Following instructions from Mellanox, I<br>

disabled irqbalance on the compute nodes and ran the smp_affinity script, but<br>

system CPU is still just as high.<br>

Are there any tools to study Lustre performance?<br>

<div class="im"><br>

>> Is it possible to configure Lustre to transport metadata over Ethernet and<br>

>> data over InfiniBand?<br>

> Yes, this should be possible, but putting the metadata on IB gives much lower latency and higher performance, so you should really try to use IB for both.<br>

><br>

>> For NAMD and the HPL benchmark, is it normal for system CPU to be so high?<br>

>><br>

>> My configuration is the following:<br>

>><br>

>> - QLogic 12800-180 switch, 7 leaves (24 ports per leaf) and 2 spines (all<br>

>> ports are QDR, 40 Gbps)<br>

>> - 66 Mellanox ConnectX HCAs, two ports, QDR 40 Gbps (compute nodes)<br>

>> - 1 metadata server, 96 GB DDR3 RAM optimized for performance, two Xeon<br>

>> 5570, 15K RPM SAS hard disks in RAID 1, Mellanox ConnectX HCA with two ports<br>

>> - 4 OSSes, each with one 2 TB OST in RAID 5 (8 TB in total). All<br>

>> OSSes have a Mellanox ConnectX HCA with two ports<br>

> If you have IB on the MDS then you should definitely use {addr}@o2ib0 for both OSS and MDS nodes.  That will give you much better metadata performance.<br>

><br>

> Cheers, Andreas<br>

> --<br>

> Andreas Dilger<br>

> Principal Engineer<br>

> Whamcloud, Inc.<br>

><br>

><br>

><br>

><br>

><br>

<br>

</div>regards<br>

<font color="#888888"><br>

claudio<br>

</font><div><div></div><div class="h5"><br>

<br>

_______________________________________________<br>

Lustre-discuss mailing list<br>

<a href="mailto:Lustre-discuss@lists.lustre.org">Lustre-discuss@lists.lustre.org</a><br>

<a href="http://lists.lustre.org/mailman/listinfo/lustre-discuss" target="_blank">http://lists.lustre.org/mailman/listinfo/lustre-discuss</a><br>

</div></div></blockquote></div><br>