[Lustre-discuss] Lustre and system cpu usage

Claudio Baeza Retamal claudio at dim.uchile.cl
Mon Mar 14 17:57:25 PDT 2011


Dear folks,

Last month I configured Lustre 1.8.5 over InfiniBand. Before that I was
using Gluster 3.1.2; performance was fine, but reliability was poor:
when 40 or more applications requested to open a file at the same time,
the Gluster servers randomly dropped active client connections. Lustre
does not have this problem, but I see other issues. For example, NAMD
shows around 30% "system cpu", and the HPL benchmark shows between 70%
and 80% "system cpu", which is much too high; with Gluster the system
CPU never exceeded 5%. I suspect the difference is because Gluster uses
FUSE and runs in user space, but I am not sure. I have some doubts:

Why does Lustre use IPoIB? With Gluster I did not use IPoIB at all, and
I suspect the ipoib module degrades InfiniBand performance and
interferes with the native InfiniBand module.
Is it possible to configure Lustre to transport metadata over Ethernet
and data over InfiniBand? (A sketch of what I mean follows below.)
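For what it is worth, my understanding so far is that the o2ib driver
needs IPoIB only so that the ib0 interface has an IP address for
resolving NIDs, while the data itself goes over native verbs; please
correct me if that is wrong. Below is a minimal sketch of the kind of
split I have in mind, assuming the IB interface is named ib0 and the
Ethernet interface eth0 (hypothetical names, not my current
configuration):

    # modprobe.conf (or a file under /etc/modprobe.d/) -- sketch only
    # o2ib0: native IB verbs; ib0's IPoIB address is used only for the NID
    # tcp0:  plain TCP over the Ethernet interface
    options lnet networks="o2ib0(ib0),tcp0(eth0)"

After the modules are reloaded, "lctl list_nids" should show one NID per
declared network (...@o2ib and ...@tcp). What I do not know is whether
clients can then be pointed at the MDS over tcp0 while keeping the OSTs
on o2ib0.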

For NAMD and the HPL benchmark, is it normal for system CPU usage to be so high?
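To try to see where the system time goes, I suppose I could run
something like the following on a compute node while a job is active
(mpstat assuming the sysstat package is installed; perf only if the
kernel is recent enough to support it):

    mpstat -P ALL 1   # per-CPU breakdown of %sys, refreshed every second
    perf top          # which kernel symbols are consuming the cycles

but even then I am not sure what values would count as normal for
Lustre over o2ib.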

My configuration is the following:

- QLogic 12800-180 switch, 7 leaf modules (24 ports per leaf) and 2
spines (all ports QDR, 40 Gbps)
- 66 Mellanox ConnectX HCAs, two ports each, QDR 40 Gbps (compute nodes)
- 1 metadata server: 96 GB DDR3 RAM optimized for performance, two Xeon
5570 CPUs, 15K RPM SAS disks in RAID 1, one Mellanox ConnectX HCA with
two ports
- 4 OSSs, each with 1 OST of 2 TB in RAID 5 (8 TB in total); every OSS
has a Mellanox ConnectX HCA with two ports

I would appreciate any help or tips.

Regards

claudio


-- 
Claudio Baeza Retamal
CTO
National Laboratory for High Performance Computing (NLHPC)
Center for Mathematical Modeling (CMM)
School of Engineering and Sciences
Universidad de Chile





