[lustre-discuss] Lustre tuning - help

Pinkesh Valdria pinkesh.valdria at oracle.com
Fri Aug 9 12:11:09 PDT 2019

Lustre experts, 


I recently installed Lustre for the first time. It's working (so I am happy), but now I am trying to do some performance testing and tuning. My goal is to run a SAS workload and use Lustre as the shared file system for SAS Grid, and later to tune Lustre for generic HPC workloads.



Through Google searches, I have read articles on Lustre and tuning recommendations from LUG conference slides, etc.





I have results for IBM Spectrum Scale (GPFS) running on the same hardware/software stack, and with the Lustre tuning I have done so far I am not getting optimal performance. My understanding was that Lustre can deliver better performance compared to GPFS if tuned correctly.

I have tried changing the following:

Stripe count: 1, 4, 8, 16, 24, -1 (to stripe across all OSTs), and a progressive file layout: lfs setstripe -E 256M -c 1 -E 4G -c 4 -E -1 -c -1 -S 4M /mnt/mdt_bv

Stripe size: default (1M), 4M, 64K (since SAS apps use this).


SAS Grid uses large-block, sequential I/O patterns (block sizes 64K, 128K, 256K; 64K is their preferred value).


Question 1: How should I tune the stripe count and stripe size for the above? Also, should I use a progressive file layout?
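For example, one directory-default layout I am considering for the SAS work files (the directory path is just a placeholder):

```shell
# Candidate directory-default layout for SAS work files (path is a placeholder).
# 64K matches SAS's preferred block size and is the smallest stripe size
# Lustre accepts; -c -1 stripes across all 30 OSTs.
lfs setstripe -S 64K -c -1 /mnt/mdt_bv/saswork

# Verify the layout that new files created in the directory will inherit:
lfs getstripe -d /mnt/mdt_bv/saswork
```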



I would appreciate feedback on the tuning I have done: whether it is correct, and whether I am missing anything.




It's a cloud-based solution (Oracle Cloud Infrastructure). Lustre was installed using the instructions on the WhamCloud site.

All running CentOS 7.

MGS: 1 node (shared with the MDS), MDS: 1 node, OSS: 3 nodes. All server nodes are bare-metal machines (no VMs) with 52 physical cores, 768 GB RAM and 2 NICs (2x25 Gbps Ethernet, no bonding). The 1st NIC connects to the Block Storage disks; the 2nd NIC talks to the clients, so LNet is configured on the 2nd NIC. Each OSS is connected to 10 Block Volume disks of 800 GB each, i.e. 10 OSTs per OSS and 30 OSTs in total (21 TB of storage). One 800 GB MDT is attached to the MDS.


Clients are VMs with 24 physical cores, 320 GB RAM and 1 NIC (24.6 Gbps). I am using 3 clients in this setup.



On all nodes (MDS/OSS/Clients): 



### OS Performance tuning



setenforce 0

echo "

*          hard   memlock           unlimited

*          soft    memlock           unlimited

" >> /etc/security/limits.conf


# The below applies for both compute and server nodes (storage)

cd /usr/lib/tuned/

cp -r throughput-performance/ sas-performance


echo "#

# tuned configuration




summary=Broadly applicable tuning that provides excellent performance across a variety of common server workloads


devices=!dm-*, !sda1, !sda2, !sda3











kernel.sched_min_granularity_ns = 10000000

kernel.sched_wakeup_granularity_ns = 15000000

vm.dirty_ratio = 30

vm.dirty_background_ratio = 10


" > sas-performance/tuned.conf


tuned-adm profile sas-performance


# Display active profile

tuned-adm active
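To double-check that the profile's sysctl values took effect (the expected values are the ones set in the tuned.conf above):

```shell
# Should report 10000000 / 15000000 / 30 / 10 if the profile is active:
sysctl kernel.sched_min_granularity_ns kernel.sched_wakeup_granularity_ns
sysctl vm.dirty_ratio vm.dirty_background_ratio
```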




All NICs are configured with MTU 9000.
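One way I verify jumbo frames end-to-end between two nodes ($peer_ip is a placeholder for the other node's address):

```shell
# A 9000-byte MTU leaves 8972 bytes of ICMP payload after the 20-byte IP
# and 8-byte ICMP headers; -M do forbids fragmentation, so this fails
# loudly if any hop is not passing 9000-byte frames.
ping -c 3 -M do -s 8972 $peer_ip
```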


Block Volumes/Disks 

For all OSTs/MDT: 

cat /sys/block/$disk/queue/max_hw_sectors_kb    # upper bound: max_sectors_kb must not exceed this


echo "32767" > /sys/block/$disk/queue/max_sectors_kb ;

echo "192" > /sys/block/$disk/queue/nr_requests ;

echo "deadline" > /sys/block/$disk/queue/scheduler ;

echo "0" > /sys/block/$disk/queue/read_ahead_kb ;

echo "68" > /sys/block/$disk/device/timeout ;


Only OSTs: 

lctl set_param osd-ldiskfs.*.readcache_max_filesize=2M



Lustre clients:

lctl set_param osc.*.checksums=0

lctl set_param timeout=600

#lctl set_param ldlm_timeout=200  - This fails on the clients with the error below

#error: set_param: param_path 'ldlm_timeout': No such file or directory

# (ldlm_timeout appears to exist only on the servers, so presumably it has to be set on the MDS/OSS nodes instead)

lctl set_param ldlm_timeout=200

lctl set_param at_min=250

lctl set_param at_max=600

lctl set_param ldlm.namespaces.*.lru_size=128

lctl set_param osc.*.max_rpcs_in_flight=32

lctl set_param osc.*.max_dirty_mb=256

lctl set_param debug="+neterror"
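One thing I am aware of: plain set_param values do not survive a remount or reboot. If I am reading the lctl documentation correctly, Lustre 2.5+ can make them persistent from the MGS with -P, e.g.:

```shell
# Run on the MGS node; -P stores the value persistently so current and
# future clients pick it up (per my reading of the lctl man page).
lctl set_param -P osc.*.max_rpcs_in_flight=32
lctl set_param -P osc.*.max_dirty_mb=256
lctl set_param -P osc.*.checksums=0
```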



# https://cpb-us-e1.wpmucdn.com/blogs.rice.edu/dist/0/2327/files/2014/03/Fragalla.pdf - says turn off checksum at network level

ethtool -K ens3 rx off tx off


Lustre is mounted with the -o flock option:

mount -t lustre -o flock ${mgs_ip}@tcp1:/$fsname $mount_point
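To make the client mount persistent across reboots, the equivalent /etc/fstab entry would be something like the following template (same placeholders as the mount command):

```shell
# /etc/fstab template -- substitute the real MGS IP, fsname and mount point.
# _netdev delays the mount until the network is up.
${mgs_ip}@tcp1:/$fsname  $mount_point  lustre  defaults,flock,_netdev  0 0
```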



Once again, I appreciate any guidance or help you can provide; pointers to docs or articles that would be helpful are also welcome.




Pinkesh Valdria

Principal Solutions Architect – Big Data & HPC

Oracle Cloud Infrastructure – Seattle 



