[Lustre-discuss] Performance problems with Lustre 1.6.1

Mon Oct 1 15:56:33 PDT 2007

Hi all,

I have set up a small Lustre file system with 1 MDS and 8 OSS/OST. What 
is unusual about our setup is that every OSS is also a client of the 
file system (so there are 8 clients altogether).

The file system holds a 1 GB file striped across all the OSTs. On every 
OSS, there is a process which reads the file chunks stored locally, 
i.e., in its own OST (since the processes have the striping information 
of the file, each one knows which portions of the file are stored in its 
OST).
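
To make the access pattern concrete, here is a minimal sketch of how each 
process could work out which byte ranges are local. It assumes a plain 
round-robin (RAID-0 style) layout in which chunk i is stored on the stripe 
with index i mod stripe_count. The function name local_extents and its 
parameters are hypothetical, not part of any Lustre API, and ost_index 
means the position of the OST within this file's stripe list, which need 
not equal the global OST number.

```python
MB = 1024 * 1024
GB = 1024 * MB

def local_extents(file_size, stripe_size, stripe_count, ost_index):
    """Return (offset, length) pairs for the chunks held by one stripe.

    Assumes round-robin striping: chunk i belongs to stripe
    (i mod stripe_count). ost_index is the stripe's position in the
    file's layout (hypothetical helper, not a Lustre call).
    """
    extents = []
    offset = ost_index * stripe_size           # first chunk on this stripe
    while offset < file_size:
        length = min(stripe_size, file_size - offset)  # last chunk may be short
        extents.append((offset, length))
        offset += stripe_size * stripe_count   # skip the other stripes' chunks
    return extents

if __name__ == "__main__":
    # 1 GB file, 1 MB stripes, 8 OSTs, process running next to stripe 3
    extents = local_extents(GB, 1 * MB, 8, 3)
    print(len(extents), "local chunks, first at offset", extents[0][0])
```

With a 1 MB stripe size each process issues many small reads scattered 
across the file; with 128 MB it issues a single contiguous read.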

The problem is that, when the stripe size is 1 MB (which means that 
there are 1024 chunks in total, or 128 chunks per OST), it takes more 
than 400 seconds to read the file, and the network traffic is very 
high. However, if the stripe size is 128 MB (8 chunks altogether, one 
per OST), it takes only around 100 seconds to read the file, and the 
network traffic is about one tenth of that in the 1 MB case. Note that, 
in both cases, the data I/O operations are local and the processes read 
the same amount of data.
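
The figures above can be checked with a short calculation. This is only a 
sketch: the helper names chunk_counts and throughput_mb_s are 
hypothetical, the layout is again assumed to be round-robin, and the 
timings are the approximate ones quoted in this message ("more than 400" 
and "around 100" seconds), so the throughput values are rough.

```python
MB = 1024 * 1024
GB = 1024 * MB

def chunk_counts(file_size, stripe_size, stripe_count):
    """Return (total chunks, chunks per stripe) for a round-robin layout."""
    total = -(-file_size // stripe_size)       # ceiling division
    base, extra = divmod(total, stripe_count)
    # the first `extra` stripes hold one chunk more than the rest
    per_stripe = [base + (1 if i < extra else 0) for i in range(stripe_count)]
    return total, per_stripe

def throughput_mb_s(file_size, seconds):
    """Aggregate read rate in MB/s over all processes together."""
    return file_size / MB / seconds

if __name__ == "__main__":
    for stripe_mb, seconds in ((1, 400), (128, 100)):
        total, per_stripe = chunk_counts(GB, stripe_mb * MB, 8)
        rate = throughput_mb_s(GB, seconds)
        print(f"{stripe_mb} MB stripes: {total} chunks, "
              f"{per_stripe[0]} per OST, about {rate:.2f} MB/s aggregate")
```

Both aggregate rates are very low for local disk reads, which suggests 
that the time is going somewhere other than the data transfer itself.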

Could this be a problem with the lock mechanism and the caching on the 
clients? If so, I have read that the ldlm can be disabled, but how? 
(The processes read from disjoint parts of the file, so they do not 
really need the ldlm service.)

Thanks in advance,

    Juan.