[lustre-discuss] Failing New File Creations on Lustre

Iannetti, Gabriele G.Iannetti at gsi.de
Tue Feb 28 03:54:39 PST 2023


Dear Lustre Community,

From time to time we run into a situation where, for certain combinations of MDT and OST,
no new files can be created on Lustre.

We observe this when at least one MDS shows high load and suspicious errors in its kernel logs.

We would like to ask whether this is a known bug or error.

The most recent case was the crash of a file server (OSS), which also affected two metadata servers 
that then showed the problem described above. For this case we can provide kernel and Lustre logs 
from the MDS for debugging purposes.

In more detail:

Shortly before the crash of that OSS, new file creations on its OSTs in combination with two MDS started failing...

During the repair of the OSS we set max_create_count=0, and we returned to max_create_count=20000 after successful recovery. 
No further errors are observed in either the MDS or the OSS logs, but MDS1 still does not create any new files on the OSTs 
of the repaired OSS, and MDS2 creates new files on only a few of them (so not all of them fail, as they do with MDS1).

Usually our workaround is to reactivate those particular OSTs on the MDS via `lctl set_param osp.lustrefs-${OST}-osc-*.active=(from 0 to 1)`, i.e. by setting active to 0 and then back to 1.
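
As a rough sketch of this workaround for a single OST: the helper below only prints the two lctl calls (deactivate, then reactivate), so they can be reviewed before being run on the MDS. The function name `reactivate_cmds` and the OST name `OST0103` are placeholders chosen for illustration; `lustrefs` is the file system name from the command above.

```shell
# Print (do not execute) the two lctl calls that toggle one OST on the MDS.
# reactivate_cmds is a hypothetical helper name, not part of Lustre.
reactivate_cmds() {
    # $1 = file system name, $2 = OST name (OST0103 below is only an example)
    printf 'lctl set_param osp.%s-%s-osc-*.active=0\n' "$1" "$2"
    printf 'lctl set_param osp.%s-%s-osc-*.active=1\n' "$1" "$2"
}

reactivate_cmds lustrefs OST0103
```

Printing instead of executing is a deliberate choice here, since the toggle changes the state of a production MDS.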

Currently the following is set on all MDS:

* max_create_count=20000
* active=1 for all OSTs
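
One way to double-check these two settings is to filter the `lctl get_param` output for values that deviate from them. The following is only a sketch: `unexpected_settings` is a hypothetical helper name, and it assumes that `lctl get_param` prints one `parameter=value` pair per line for these parameters.

```shell
# Read "parameter=value" lines on stdin and print every line whose value
# differs from what we expect (max_create_count=20000, active=1).
# Returns a non-zero status if at least one deviation was found.
unexpected_settings() {
    awk -F= '
        /\.max_create_count=/ && $2 != 20000 { print; bad = 1 }
        /\.active=/           && $2 != 1     { print; bad = 1 }
        END { exit bad }
    '
}

# Intended use on an MDS (not executed here):
#   lctl get_param osp.*.max_create_count osp.*.active | unexpected_settings
```

No output means that every reported OST has both values as listed above.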

Nevertheless, file creation still fails on Lustre for the following MDT/OST combinations:

MDT0-OST[259-265]
MDT1-OST[260-262]
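
To map these combinations to osp device names as used by `lctl`, a small loop is enough. This sketch assumes that the indices in the list above are decimal; Lustre target names carry the index as four hex digits (for example, index 259 becomes OST0103). `osp_names` is a hypothetical helper name.

```shell
# List the osp device names for one MDT and a range of OST indices.
# Assumption: the indices are given in decimal.
osp_names() {
    # $1 = file system name, $2 = MDT index, $3 = first OST index, $4 = last OST index
    i=$3
    while [ "$i" -le "$4" ]; do
        printf '%s-OST%04x-osc-MDT%04x\n' "$1" "$i" "$2"
        i=$((i + 1))
    done
}

osp_names lustrefs 0 259 265
osp_names lustrefs 1 260 262
```

The resulting names can be used with `lctl get_param osp.<name>.active` on the corresponding MDS.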

We are running Lustre 2.12.5.

Any help would be appreciated!


Best
Gabriele

