[lustre-discuss] questions about group locks / LDLM_FL_NO_TIMEOUT flag

Bertschinger, Thomas Andrew Hjorth bertschinger at lanl.gov
Wed Aug 30 06:40:19 PDT 2023


Hello, 

We have a few files, created by a particular application, for which reads consistently hang. The debug log on a client attempting a read() has messages like:

> ldlm_completion_ast(): waiting indefinitely because of NO_TIMEOUT ...

This message is printed when the LDLM_FL_NO_TIMEOUT flag is set, and the code comments above that flag's definition imply it is set for group locks. So we have been trying to determine whether the application in question uses group locks. (I have reached out to the app's developers but do not have a response yet.)
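
As I understand it, an application normally takes a group lock through the LL_IOC_GROUP_LOCK / LL_IOC_GROUP_UNLOCK ioctls (or the llapi_group_lock() / llapi_group_unlock() wrappers in liblustreapi), so this is roughly the pattern I have been looking for in the app's source. This is only a minimal sketch; the header path varies by Lustre version and the gid is an arbitrary application-chosen value:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/lustre/lustre_user.h>  /* LL_IOC_GROUP_LOCK; path may differ */

int main(int argc, char **argv)
{
        int gid = 1234;   /* arbitrary group id chosen by the application */
        int fd;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }

        fd = open(argv[1], O_RDWR);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* take the group lock; other clients' reads block (or fail with
         * EWOULDBLOCK under O_NONBLOCK) until it is released */
        if (ioctl(fd, LL_IOC_GROUP_LOCK, gid) < 0) {
                perror("LL_IOC_GROUP_LOCK");
                return 1;
        }

        /* ... I/O under the group lock ... */

        if (ioctl(fd, LL_IOC_GROUP_UNLOCK, gid) < 0)
                perror("LL_IOC_GROUP_UNLOCK");

        close(fd);
        return 0;
}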

If I open the file with O_NONBLOCK, any read() fails immediately with error 11 (EAGAIN / EWOULDBLOCK). This is the documented behavior when a Lustre group lock is held.
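
For reference, the probe I am using is essentially the following plain POSIX test (nothing Lustre-specific); against the affected files the read() fails immediately instead of hanging:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        char buf[4096];
        ssize_t rc;
        int fd;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }

        fd = open(argv[1], O_RDONLY | O_NONBLOCK);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        rc = read(fd, buf, sizeof(buf));
        if (rc < 0)
                /* on the affected files: errno 11 (EAGAIN / EWOULDBLOCK) */
                printf("read failed: errno=%d (%s)\n", errno, strerror(errno));
        else
                printf("read returned %zd bytes\n", rc);

        close(fd);
        return 0;
}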

However, I would like to clarify whether the LDLM_FL_NO_TIMEOUT flag is set *only* when a group lock is held, or whether there are other circumstances in which the behavior described above could occur.

If this is caused by a group lock, is there an easy way to tell from server-side logs or data which client(s) hold the group lock and are blocking access? The motivation is that we believe any jobs accessing these files were killed long ago, and no nodes from those jobs should still be holding the files open. We would like to confirm or rule out that possibility by identifying any such clients.

Advice on how to debug ldlm issues effectively would be useful beyond this particular case. In general, if there is a reliable way to start from a log entry for a lock such as

> ... ns: lustre-OST0000-osc-ffff9a0942c79800 lock: 000000003f3a5950/0xe54ca8d2d7b66d03 lrc: 4/1,0 mode: --/PR  ...

and get information about the client(s) holding that lock and any contending locks, that would be helpful in debugging situations like this.

server: 2.15.2
client that application ran on: 2.15.0.4_rc2_cray_172_ge66844d
client that I tested file access from: 2.15.2

Thanks!

- Thomas Bertschinger

