[lustre-discuss] Lustre errors asking for help

Baranowski, Roman roman.baranowski at ubc.ca
Wed Jan 17 16:55:30 PST 2024


Dear All,

We have a legacy version of Lustre installed as part of a DDN storage solution:

lustre: 2.4.3 (circa 2013)

kernel: patchless_client

Build Version: EXAScaler-ddn1.0--PRISTINE-2.6.32-358.23.2.el6_lustre.es279.devel.x86_64



It has been running fine for years, but after a particularly bad power failure it started producing the following messages:

Jan 15 10:03:07 mds2 kernel: : LustreError: 3394:0:(osp_precreate.c:989:osp_precreate_thread()) scratch-OST0014-osc-MDT0000: cannot precreate objects: rc = -116
Jan 15 10:03:07 mds2 kernel: : LustreError: 3394:0:(osp_precreate.c:989:osp_precreate_thread()) Skipped 210 previous similar messages
Jan 15 10:07:51 mds2 kernel: : Lustre: scratch-OST000f-osc-MDT0000: slow creates, last=[0x1000f0000:0x1217571a:0x0], next=[0x1000f0000:0x1217571a:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=0
Jan 15 10:07:51 mds2 kernel: : Lustre: Skipped 3 previous similar messages
Jan 15 10:08:32 oss5 kernel: : LustreError: 26943:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0004: unable to precreate: rc = -116
Jan 15 10:08:32 oss5 kernel: : LustreError: 26943:0:(ofd_obd.c:1348:ofd_create()) Skipped 66 previous similar messages
Jan 15 10:09:26 oss4 kernel: : LustreError: 18223:0:(ofd_obd.c:1348:ofd_create()) scratch-OST000f: unable to precreate: rc = -116
Jan 15 10:09:26 oss4 kernel: : LustreError: 18223:0:(ofd_obd.c:1348:ofd_create()) Skipped 70 previous similar messages
Jan 15 10:09:37 oss3 kernel: : LustreError: 16621:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0014: unable to precreate: rc = -116
Jan 15 10:09:37 oss3 kernel: : LustreError: 16621:0:(ofd_obd.c:1348:ofd_create()) Skipped 77 previous similar messages
Jan 15 10:09:38 mds2 kernel: : Lustre: scratch-OST0014-osc-MDT0000: slow creates, last=[0x100140000:0x11dd257a:0x0], next=[0x100140000:0x11dd257a:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=-116
Jan 15 10:13:12 mds2 kernel: : LustreError: 3404:0:(osp_precreate.c:484:osp_precreate_send()) scratch-OST0004-osc-MDT0000: can't precreate: rc = -116
Jan 15 10:13:12 mds2 kernel: : LustreError: 3404:0:(osp_precreate.c:484:osp_precreate_send()) Skipped 226 previous similar messages
Jan 15 10:13:12 mds2 kernel: : LustreError: 3404:0:(osp_precreate.c:989:osp_precreate_thread()) scratch-OST0004-osc-MDT0000: cannot precreate objects: rc = -116
Jan 15 10:13:12 mds2 kernel: : LustreError: 3404:0:(osp_precreate.c:989:osp_precreate_thread()) Skipped 226 previous similar messages
Jan 15 10:18:37 oss5 kernel: : LustreError: 1791:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0004: unable to precreate: rc = -116
Jan 15 10:18:37 oss5 kernel: : LustreError: 1791:0:(ofd_obd.c:1348:ofd_create()) Skipped 77 previous similar messages
Jan 15 10:19:36 oss4 kernel: : LustreError: 1687:0:(ofd_obd.c:1348:ofd_create()) scratch-OST000f: unable to precreate: rc = -116
Jan 15 10:19:36 oss4 kernel: : LustreError: 1687:0:(ofd_obd.c:1348:ofd_create()) Skipped 77 previous similar messages
Jan 15 10:19:42 oss3 kernel: : LustreError: 1196:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0014: unable to precreate: rc = -116
Jan 15 10:19:42 oss3 kernel: : LustreError: 1196:0:(ofd_obd.c:1348:ofd_create()) Skipped 75 previous similar messages
Jan 15 10:23:16 mds2 kernel: : LustreError: 3400:0:(osp_precreate.c:484:osp_precreate_send()) scratch-OST000f-osc-MDT0000: can't precreate: rc = -116
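
The rc = -116 at the end of these lines is a negated Linux errno value, and on Linux errno 116 is ESTALE (stale file handle). A minimal Python sketch for decoding such values with the standard errno and os modules follows. decode_rc is a hypothetical helper, not part of any Lustre tool, and it should be run on a Linux host because errno numbers differ between operating systems.

```python
import errno
import os
import re

# Two lines copied from the excerpt above; Lustre reports rc as a negative errno.
LINES = [
    "Jan 15 10:08:32 oss5 kernel: : LustreError: 26943:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0004: unable to precreate: rc = -116",
    "Jan 15 10:13:12 mds2 kernel: : LustreError: 3404:0:(osp_precreate.c:484:osp_precreate_send()) scratch-OST0004-osc-MDT0000: can't precreate: rc = -116",
]

RC_RE = re.compile(r"rc = (-?\d+)")


def decode_rc(line):
    """Return (rc, errno symbol, message) for the rc in a log line, or None if absent.

    Assumes Linux errno numbering (116 == ESTALE); other OSes number errnos differently.
    """
    m = RC_RE.search(line)
    if m is None:
        return None
    rc = int(m.group(1))
    code = -rc  # undo the sign convention to get the plain errno
    name = errno.errorcode.get(code, "UNKNOWN")
    return rc, name, os.strerror(code)


for line in LINES:
    print(decode_rc(line))
```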

The messages concern the same three OSTs (OST0004, OST000f and OST0014) and appear both on the OSS servers serving those OSTs (oss5, oss4 and oss3) and on the MDS server (mds2) responsible for that filesystem (/global/scratch).
They appear continuously, about every 4 minutes, starting as soon as the filesystem is mounted, even before any I/O occurs. In other words, the messages keep appearing even on an idle filesystem.
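
The repeat interval per host and OST can be measured from the syslog rather than estimated by eye. The sketch below uses a hypothetical helper, repeat_intervals (not a Lustre tool); it assumes standard syslog timestamps without a year and counts only lines that carry an rc value, so the "Skipped N previous similar messages" summaries are ignored. Those summaries also mean the console log suppresses repeats, so the printed lines undercount the actual failures.

```python
import re
from collections import defaultdict
from datetime import datetime

# A subset of the excerpt above; syslog timestamps carry no year.
LOG = """\
Jan 15 10:08:32 oss5 kernel: : LustreError: 26943:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0004: unable to precreate: rc = -116
Jan 15 10:09:26 oss4 kernel: : LustreError: 18223:0:(ofd_obd.c:1348:ofd_create()) scratch-OST000f: unable to precreate: rc = -116
Jan 15 10:18:37 oss5 kernel: : LustreError: 1791:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0004: unable to precreate: rc = -116
Jan 15 10:19:36 oss4 kernel: : LustreError: 1687:0:(ofd_obd.c:1348:ofd_create()) scratch-OST000f: unable to precreate: rc = -116
"""

# Captures timestamp, host and target name (e.g. scratch-OST0004); lines without "rc = -N" do not match.
LINE_RE = re.compile(
    r"(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) kernel: .*?(\w+-OST[0-9a-fA-F]{4})\S*: .*rc = -\d+"
)


def repeat_intervals(log_text):
    """Map (host, target) -> seconds between successive rc messages from that source."""
    times = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%b %d %H:%M:%S")
            times[(m.group(2), m.group(3))].append(ts)
    return {
        key: [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        for key, stamps in times.items()
    }


for (host, target), gaps in sorted(repeat_intervals(LOG).items()):
    print(host, target, gaps)
```

For the four sample lines it reports a single gap of 605 s for oss5/OST0004 and 610 s for oss4/OST000f; feeding it a longer stretch of the log gives a more reliable picture than a short excerpt.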

While everything seems to work, the performance is terrible. Creating a directory on the filesystem can take 1-2 minutes to complete. The load on the MDS server climbs to extremely high values (100-160) during normal I/O, and the filesystem overall is very slow. The MDS also reports slow creates (see the messages above).

We think the error messages above point to the problem, but despite many hours of searching the web, we have not been able to find any documentation on what causes them or how to correct the issue.

Any help would be greatly appreciated. Thanks a million for any suggestions or solutions.

All the best
Roman

