[lustre-discuss] Lustre errors asking for help

Andreas Dilger adilger at whamcloud.com
Wed Jan 17 21:23:14 PST 2024


Roman,
have you tried running e2fsck on the underlying device ("-fn" to start)?  It is usually best
to run with the latest version of e2fsprogs as it has most fixes.  
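The read-only check suggested above can be scripted across several OST devices. This is a minimal Python sketch, not an official tool. The helper names (`check_command`, `decode_status`, `check_device`) are hypothetical, the device paths are whatever you pass in, and the exit-status meanings are taken from the e2fsck(8) man page. With `-n`, e2fsck answers "no" to every repair prompt, so any problems it finds are reported in the exit status instead of being repaired.

```python
import subprocess
import sys

# Exit-status bits from the e2fsck(8) man page; a nonzero status is the
# bitwise OR of one or more of these values.
E2FSCK_EXIT_BITS = {
    1: "File system errors corrected",
    2: "File system errors corrected, system should be rebooted",
    4: "File system errors left uncorrected",
    8: "Operational error",
    16: "Usage or syntax error",
    32: "E2fsck canceled by user request",
    128: "Shared library error",
}


def check_command(device):
    """Forced (-f), read-only (-n) e2fsck invocation for one block device."""
    return ["e2fsck", "-f", "-n", device]


def decode_status(status):
    """List the meaning of every bit set in an e2fsck exit status."""
    if status == 0:
        return ["No errors"]
    return [text for bit, text in E2FSCK_EXIT_BITS.items() if status & bit]


def check_device(device):
    """Run the read-only check (needs root; the OST should normally be unmounted)."""
    result = subprocess.run(check_command(device), capture_output=True, text=True)
    return result.returncode, decode_status(result.returncode), result.stdout


if __name__ == "__main__":
    # Hypothetical usage: pass the OST block devices as command-line arguments.
    for dev in sys.argv[1:]:
        code, meaning, _ = check_device(dev)
        print(f"{dev}: exit {code} ({'; '.join(meaning)})")
```

For example, running it as a script with a hypothetical device such as `/dev/mapper/ost0004` prints one summary line per device; the full e2fsck report is returned as the third value of `check_device` for closer reading.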

It is definitely strange that all of these OSTs are reporting errors at the same time, which makes me
wonder how well the underlying hardware is holding up.  Can you log in to the controller and check
the RAID status?

The error might be coming from the Object Index on those OSTs.  However, this version is old
enough that I'm not sure whether OI Scrub even existed yet.  If it does, it would be
possible to just remove the OI files and they would be recreated on the next mount.

The filesystem currently isn't able to create any new files on those OSTs, so that may also
be why the performance is lower.
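The `rc = -116` in the quoted messages is a negated Linux errno value; on Linux, 116 is `ESTALE` ("Stale file handle"). As a rough way to confirm which OSTs are failing to precreate objects, here is a minimal Python sketch that counts the precreate errors per OST in a syslog file. The regular expression is written against the message formats quoted in this thread and is an assumption for other Lustre versions. It also counts only the lines actually logged, not the repeats hidden behind "Skipped N previous similar messages".

```python
import errno
import re
import sys
from collections import Counter

# Matches the precreate failures quoted in this thread, from both the OSS side
# ("scratch-OST0014: unable to precreate: rc = -116") and the MDS side
# ("scratch-OST0004-osc-MDT0000: can't precreate: rc = -116").
PRECREATE_RE = re.compile(
    r"(?P<ost>\w+-OST[0-9a-f]{4})(?:-osc-MDT[0-9a-f]{4})?: "
    r"(?:unable to precreate|can't precreate|cannot precreate objects)"
    r": rc = (?P<rc>-?\d+)"
)


def summarize(lines):
    """Count precreate failures per (OST, errno name); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = PRECREATE_RE.search(line)
        if match:
            code = abs(int(match.group("rc")))  # Lustre logs negated errno values
            # errno.errorcode uses the numbering of the host running the script,
            # so this decoding is only right when run on Linux.
            name = errno.errorcode.get(code, f"errno {code}")
            counts[(match.group("ost"), name)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: pass one or more syslog files as arguments.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for (ost, name), n in sorted(summarize(log).items()):
                print(f"{path}: {ost}: {n} precreate failures ({name})")
```

Run against the logs from the MDS and the OSS nodes (the script name and log path are up to you), this lets you check whether the failures are confined to OST0004, OST000f and OST0014.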

After 12+ years, it might be time to update to newer storage?  In particular, such old HDDs
often fail after a significant power failure, so the storage might be on its last legs, and
it's a good time to make a backup.  Given the age of the storage, I expect a modern HDD or
two would have enough capacity to back up the whole filesystem (even if not performing as
well), in case you don't have a chance to upgrade before it finally gives out.
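To check the "HDD or two" estimate, a minimal Python sketch can read the space in use from a client mount point and divide it by a drive's capacity. The `/global/scratch` path comes from the report below; the 20 TB drive size, the 90% fill limit, and the helper names `used_bytes` and `drives_needed` are illustrative assumptions.

```python
import os


def used_bytes(mount_point):
    """Bytes in use on a mounted filesystem: (total blocks - free blocks) * fragment size."""
    st = os.statvfs(mount_point)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def drives_needed(used, drive_bytes, fill_percent=90):
    """Backup drives needed if each drive is filled to at most fill_percent."""
    usable = drive_bytes * fill_percent // 100
    # Integer ceiling division avoids float rounding on large byte counts.
    return max(1, -(-used // usable))
```

For example, `drives_needed(used_bytes("/global/scratch"), 20 * 10**12)` assumes hypothetical 20 TB drives. On a Lustre client, `statvfs` reports totals for the whole filesystem rather than for a single OST, which is the figure this estimate needs.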

Cheers, Andreas

> On Jan 17, 2024, at 17:55, Baranowski, Roman wrote:
> 
> 
> Dear All,
> 
> We have a legacy version of Lustre installed as part of a DDN storage solution:
> 
> lustre: 2.4.3 (circa 2011)
> 
> kernel: patchless_client
> 
> Build Version: EXAScaler-ddn1.0--PRISTINE-2.6.32-358.23.2.el6_lustre.es279.devel.x86_64
> 
> 
> 
> It has been running fine for years, but after a particularly bad power failure, it started producing the following messages:
> 
> Jan 15 10:03:07 mds2 kernel: : LustreError: 3394:0:(osp_precreate.c:989:osp_precreate_thread()) scratch-OST0014-osc-MDT0000: cannot precreate objects: rc = -116
> Jan 15 10:03:07 mds2 kernel: : LustreError: 3394:0:(osp_precreate.c:989:osp_precreate_thread()) Skipped 210 previous similar messages
> Jan 15 10:07:51 mds2 kernel: : Lustre: scratch-OST000f-osc-MDT0000: slow creates, last=[0x1000f0000:0x1217571a:0x0], next=[0x1000f0000:0x1217571a:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=0
> Jan 15 10:07:51 mds2 kernel: : Lustre: Skipped 3 previous similar messages
> Jan 15 10:08:32 oss5 kernel: : LustreError: 26943:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0004: unable to precreate: rc = -116
> Jan 15 10:08:32 oss5 kernel: : LustreError: 26943:0:(ofd_obd.c:1348:ofd_create()) Skipped 66 previous similar messages
> Jan 15 10:09:26 oss4 kernel: : LustreError: 18223:0:(ofd_obd.c:1348:ofd_create()) scratch-OST000f: unable to precreate: rc = -116
> Jan 15 10:09:26 oss4 kernel: : LustreError: 18223:0:(ofd_obd.c:1348:ofd_create()) Skipped 70 previous similar messages
> Jan 15 10:09:37 oss3 kernel: : LustreError: 16621:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0014: unable to precreate: rc = -116
> Jan 15 10:09:37 oss3 kernel: : LustreError: 16621:0:(ofd_obd.c:1348:ofd_create()) Skipped 77 previous similar messages
> Jan 15 10:09:38 mds2 kernel: : Lustre: scratch-OST0014-osc-MDT0000: slow creates, last=[0x100140000:0x11dd257a:0x0], next=[0x100140000:0x11dd257a:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=-116
> Jan 15 10:13:12 mds2 kernel: : LustreError: 3404:0:(osp_precreate.c:484:osp_precreate_send()) scratch-OST0004-osc-MDT0000: can't precreate: rc = -116
> Jan 15 10:13:12 mds2 kernel: : LustreError: 3404:0:(osp_precreate.c:484:osp_precreate_send()) Skipped 226 previous similar messages
> Jan 15 10:13:12 mds2 kernel: : LustreError: 3404:0:(osp_precreate.c:989:osp_precreate_thread()) scratch-OST0004-osc-MDT0000: cannot precreate objects: rc = -116
> Jan 15 10:13:12 mds2 kernel: : LustreError: 3404:0:(osp_precreate.c:989:osp_precreate_thread()) Skipped 226 previous similar messages
> Jan 15 10:18:37 oss5 kernel: : LustreError: 1791:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0004: unable to precreate: rc = -116
> Jan 15 10:18:37 oss5 kernel: : LustreError: 1791:0:(ofd_obd.c:1348:ofd_create()) Skipped 77 previous similar messages
> Jan 15 10:19:36 oss4 kernel: : LustreError: 1687:0:(ofd_obd.c:1348:ofd_create()) scratch-OST000f: unable to precreate: rc = -116
> Jan 15 10:19:36 oss4 kernel: : LustreError: 1687:0:(ofd_obd.c:1348:ofd_create()) Skipped 77 previous similar messages
> Jan 15 10:19:42 oss3 kernel: : LustreError: 1196:0:(ofd_obd.c:1348:ofd_create()) scratch-OST0014: unable to precreate: rc = -116
> Jan 15 10:19:42 oss3 kernel: : LustreError: 1196:0:(ofd_obd.c:1348:ofd_create()) Skipped 75 previous similar messages
> Jan 15 10:23:16 mds2 kernel: : LustreError: 3400:0:(osp_precreate.c:484:osp_precreate_send()) scratch-OST000f-osc-MDT0000: can't precreate: rc = -116
> 
> The messages concern the same 3 OSTs and appear both on the OSS servers serving those OSTs and the mds server responsible for that filesystem (/global/scratch).
> They appear continuously, about every 4 minutes, starting as soon as the filesystem is mounted, even before any I/O occurs.  In other words, the messages appear continuously even on an inactive filesystem.
> 
> While everything seems to work, the performance is terrible.  Creating a directory on the filesystem can take 1-2 minutes to complete.  The load on the MDS server climbs to extremely high values (100-160) during normal I/O operations, and the filesystem overall is extremely slow.  The MDS server also reports slow creates (see messages above).
> 
> We think the error messages above indicate the problem, but despite searching the web for many hours, we have not been able to find any documentation about what may be causing them or how to correct the issue.
> 
> Any help would be greatly appreciated. Thanks a million for any suggestions and solutions....
> 
> All the best
> Roman
> 
> 
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Andreas Dilger
Lustre Principal Architect
Whamcloud