[lustre-discuss] OSS Nodes crashing (and an MDS crash as well)

Sid Young sid.young at gmail.com
Tue Mar 2 15:27:49 PST 2021


G'Day all,

As I reported in a previous email, my OSS nodes crash soon after I run a
file creation script that uses "dd" in a loop and then try to delete all
the files at once.
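
The workload described above can be sketched roughly as follows. This is a
Python stand-in for the pattern, not the original script (which used "dd" in
a shell loop); the function names, directory, file count and file size are
all assumptions for illustration.

```python
import os
import tempfile


def create_files(directory, count, size_bytes, block_size=1 << 20):
    """Write `count` zero-filled files, similar in spirit to a dd loop."""
    os.makedirs(directory, exist_ok=True)
    block = bytes(block_size)  # a buffer of zero bytes
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"testfile_{i:05d}")
        remaining = size_bytes
        with open(path, "wb") as f:
            while remaining > 0:
                chunk = min(remaining, block_size)
                f.write(block[:chunk])
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())  # push the data out to storage
        paths.append(path)
    return paths


def delete_files(paths):
    """Remove every file in one pass and return how many were removed."""
    removed = 0
    for path in paths:
        os.remove(path)
        removed += 1
    return removed


if __name__ == "__main__":
    # Assumption: a temporary directory is used here so the sketch is safe
    # to run anywhere. To exercise Lustre, point `target` at a directory on
    # a client mount instead.
    target = tempfile.mkdtemp(prefix="create_delete_test_")
    created = create_files(target, count=100, size_bytes=4 << 20)
    print(f"created {len(created)} files in {target}")
    print(f"deleted {delete_files(created)} files")
```

The counts and sizes here are deliberately small; scale them up gradually
when testing against a real file system.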

At first I thought it was related to the Mellanox 100G cards, but after
rebuilding everything using just the 10G network I still get the crashes. I
have a crash dump file from the MDS, which crashed during the creates; the
OSS crashed when I did the deletes.

This leads me to think Lustre 2.12.6 running on CentOS 7.9 has a subtle bug
somewhere.

I'm not sure how to progress this. Should I try 2.13?
https://downloads.whamcloud.com/public/lustre/lustre-2.13.0/el7/patchless-ldiskfs-server/RPMS/x86_64/

Or build a fresh instance on a clean build of the OS?

Thoughts?


Sid Young