[lustre-discuss] From ceph to lustre. what is the right setup for high availability in a small cluster?

Arvid Picciani aep at exys.org
Sun Mar 26 01:22:34 PDT 2023


Hello list.

We have a small Ceph setup with 100TB across 7 nodes.
Ceph seemed like an ideal match for our requirements:
 - Disk backend for a large number of small VMs
 - Shared file systems through virtiofsd
 - Recovers from failing nodes unattended

However, over the years it became clearer that it's not the right
match, because:
 - performance is incredibly poor (40MB/s writes over 50G ethernet)
 - many failures cascade to the whole cluster, which in practice
means we have MORE downtime than we would with plain local storage

So I'm looking at alternatives.

Initially I ruled out Lustre because it seemed aimed at much larger
organizations doing proper science workloads. We're a tiny cloud
hoster with a tiny budget and tiny physical space.

Searching for how to achieve high availability with Lustre, I found
that most people appear to use things like dual-port SAS drives,
which can be accessed from multiple nodes. But I don't think that's
an option for us, since we're very space-constrained (those 4U JBODs
will not fit) and we already have 100TB of SATA and NVMe SSDs
directly attached to the compute nodes.

But then there is this idea:
https://wiki.lustre.org/MDT_Mirroring_with_ZFS_and_SRP
which moves the problem to ZFS. They simply make all drives available
via iSER and mirror the data using ZFS. This makes me wonder what
Lustre even adds in this scenario, since ZFS is already doing the
heavy lifting of managing replication and high availability.
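If I understand the wiki page correctly, the approach boils down to
something like the sketch below (device names, IQNs, and addresses
here are made up by me, not taken from the wiki; the iSER interface
setup in particular is my assumption about how it would be wired up):

```shell
# On each storage node: export a local SSD as a block-backed LIO target
# (hypothetical backstore name "mdt0" and example IQN)
targetcli /backstores/block create name=mdt0 dev=/dev/nvme0n1
targetcli /iscsi create iqn.2023-03.org.example:mdt0

# On the metadata server: log in to the remote LUNs from two different
# storage nodes (hypothetical portal addresses)
iscsiadm -m node -T iqn.2023-03.org.example:mdt0 -p 10.0.0.1 --login
iscsiadm -m node -T iqn.2023-03.org.example:mdt1 -p 10.0.0.2 --login

# Let ZFS mirror across the two remote LUNs, so a storage node failure
# degrades the pool instead of losing the MDT
zpool create mdtpool mirror \
    /dev/disk/by-path/ip-10.0.0.1-iscsi-lun-0 \
    /dev/disk/by-path/ip-10.0.0.2-iscsi-lun-0

# Format the mirrored pool as a ZFS-backed Lustre MDT
# (fsname and MGS NID are placeholders)
mkfs.lustre --mdt --backfstype=zfs --fsname=testfs \
    --index=0 --mgsnode=10.0.0.3@tcp mdtpool/mdt0
```

So the redundancy lives entirely in the zpool layout, and Lustre just
sees one ZFS dataset.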

Looking at https://wiki.lustre.org/images/6/64/LustreArchitecture-v4.pdf
my initial conclusion was that Lustre works just like Ceph: there's a
bunch of object storage servers, and the management daemons make sure
the data is stored on multiple of them. So why does it even need
multiple hosts accessing the same disk?

Or is it perhaps that only object _data_ replication is managed by
Lustre, while object metadata is not redundant?

Exposing the directly attached SATA drives over iSER appears to be
the only solution for us, but I'm very worried that going against the
recommended setup will invite all sorts of trouble down the road.

Does anyone else do this? Should we stay away from it?

Thanks
Arvid
