[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming

Day, Timothy timday at amazon.com
Thu Feb 6 10:24:24 PST 2025


> don't really need memory overcommit (in fact it's somewhat
> counterproductive), but since VMs typically don't use all RAM I wing it
> and run somewhat more VMs than what memory permits.
>
> as for CPU - the more overcommit the better (my box has 96 cores).

Thanks for the pointers. Not sure which instance type I'd use, but
it's easy enough to try a bunch and see what works best.
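
For comparing instance types, the sizing arithmetic can be written down
as a rough sketch. Everything here is an assumption made for
illustration: the function name, the host reserve, and the example
numbers are hypothetical and not taken from any existing setup.

```python
def vm_count(host_ram_gib, vm_ram_gib, mem_overcommit=1.0, host_reserve_gib=0.0):
    """Return how many VMs fit on a host for a given memory overcommit ratio.

    mem_overcommit=1.0 means no overcommit; 1.25 means the VMs are allowed
    to claim 25% more RAM in total than the host can actually back.
    host_reserve_gib is RAM held back for the host itself (hypothetical knob).
    """
    usable = max(host_ram_gib - host_reserve_gib, 0.0)
    return int(usable * mem_overcommit // vm_ram_gib)


def vcpu_ratio(n_vms, vcpus_per_vm, host_cores):
    """Return the CPU overcommit ratio: total vCPUs per physical core."""
    return n_vms * vcpus_per_vm / host_cores
```

With a hypothetical 256 GiB host, 4 GiB VMs, 16 GiB reserved for the
host, and 25% memory overcommit, vm_count(256, 4, 1.25, 16) gives 75 VMs.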

> if this is to be deployed in the cloud at will, some robust
> orchestration is needed host-side - I create 240 libvirt driven VMs
> with their own storage in LVM, dhcp-driven autoconf, nfs export host-
> side with the right distro - just once per box lifetime and compiled
> lustre every time I run testing (so a fresh checkout of master-next
> usually).
> Then configure crashdumping and an inotifywatch-based script to catch
> cores and do some light processing and ship results to the central data
> collector. (might be more efficient to do using in-vm crashdumping
> instead?)

I wrote a parallel ktest runner [1] a while back that probably does
the needed orchestration on the host side. It was originally intended
to run sanity tests faster (mostly for the OSD stuff I was working on).
But I think it could be adapted to run boilpot without much work.
It'd probably need a daemon mode, and I'd need to validate
that ktest actually captures all of the error modes we care about.
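
To make the host-side fan-out concrete, here is a minimal sketch of
running test commands in parallel and classifying each result. It is
not taken from the pktest branch and does not use ktest's interface;
the function names and the pass/fail/timeout classification are
assumptions for illustration. Treating a hang as its own outcome is one
example of an error mode a runner would need to capture.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd, timeout_s=3600):
    """Run one test command and return (cmd, status).

    status is "pass" for exit code 0, "fail" for any other exit code,
    and "timeout" if the command did not finish within timeout_s seconds.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return cmd, "timeout"
    return cmd, ("pass" if proc.returncode == 0 else "fail")


def run_parallel(cmds, workers=4):
    """Run commands concurrently; results come back in the same order as cmds."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, cmds))
```

Each entry in cmds is an argument list, as subprocess.run expects.
Threads are enough here because the work happens in the child
processes, not in the Python interpreter.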

Ideally, the boilpot part would be platform agnostic. The cloud
orchestration part would just create the VM, run boilpot, and shuffle
the crash dumps off the box. My main goal (right now) is to get
something easily reproducible and get a sense of the signal/noise
ratio on boilpot. Plus, it might be interesting to try and flush out bugs
in my OSD as well [2]. It's hard to say how often I'd run it without
first seeing how effective it is.
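
The "shuffle the crash dumps off the box" step could look roughly like
the sketch below. It polls a directory instead of using inotify, to
stay within the Python standard library, and it only moves files into
a local outbox; the actual upload is left out. The directory layout,
the function name, and the "size unchanged between two scans means the
dump is complete" heuristic are all assumptions.

```python
import shutil
from pathlib import Path


def collect_dumps(crash_dir, outbox, seen_sizes):
    """Move finished dumps from crash_dir to outbox; return the moved names.

    seen_sizes maps file name -> size observed on the previous scan and is
    updated in place. A file is moved only once its size is unchanged
    between two consecutive scans, so partially written dumps are skipped.
    """
    outbox = Path(outbox)
    outbox.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(crash_dir).glob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if seen_sizes.get(path.name) == size:
            # Size is stable across two scans, so assume the dump is complete.
            shutil.move(str(path), str(outbox / path.name))
            seen_sizes.pop(path.name)
            moved.append(path.name)
        else:
            # First sighting, or the file is still growing: check again next scan.
            seen_sizes[path.name] = size
    return moved
```

A caller would keep one seen_sizes dict alive and call collect_dumps
in a loop with a short sleep between scans, then ship whatever lands
in the outbox.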

Tim Day

[1] https://github.com/tim-day-387/ktest/tree/pktest
[2] https://review.whamcloud.com/c/fs/lustre-release/+/55594



