[lustre-devel] [LSF/MM/BPF TOPIC] [DRAFT] Lustre client upstreaming

Oleg Drokin green at whamcloud.com
Mon Feb 3 11:42:36 PST 2025


On Mon, 2025-02-03 at 17:24 +0000, Day, Timothy wrote:
> > If you want to avoid surprises for your patches, I can publish my
> > boilpot scripts and you can run your own instance if you have the
> > hardware.
> > Or we can find some sponsors to have some sort of a shared public
> > instance where people could drop their patches to?
> 
> If you could publish the boilpot scripts, I think that'd be super
> helpful.
> It'd be a lot easier to understand how to reproduce these failures.
> Plus, writing the orchestration to run it in the cloud would be
> straightforward, I think.

Unfortunately the cloud is not very conducive to the way boilpot operates.
The whole idea is to instantiate a gazillion virtual machines on a single
physical host so that the CPU is overcommitted (a lot!).

So I have this 2T RAM AMD box and I instantiate 240 virtual machines on
it; each gets 15G RAM and 15 CPU cores (this is the important part: if
you do not have CPU overcommit, nothing works).
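
For a rough idea of the shape of it, a minimal provisioning sketch might
look like the following (the actual boilpot scripts are not published
yet, so the disk image path, the VM naming and the qemu invocation here
are purely illustrative assumptions; the only real point is 240 guests
with 15 vCPUs each, far more vCPUs than the box has physical cores):

# hypothetical sketch, not the real boilpot provisioning
for i in $(seq 1 240); do
	# 15G RAM and 15 vCPUs per guest; across 240 guests the vCPU
	# count vastly exceeds the physical cores, which is the point
	qemu-system-x86_64 -enable-kvm -daemonize -display none \
		-name "centos-$i" -m 15G -smp 15 \
		-drive file="/vm/centos-$i.qcow2",if=virtio
done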

Inside, based on the node id (hostnames are just numbered for
simplicity), one of several scripts is run:

LOCALNUM=$(basename $(hostname) .localnet | sed 's/^centos-//') # e.g. centos-42.localnet -> 42


if [ $LOCALNUM -eq 300 ] ; then # impossible to hit
	FSTYPE=zfs
	MDSSIZE=600000
	MDSCOUNT=3
	OSTCOUNT=4

	export FSTYPE
	export MDSSIZE
	export MDSCOUNT
	export OSTCOUNT
	
	export ONLY=300
#	exec /etc/rc.d/tests-sanity
	exit
fi

FSTYPE=ldiskfs
MDSSIZE=400000
MDSCOUNT=1
OSTCOUNT=4
# 50% probability - ZFS
test $((RANDOM % 2)) -eq 0 && FSTYPE=zfs MDSSIZE=600000

# 33% probability - DNE
test $((RANDOM % 3)) -eq 0 && MDSCOUNT=3

export FSTYPE
export MDSSIZE
export MDSCOUNT
export OSTCOUNT

#if [ $LOCALNUM -eq 100 ] ; then
#	exec /etc/rc.d/zfs-only-mount
#fi

case $((LOCALNUM % 5)) in

0) exec /etc/rc.d/tests-racer $LOCALNUM ;;
1) exec /etc/rc.d/tests-replay $LOCALNUM ;;
2) exec /etc/rc.d/tests-recovery $LOCALNUM ;;
3) exec /etc/rc.d/tests-sanity $LOCALNUM ;;
4) exec /etc/rc.d/tests-confsanity $LOCALNUM ;;

esac


and then each tests-* does what it says.

They all begin the same:
#!/bin/bash

. /etc/rc.d/tests-config
TESTDIR=${TESTDIR:-"/home/green/git/lustre-release/lustre/tests"}
cd "$TESTDIR"
# wait until the Lustre tree has actually been built
while [ ! -e ../utils/mount.lustre ] ; do sleep 10 ; done

# background watchdog (the common scaffolding shown at the end)
bash /etc/rc.d/tests-common &

and then they split as below (each loop also force-crashes the node via
sysrq-c if a whole pass completes in under 60 seconds, i.e. when it is
cycling too fast to have actually run anything):

screen -d -m bash -c 'while :; do
	rm -rf /tmp/* ; TIMEST=$(date +'%s')
	SLOW=yes REFORMAT=yes DURATION=$((900*3)) PTLDEBUG="vfstrace rpctrace dlmtrace neterror ha config ioctl super cache" DEBUG_SIZE=100 bash racer.sh
	TIMEEN=$(date +'%s')
	if [ $((TIMEEN - TIMEST)) -le 60 ] ; then echo Cycling too fast > /dev/kmsg ; echo c >/proc/sysrq-trigger ; fi
	sh llmountcleanup.sh
done'

screen -d -m bash -c 'while :; do
	rm -rf /tmp/* ; TIMEST=$(date +'%s')
	EXCEPT="51f 60a 101 200l 300k" SLOW=yes REFORMAT=yes bash sanity.sh
	TIMEEN=$(date +'%s')
	if [ $((TIMEEN - TIMEST)) -le 60 ] ; then echo Cycling too fast > /dev/kmsg ; echo c >/proc/sysrq-trigger ; fi
	bash llmountcleanup.sh ; rm -rf /tmp/*
	SLOW=yes REFORMAT=yes bash sanityn.sh ; bash llmountcleanup.sh
	SLOW=yes REFORMAT=yes bash sanity-pfl.sh ; bash llmountcleanup.sh
	SLOW=yes REFORMAT=yes bash sanity-flr.sh ; bash llmountcleanup.sh
	SLOW=yes REFORMAT=yes bash sanity-dom.sh ; bash llmountcleanup.sh
done'

screen -d -m bash -c 'while :; do
	rm -rf /tmp/* ; TIMEST=$(date +'%s')
	EXCEPT="32 36 67 76 78 102 69 106" SLOW=yes REFORMAT=yes bash conf-sanity.sh
	TIMEEN=$(date +'%s')
	if [ $((TIMEEN - TIMEST)) -le 60 ] ; then echo Cycling too fast > /dev/kmsg ; echo c >/proc/sysrq-trigger ; fi
	bash llmountcleanup.sh
	for i in `seq 0 7` ; do losetup -d /dev/loop$i ; done
done'

screen -d -m bash -c 'while :; do
	rm -rf /tmp/* ; TIMEST=$(date +'%s')
	EXCEPT=101 SLOW=yes REFORMAT=yes bash recovery-small.sh
	TIMEEN=$(date +'%s')
	if [ $((TIMEEN - TIMEST)) -le 60 ] ; then echo Cycling too fast > /dev/kmsg ; echo c >/proc/sysrq-trigger ; fi
	bash llmountcleanup.sh
done'

screen -d -m bash -c 'while :; do
	rm -rf /tmp/* ; TIMEST=$(date +'%s')
	SLOW=yes REFORMAT=yes bash replay-single.sh
	TIMEEN=$(date +'%s')
	if [ $((TIMEEN - TIMEST)) -le 60 ] ; then echo Cycling too fast > /dev/kmsg ; echo c >/proc/sysrq-trigger ; fi
	bash llmountcleanup.sh
	SLOW=yes REFORMAT=yes bash replay-ost-single.sh ; bash llmountcleanup.sh
	SLOW=yes REFORMAT=yes bash replay-dual.sh ; bash llmountcleanup.sh
done'

The common scaffolding is there just to catch stuck tests that make no
progress for too long:
# Seconds
TMOUT=3600
TMOUT_SHORT=2400 # 40 minutes - for ldiskfs
TMOUT_LONG=3600 # 60 minutes - for zfs
WDFILE=/tmp/watchdog.file
TOUTFILE=/tmp/test_output_file_rnd

# Initial rampup
sleep 10

while :; do
	touch ${WDFILE}
	sleep ${TMOUT}

	if [ -e ${WDFILE} ] ; then
		# Just a long test? Give it another try
		dmesg | grep 'DEBUG MARKER: ==' | tail -1 > ${TOUTFILE}_1
		if [ $FSTYPE = zfs ] ; then
			sleep ${TMOUT_LONG}
		else
			sleep ${TMOUT_SHORT}
		fi

		if [ -e ${TOUTFILE}_1 ] ; then
			dmesg | grep 'DEBUG MARKER: ==' | tail -1 > ${TOUTFILE}_2

			# If no subtest changed - force crash
			if cmp ${TOUTFILE}_1 ${TOUTFILE}_2 ; then

				# extra zfs debug
				if [ $FSTYPE = zfs ] ; then
					(echo "zpool stats on hang" ; zpool iostat 1 10) >/dev/kmsg 2>&1
				fi

				# and crash
				echo c >/proc/sysrq-trigger
			fi

			# We only get here if the test was different
			# Since the progress is there - just keep monitoring
		fi
	fi
done


