[lustre-discuss] Jobstats harvesting

Andreas Dilger adilger at whamcloud.com
Mon Feb 17 02:06:35 PST 2020


You don't mention which Lustre release you are using, but newer
releases allow "complex JobIDs" that can contain the Slurm job ID
together with constant strings (e.g. cluster name), hostname, UID,
GID, and process name.

This is documented in the Lustre manual at:
http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.jobstats
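
As a rough illustration of what a complex JobID ends up looking like, here is a minimal Python sketch that expands a jobid_name-style template. The format codes in the table (%e, %g, %h, %j, %p, %u) are the ones I recall from the jobstats chapter of the manual, so treat them as an assumption and check them against your release. The function name expand_jobid_name and the "mycluster" prefix are made up for the example; the real expansion is done by the Lustre client, not by a script like this.

```python
# Sketch only: mimics how a jobid_name template might be expanded on a client.
# Assumed (please verify in the manual for your release): something like
#   lctl set_param jobid_var=SLURM_JOB_ID jobid_name=mycluster-%j
# is what you would actually set on the clients.

def expand_jobid_name(template, jobid, exe, pid, uid, gid, host):
    """Replace the assumed format codes in `template` with the given values."""
    codes = {
        "e": exe,        # executable (process) name
        "g": str(gid),   # numeric group ID
        "h": host,       # hostname
        "j": jobid,      # value taken from the variable named by jobid_var
        "p": str(pid),   # numeric process ID
        "u": str(uid),   # numeric user ID
    }
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template) and template[i + 1] in codes:
            out.append(codes[template[i + 1]])
            i += 2
        else:
            # Constant text (e.g. a cluster name) is copied through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)


example = expand_jobid_name(
    "mycluster-%j.%e", jobid="12345", exe="dd",
    pid=4242, uid=1000, gid=1000, host="node001",
)
print(example)
```

With the hypothetical values above, the constant prefix, the Slurm job ID and the process name are joined into a single JobID string, which is the kind of combination asked about below.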

Cheers, Andreas

On Feb 14, 2020, at 19:13, Andrew Elwell <andrew.elwell at gmail.com> wrote:

Hi folks,

I've finally got round to enabling jobstats on a test system. As we're
a Slurm shop, setting this to jobid_var=SLURM_JOB_ID works OK, but is
it possible to use a combination of variables?
i.e. ${PAWSEY_CLUSTER}-${SLURM_JOB_ID} (or even SLURM_CLUSTER_NAME, which
is the same as $PAWSEY_CLUSTER)? If so, what's the syntax? (Yes, I
know that setting it to federated would jump up the JobId namespace to
include a cluster identifier, but that's not happening for now.)

However, main reason for mail is to find out what people use to
harvest the stats off the MDT/OSTs - I'm aware of Roland Laifer's
LAD15 presentation (sadly his tarball misses a sample config file out,
so it's taken me a bit of iteration over the Perl scripts to recreate
syntax) which saves to a file based structure, and I've seen others
using Prometheus (via https://grafana.com/grafana/dashboards/9671).

We've got influxdb (lnet / mds / ost stats gathered as well as regular
collectd output) and mariaDB (slurmdbd and robinhood) DBs available,
so I'd rather go with something that fed into that.
We're not doing serious high throughput (financial style) but more
traditional HPC with a lot (sigh) of single node jobs over 4
production filesystems (of which 3 are non-appliance LTS releases
maintained by us).
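
For the InfluxDB route, one possible shape is a small script that parses the job_stats text for a single target (e.g. from "lctl get_param obdfilter.*.job_stats" on an OSS or "lctl get_param mdt.*.job_stats" on an MDS) and emits InfluxDB line protocol. The sketch below is only that, a sketch: the sample text follows the job_stats layout shown in the manual, while the measurement name lustre_jobstats, the tag names, and the target name testfs-OST0000 are made up for the example. Timestamps are emitted in seconds, so whatever writes them to InfluxDB would need to use second precision.

```python
import re

JOB_RE = re.compile(r"^-\s+job_id:\s+(\S+)\s*$")
SNAP_RE = re.compile(r"^\s+snapshot_time:\s+(\d+)")
# One "name: { samples: N, unit: U, ... }" counter line.
COUNTER_RE = re.compile(r"^\s+(\w+):\s+\{(.*)\}\s*$")


def parse_job_stats(text):
    """Parse the job_stats text of one target into a list of per-job dicts."""
    jobs = []
    current = None
    for line in text.splitlines():
        m = JOB_RE.match(line)
        if m:
            current = {"job_id": m.group(1), "snapshot_time": None, "counters": {}}
            jobs.append(current)
            continue
        if current is None:
            continue  # skip anything before the first job entry
        m = SNAP_RE.match(line)
        if m:
            current["snapshot_time"] = int(m.group(1))
            continue
        m = COUNTER_RE.match(line)
        if m:
            fields = {}
            for part in m.group(2).split(","):
                key, _, value = part.partition(":")
                value = value.strip()
                if value.isdigit():  # keep numeric values, drop "unit: bytes" etc.
                    fields[key.strip()] = int(value)
            current["counters"][m.group(1)] = fields
    return jobs


def escape_tag(value):
    """Escape commas, equals signs and spaces, as line protocol requires for tags."""
    return re.sub(r"([,= ])", r"\\\1", value)


def to_line_protocol(jobs, target, measurement="lustre_jobstats"):
    """Turn parsed jobs into line protocol strings (timestamps in seconds)."""
    lines = []
    for job in jobs:
        fields = []
        for op, counter in sorted(job["counters"].items()):
            if "samples" in counter:
                fields.append(f"{op}_samples={counter['samples']}i")
            if "sum" in counter:
                fields.append(f"{op}_sum={counter['sum']}i")
        if not fields or job["snapshot_time"] is None:
            continue
        tags = f"job_id={escape_tag(job['job_id'])},target={escape_tag(target)}"
        lines.append(f"{measurement},{tags} {','.join(fields)} {job['snapshot_time']}")
    return lines


# Made-up sample in the layout the manual shows for job_stats.
SAMPLE = """\
job_stats:
- job_id:          mycluster-12345
  snapshot_time:   1581933995
  read_bytes:      { samples:           0, unit: bytes, min:       0, max:       0, sum:               0 }
  write_bytes:     { samples:           4, unit: bytes, min: 1048576, max: 1048576, sum:         4194304 }
  setattr:         { samples:           0, unit:  reqs }
"""

for line in to_line_protocol(parse_job_stats(SAMPLE), "testfs-OST0000"):
    print(line)
```

The counters are cumulative per job until the entry is cleaned up, so a real collector would probably want to store them as-is and compute rates at query time rather than in the script.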

Hopefully the discussion here will lead to some updated content at
http://wiki.lustre.org/Lustre_Monitoring_and_Statistics_Guide (hat tip
to Scott for a great start)

Many thanks

Andrew