[lustre-discuss] Invalid jobid size

Andreas Dilger adilger at whamcloud.com
Fri Aug 12 16:16:04 PDT 2022


Michael,
I wasn't even aware of this behavior of falling back to jobid_name if $jobid_var is unset. Could you please file a ticket about this in the LUDOC project in Jira, and ideally submit a patch to explain this in the manual?

Cheers, Andreas

On Aug 12, 2022, at 16:26, Sternberg, Michael G. via lustre-discuss <lustre-discuss at lists.lustre.org> wrote:

Einar,

The strings composed from your $SLURM_JOB_ID values and host names are likely too long to serve as a jobid for the Lustre Jobstats feature.

You might try %H instead of %h in jobid_name. For reference, from the Lustre manual (https://doc.lustre.org/lustre_manual.xhtml#jobstats):

%e print executable name
%g print group ID number
%h print fully-qualified hostname
%H print short hostname
%j print JobID from process environment variable named by the jobid_var parameter
%p print numeric process ID
%u print user ID number
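To make the length problem concrete, here is a minimal Python sketch that expands a jobid_name template using the codes listed above and checks the result against the 32-byte limit reported in the error message ("expect(32)"). This is a hypothetical model, not Lustre's implementation: the function names, the sample job ID and host name, and the assumption that the 32 bytes must also hold a trailing NUL are all illustrative.

```python
# Hypothetical model of jobid_name expansion; not Lustre source code.
from typing import Mapping

# Assumed from the "expect(32)" in the OSS error message.
JOBID_SIZE = 32


def expand_jobid_name(fmt: str, jobid_var: str, environ: Mapping[str, str],
                      exe: str, pid: int, uid: int, gid: int,
                      fqdn: str) -> str:
    """Expand the %-codes from the manual's jobstats table."""
    codes = {
        "e": exe,                          # executable name
        "g": str(gid),                     # group ID number
        "h": fqdn,                         # fully-qualified hostname
        "H": fqdn.split(".", 1)[0],        # short hostname
        "j": environ.get(jobid_var, ""),   # value of the jobid_var variable
        "p": str(pid),                     # numeric process ID
        "u": str(uid),                     # user ID number
    }
    out = []
    i = 0
    while i < len(fmt):
        nxt = fmt[i + 1] if i + 1 < len(fmt) else ""
        if fmt[i] == "%" and nxt in codes:
            out.append(codes[nxt])
            i += 2
        else:
            # Unknown codes and plain characters are copied through as-is.
            out.append(fmt[i])
            i += 1
    return "".join(out)


def fits_jobid(jobid: str) -> bool:
    """Assumption: the 32-byte buffer must also hold a trailing NUL byte."""
    return len(jobid.encode()) + 1 <= JOBID_SIZE


if __name__ == "__main__":
    env = {"SLURM_JOB_ID": "1234567"}  # hypothetical job ID
    for fmt in ("%j:%u:%h", "%j:%u:%H"):
        jobid = expand_jobid_name(fmt, "SLURM_JOB_ID", env, "a.out",
                                  4242, 10001, 100, "c1-23.hpc.example.org")
        print(fmt, jobid, len(jobid), fits_jobid(jobid))
```

With these made-up values, %j:%u:%h expands to a 35-character string that does not fit, while %j:%u:%H yields 19 characters, which shows why switching to the short hostname can help.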


On my system (2.12), I use:

jobid_var=PBS_JOBID
jobid_name=%e.%u

I get job_stats by $PBS_JOBID, as expected, from processes that actually have the variable set, and synthetic %e.%u values from all others, like processes on interactive or backup nodes. This has been working just fine to pinpoint the source of occasional trouble.

Curiously, I don't think the manual spells out what happens when the variable referenced by jobid_var is unset, i.e., the above fallback logic from jobid_var to jobid_name.
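The fallback described above can be modelled with a short Python sketch. It assumes, based only on the behavior observed here on 2.12 and not on the manual, that the value of the environment variable named by jobid_var is used when the process has it, and that the jobid_name template is expanded otherwise. The function names are hypothetical, only %e and %u are modelled, and treating a set-but-empty variable as unset is also an assumption.

```python
# Hypothetical model of the jobid_var -> jobid_name fallback described in
# this thread; not taken from Lustre source code.
from typing import Mapping


def expand_template(fmt: str, values: Mapping[str, str]) -> str:
    """Replace %X codes found in `values`; anything else is copied as-is."""
    out = []
    i = 0
    while i < len(fmt):
        key = fmt[i + 1:i + 2]  # empty string at the end of the template
        if fmt[i] == "%" and key in values:
            out.append(values[key])
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


def effective_jobid(jobid_var: str, jobid_name: str,
                    environ: Mapping[str, str], exe: str, uid: int) -> str:
    """Return the jobid a process would be tagged with under this model."""
    value = environ.get(jobid_var)
    if value:  # assumption: an empty variable counts as unset
        return value
    # Only %e and %u are modelled, matching jobid_name=%e.%u.
    return expand_template(jobid_name, {"e": exe, "u": str(uid)})


if __name__ == "__main__":
    batch_env = {"PBS_JOBID": "4711.pbs01"}  # hypothetical batch job
    print(effective_jobid("PBS_JOBID", "%e.%u", batch_env, "a.out", 1234))
    print(effective_jobid("PBS_JOBID", "%e.%u", {}, "rsync", 0))
```

Under this model, a batch process reports its $PBS_JOBID, while a backup process such as rsync run by root on a node without PBS is reported as "rsync.0".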


With best regards,
--
Michael Sternberg, Ph.D.
Principal Scientific Computing Administrator
Center for Nanoscale Materials
Argonne National Laboratory




On Aug 12, 2022, at 03:37, Einar Næss Jensen <einar.nass.jensen at ntnu.no> wrote:
Log files on the OSS servers are full of these error messages:
Invalid jobid size (37), expect(32)
What does it mean?

We have set this:
[root@mds-1 ~]# lctl get_param jobid_var jobid_name
jobid_var=SLURM_JOB_ID
jobid_name=%j:%u:%h

Lustre version is 2.12.6 (DDN).
_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud