[lustre-devel] Proposal for JobID caching

Oleg Drokin oleg.drokin at intel.com
Thu Jan 19 08:28:55 PST 2017


On Jan 19, 2017, at 10:19 AM, Ben Evans wrote:

> 
> 
> On 1/18/17, 5:56 PM, "Oleg Drokin" <oleg.drokin at intel.com> wrote:
> 
>> 
>> On Jan 18, 2017, at 5:35 PM, Ben Evans wrote:
>> 
>>>> But if you really do run different jobs in the global namespace, we
>>>> could probably just make lctl spawn a shell whose commands would all
>>>> be marked as a particular job. Or we could trace the parent of lctl
>>>> and mark it, so that all of its children become marked too.
>>> 
>>> One of the things that came up during this discussion is how to handle
>>> a random user who logs into a compute node and runs something like
>>> rsync.  The more
>> 
>> The current scheme does not handle it either, unless you use nodelocal,
>> in which case their actions would be attributed to the job currently
>> running (not ideal either). I imagine there's a legitimate reason for
>> users to log into nodes running unrelated jobs?
> 
> The current scheme does handle it, if you use the procname_uid setting.

But then that's the only thing it handles; you don't get the actual jobid
this way.
What you are looking for, I thought, is a fallback for when a command is
not part of any known job; otherwise, the jobid that was detected would be
used.
Or were you thinking of mapping node+pid to known jobs on the management
nodes instead?
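
For reference, a minimal sketch of the schemes being discussed, using the
standard jobid_var values (SLURM_JOB_ID stands in for whatever variable
your scheduler actually exports):

    # Take the JobID from the scheduler's environment variable
    lctl set_param jobid_var=SLURM_JOB_ID

    # Per-process fallback: the JobID becomes <command>.<uid>, so an
    # unscheduled rsync run by uid 1234 shows up as "rsync.1234"
    lctl set_param jobid_var=procname_uid

With an environment-variable jobid_var, the "spawn a shell" idea above
falls out naturally (LUSTRE_JOBID is a hypothetical variable name):

    lctl set_param jobid_var=LUSTRE_JOBID
    LUSTRE_JOBID=interactive-$USER bash   # everything run from this
                                          # shell inherits the tag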

>>> conditions we place around getting jobstats to function properly, the
>>> harder these types of behaviors are to track down.  One thing I was
>>> thinking is that if jobstats is enabled, the fallback when no JobID
>>> can be found would be to simply use the procname_uid method, so an
>>> admin would see rsync.1234 pop up on the monitoring dashboard.
>> 
>> If you have every node in its own container, then the global namespace
>> could be set to "unscheduledcommand-$hostname" or some such, and every
>> container would get its own jobid.
> 
> or simply default to the existing procname_uid setting.

Yes, that too.
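
For completeness, a sketch of that per-container variant with nodelocal
(the "unscheduledcommand-" prefix is just the illustrative naming from
above, not anything Lustre mandates):

    # Inside each container: pin a static JobID for all local I/O
    lctl set_param jobid_var=nodelocal
    lctl set_param jobid_name=unscheduledcommand-$(hostname)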


