<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    Hi Thomas,<br>
    <br>
    It sounds like you are running into this issue: <br>
    <a class="moz-txt-link-freetext" href="https://jira.whamcloud.com/browse/LU-14121">https://jira.whamcloud.com/browse/LU-14121</a><br>
    <br>
    I think I ran into the same issue as you, or at least something
    similar, on our Slurm cluster running Lustre 2.15.x (servers and
    clients).<br>
    As I haven't had the spare cycles or equipment to dig into what was
    going on, I have been using admin=1 and the legacy root-squash
    mechanism for our cluster nodes, as mentioned in the Jira ticket.<br>
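    <br>
    For reference, the workaround on our side looks roughly like the
    commands below (run on the MGS); the nodemap name "batch" and the
    squash uid/gid 99:99 are only placeholders, so adjust them for your
    site:<br>
    <br>
    # leave root unsquashed at the nodemap level for the batch clients<br>
    lctl nodemap_modify --name batch --property admin --value 1<br>
    <br>
    # use the legacy (pre-nodemap) root squash on the MDTs instead,<br>
    # exempting the admin/server NIDs from it<br>
    lctl set_param -P mdt.*.root_squash="99:99"<br>
    lctl set_param -P mdt.*.nosquash_nids="10.0.0.[1-4]@tcp"<br>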
    <br>
    <br>
    Thanks,<br>
    David<br>
    <br>
    <br>
    <div class="moz-cite-prefix">On 1/9/2025 12:58 PM, Thomas Roth
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:92873aba-20ee-4b1a-89ae-9ae0e3f61760@googlemail.com">Ja
      ja,
      <br>
      I have an Admin nodemap comprising all Lustre servers and a
      handful of administrative clients, and this nodemap has both admin
      and trusted set to 1.
      <br>
      <br>
      No, by now I rather think that because the Slurm daemon,
      slurmstepd, runs as root, it comes in as user 99 on the batch
      nodes. When a job then wants to write its output to, say,
      /lustre/A/B/C/, and A, B, C are not world-readable (actually octal
      '5'), slurmstepd cannot step into the output directory and the job
      fails.
      <br>
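      <br>
      A quick way to check that theory outside of Slurm would be to try
      the same path as plain root on one of the batch nodes (the path is
      just the example from above):<br>
      <br>
      # as root on a root-squashed batch client:<br>
      stat /lustre/A/B/C<br>
      # "Permission denied" here would confirm that squashed root<br>
      # (uid 99) cannot traverse the path, i.e. what slurmstepd hits<br>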
      <br>
      <br>
      Regards,
      <br>
      Thomas
      <br>
      <br>
      On 1/9/25 1:10 PM, Sebastien Buisson wrote:
      <br>
      <blockquote type="cite">Hi,
        <br>
        <br>
        As explained in the Lustre Operations Manual in this section:
        <br>
<a class="moz-txt-link-freetext" href="https://urldefense.com/v3/__https://doc.lustre.org/lustre_manual.xhtml*idm139831573757696__;Iw!!PvDODwlR4mBZyAb0!SF1EJmHJokm42L888JwiZfsoKpgqKkTF25wvx8PcIkUgF3OktC0ll3zzI-gYrNeFHg_bhBFf2L6C2aLMG0NZ8acRKQ$">https://urldefense.com/v3/__https://doc.lustre.org/lustre_manual.xhtml*idm139831573757696__;Iw!!PvDODwlR4mBZyAb0!SF1EJmHJokm42L888JwiZfsoKpgqKkTF25wvx8PcIkUgF3OktC0ll3zzI-gYrNeFHg_bhBFf2L6C2aLMG0NZ8acRKQ$</a>
        it is required to define a nodemap that matches all server
        nodes, with admin and trusted set to 1.
        <br>
        Have you?
        <br>
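        <br>
        Such a server nodemap can be set up with something along these
        lines (the name "servers" and the NID range are placeholders for
        your actual server NIDs):<br>
        <br>
        lctl nodemap_add servers<br>
        lctl nodemap_add_range --name servers --range 10.0.0.[1-8]@tcp<br>
        lctl nodemap_modify --name servers --property admin --value 1<br>
        lctl nodemap_modify --name servers --property trusted --value 1<br>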
        <br>
        Cheers,
        <br>
        Sebastien.
        <br>
        <br>
        On 9 Jan 2025, at 13:03, Thomas Roth
        <a class="moz-txt-link-rfc2396E" href="mailto:dibbegucker@googlemail.com"><dibbegucker@googlemail.com></a> wrote:
        <br>
        <br>
        <br>
        Hi all,
        <br>
        <br>
        We have just switched on nodemap on our 2.12 cluster, with all
        batch clients having trusted=1 but admin=0, so basically
        root squashing.
        <br>
        <br>
        The batch system is Slurm.
        <br>
        <br>
        Now all jobs fail when the user's directory on Lustre is not
        world-readable ("permission denied").
        <br>
        <br>
        Read/write access in the shell is not a problem.
        <br>
        <br>
        <br>
        <br>
        Has any site running Slurm encountered a similar issue?
        <br>
        <br>
        <br>
        Regards,
        <br>
        Thomas
        <br>
        <br>
        <br>
        Perhaps I should add that I have used the default nodemap for
        this, to avoid having to specify many hundreds of non-contiguous
        batch node IP ranges.
        <br>
      </blockquote>
      <br>
    </blockquote>
    <br>
  </body>
</html>