[lustre-discuss] nodemap + Slurm: directories have to be world-readable

David Schanzenbach davidls at hawaii.edu
Thu Jan 9 19:24:25 PST 2025


Hi Thomas,

It sounds like you are running into this issue:
https://jira.whamcloud.com/browse/LU-14121

I think I ran into the same issue, or at least something similar, on 
our Slurm cluster running Lustre 2.15.x (servers and clients).
As I haven't had the spare cycles or equipment to dig into what was 
going on, I have been using admin=1 and the legacy root squash mechanism 
for our cluster nodes, as mentioned in the Jira ticket.
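
In case it helps to see which directories are affected, here is a rough 
Python sketch that lists the directories on the way to a job's output 
path that a squashed root could not step into. It is only an 
illustration under assumptions: the squash uid is 99 as in Thomas's 
description quoted below, the squash gid of 99 is my guess, the function 
names are made up, and it looks only at the classic mode bits (no ACLs). 
Stepping into a directory needs the search (execute) bit, so that is 
what it tests.

```python
import os
import stat
from pathlib import Path

SQUASH_UID = 99  # uid that root is squashed to, per the quoted description
SQUASH_GID = 99  # assumption: the squash gid matches the squash uid


def can_traverse(mode, owner_uid, owner_gid, uid, gids):
    """Classic POSIX check of the search (x) bit; ACLs are ignored."""
    if uid == owner_uid:
        return bool(mode & stat.S_IXUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


def blocking_dirs(path, uid=SQUASH_UID, gids=(SQUASH_GID,), stat_fn=os.stat):
    """Return every directory from / down to `path` that `uid` cannot enter."""
    target = Path(path)
    # Path.parents runs from the nearest parent up to the root; reverse it
    # so the result is reported top-down.
    chain = list(target.parents)[::-1] + [target]
    blocked = []
    for directory in chain:
        st = stat_fn(directory)
        if not can_traverse(st.st_mode, st.st_uid, st.st_gid, uid, gids):
            blocked.append(str(directory))
    return blocked
```

Calling blocking_dirs("/lustre/A/B/C") from an unsquashed admin client 
(so that the stat calls themselves are not refused) would return the 
directories that need the "other" execute bit, or an empty list if the 
path is fine.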


Thanks,
David


On 1/9/2025 12:58 PM, Thomas Roth wrote:
> Yes, yes,
> I have an Admin nodemap comprising all Lustre servers and a handful of 
> administrative clients, and this nodemap has both admin and trusted 
> set to 1.
>
> No, by now I rather think the cause is this: the Slurm daemon, 
> slurmstepd, runs as root, so it comes in as user 99 on the batch 
> nodes. When the job wants to write output to, say, /lustre/A/B/C/, 
> and A, B, C are not world-readable (actually octal '5'), slurmstepd 
> can't step into the output directory and the job fails.
>
>
> Regards,
> Thomas
>
> On 1/9/25 1:10 PM, Sebastien Buisson wrote:
>> Hi,
>>
>> As explained in the Lustre Operations Manual in this section:
>> https://doc.lustre.org/lustre_manual.xhtml#idm139831573757696
>> it is required to define a nodemap that matches all server nodes, 
>> with admin and trusted set to 1.
>> Have you?
>>
>> Cheers,
>> Sebastien.
>>
>> On 9 Jan 2025, at 13:03, Thomas Roth <dibbegucker at googlemail.com> 
>> wrote:
>>
>> Hi all,
>>
>> We have just switched on nodemap on our 2.12 cluster, with all batch 
>> clients set to trusted=1 but admin=0, so basically root-squashing.
>>
>> The batch system is Slurm.
>>
>> Now all jobs fail with "permission denied" when the user's directory 
>> on Lustre is not world-readable.
>>
>> Read/write access from the shell is not a problem.
>>
>>
>>
>> Has any site running Slurm encountered a similar issue?
>>
>>
>> Regards,
>> Thomas
>>
>>
>> Perhaps I should add that I have used the default nodemap for this, 
>> to avoid having to specify many hundreds of non-contiguous batch node 
>> IP ranges.
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>
