[Lustre-discuss] ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway

Christopher J. Walker C.J.Walker at qmul.ac.uk
Mon Jan 4 09:16:51 PST 2010


Heiko Schröter wrote:
> Am Mittwoch 23 Dezember 2009 12:22:17 schrieb Christopher J. Walker:
>>> Dec 22 17:18:49 proof kernel: LustreError: 10917:0:
>>> (ldlm_request.c:1030:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: 
>>> canceling anyway
>>> Dec 22 17:18:49 proof kernel: LustreError: 10917:0:
>>> (ldlm_request.c:1030:ldlm_cli_cancel_req()) Skipped 169 previous similar 
>>> messages
>>> Dec 22 17:18:49 proof kernel: LustreError: 10917:0:
>>> (ldlm_request.c:1533:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
>>> Dec 22 17:18:49 proof kernel: LustreError: 10917:0:
>>> (ldlm_request.c:1533:ldlm_cli_cancel_list()) Skipped 169 previous similar 
>>> messages
>>> Dec 22 17:18:49 proof kernel: Lustre: client ffff81042fccf400 umount complete
>>> Dec 22 17:19:02 proof kernel: Lustre: Client userdata-client has started
>>>
>>> Is anybody else seeing these messages in this situation? Does anybody 
>>> know of a workaround?
>> Like Ewan's, our Lustre filesystem is automounted. Whilst I haven't done a 
>> detailed study, it does look as though these messages occur immediately 
>> before the filesystem is unmounted.
> 
> Yes, these messages do occur just before 'auto'-unmounting, so there is nothing to worry about.
> The log above shows the mount process.
> 
> Unmounting should look like this:
> Jun 17 04:00:16 cluster1 LustreError: 6460:0:(ldlm_request.c:1043:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
> Jun 17 04:00:16 cluster1 LustreError: 6460:0:(ldlm_request.c:1632:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
> Jun 17 04:00:16 cluster1 Lustre: client ffff8100c44d1000 umount complete
> 
> If you don't see the last line, 'umount complete', automount + Lustre will hang and there will be no further access to the Lustre filesystem.
> That happened to us in our scenario.
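For reference, whether the last client unmount completed cleanly can be checked by looking for that final line in the log. Below is a minimal sketch; the log excerpt is the one quoted above, and in practice it would come from syslog (the file path varies by distribution):

```shell
# Log excerpt from the thread; in practice this would be read from syslog
# (e.g. /var/log/messages -- the path varies by distribution).
log='Jun 17 04:00:16 cluster1 Lustre: client ffff8100c44d1000 umount complete'

# A clean client unmount ends with the 'umount complete' line; if that line
# is missing after an automount expiry, the mount point is likely hung.
if printf '%s\n' "$log" | grep -q 'umount complete'; then
    echo "clean unmount"
else
    echo "possible automount hang"
fi
```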

Something similar has been happening to us with Lustre 1.8 - which is 
partly what prompted the question. When I look at the machine, the 
lustre_0 filesystem doesn't seem to be there - and looking doesn't 
trigger any Lustre errors. The lustre_1 filesystem automounts fine. I 
think that forcing the filesystem to stay mounted helps - but I need to 
do some more investigating.
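One way to force the filesystem to stay mounted, assuming autofs is doing the automounting, is to disable the idle timeout for that map. The sketch below is hypothetical - the mount point and map file names are placeholders, not ours:

```shell
# /etc/auto.master -- hypothetical entry; the mount point and map file
# are placeholders. --timeout=0 disables idle expiry, so autofs never
# attempts to unmount the Lustre filesystem on its own.
/lustre  /etc/auto.lustre  --timeout=0
```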


> 
>> Is automounting a bad idea?
> 
> It depends. We had some bad experiences with Lustre 1.6.6 and automount - see the mail archive about it, subject: 'Stalled autofs + lustre'.
> Our problem should be resolved by upgrading to 1.8.x.
> We will test again in Jan/Feb 2010 when the upgrade is scheduled.
> 

Do let me know.

Thanks,

Chris


