[Lustre-discuss] Mount error with message: "Err -22 on cfg command:"

Reto Gantenbein reto.gantenbein at id.unibe.ch
Mon Sep 1 20:49:41 PDT 2008


Hello Andreas

Thanks a lot for your advice. The writeconf hint in particular was very
valuable. I knew I had read about it before, but in the heat of the
moment I couldn't find it anymore. Its chapter has such a meaningful
name: "Other Configuration Tasks".

In the end I was able to rescue the file system by unmounting all
clients and all servers, running tunefs.lustre --writeconf on all Lustre
devices and restarting all clients (!) before mounting the MGS and OSTs
again. I'm aware that I have now also lost the client logs. Before doing
that, I could mount the MGS and OSTs on the servers, but not on the
clients. I kept getting strange connection errors, IMHO because the
clients were still trying to write some changes back to the servers,
even though I had unmounted them with umount -f. Somehow this prevented
the clients from accessing the file system.
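
The recovery above follows the writeconf procedure from the manual. As a
minimal sketch of the full ordering (clients down first, writeconf on the
MDT before the OSTs, then MGS/MDT, OSTs and clients back up), assuming a
combined MGS/MDT and two OSTs, the following dry-run script only prints
the commands. The device paths, the server mount points and the
writeconf_plan function are hypothetical:

```shell
#!/bin/sh
# Dry-run sketch of a full writeconf regeneration for "homefs".
# It only PRINTS the commands in order; nothing is executed.
# Device paths and server mount points are hypothetical placeholders.

MDT_DEV=/dev/mdt0               # hypothetical combined MGS/MDT device
OST_DEVS="/dev/ost0 /dev/ost1"  # hypothetical OST devices
CLIENT_SPEC=lustre01@tcp:lustre02@tcp:/homefs

writeconf_plan() {
    # 1. Stop the file system: clients, then the MDT, then every OST.
    echo "umount /home"                         # on every client
    echo "umount /mnt/mdt"                      # on the MDS
    for dev in $OST_DEVS; do
        echo "umount /mnt/$(basename "$dev")"   # on each OSS
    done

    # 2. Regenerate the configuration logs: MDT first, then the OSTs.
    #    This also discards settings made with "lctl conf_param",
    #    such as a stray osc.active=0.
    echo "tunefs.lustre --writeconf $MDT_DEV"
    for dev in $OST_DEVS; do
        echo "tunefs.lustre --writeconf $dev"
    done

    # 3. Restart in order: MGS/MDT, then the OSTs, then the clients.
    echo "mount -t lustre $MDT_DEV /mnt/mdt"
    for dev in $OST_DEVS; do
        echo "mount -t lustre $dev /mnt/$(basename "$dev")"
    done
    echo "mount -t lustre $CLIENT_SPEC /home"
}

writeconf_plan
```

Printing instead of executing keeps the order easy to review; each line
still has to be run by hand on the right node (client, MDS or OSS).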

I'm now running fsck and hope to be back online very soon.

Kind regards,
Reto Gantenbein



On Sep 1, 2008, at 7:44 AM, Andreas Dilger wrote:

> On Aug 29, 2008  15:12 +0200, Reto Gantenbein wrote:
>> Some days ago we had a problem that four OSTs were disconnecting
>> themselves. To recover, I deactivated them with 'lctl conf_param
>> homefs-OST0002.osc.active=0'
>
> Note that "lctl conf_param" is intended for permanently setting a
> configuration parameter, not for temporarily disabling an OSC. To
> disable the OSC temporarily, you should simply have done:
>
> 	lctl --device={device} deactivate
> and
> 	lctl --device={device} recover
>
> Now you have a parameter in the configuration log which disables
> this OSC as soon as any client mounts...
>
>> remounted them, waited until they were recovered and activated them
>> again. Some hosts that kept the Lustre file system mounted during this
>> time resumed working correctly on the paused devices.
>>
>> But when I try to mount Lustre on a new client:
>>
>> node01 ~ # mount -t lustre lustre01@tcp:lustre02@tcp:/homefs /home
>>
>> it refuses with the following message:
>>
>> LustreError: 3794:0:(obd_config.c:897:class_process_proc_param())
>> homefs-OST0002-osc-ffff81022f630000: unknown param activate=0
>
> It seems you also had a typo in your conf_param... Handling (ignoring)
> of invalid config params is fixed by bug 14693 (fixed in 1.6.5). That
> doesn't fix the problem of the _valid_ command that deactivates this
> OSC.
>
> I would suggest rewriting your configuration file with --writeconf,
> see "4.2.3.2 Running the Writeconf Command"...
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
