[Lustre-discuss] Problem with write_conf

Nathan Rutman nathan.rutman at oracle.com
Tue Aug 3 13:05:19 PDT 2010


On Aug 3, 2010, at 12:49 PM, Roger Spellman wrote:

> If I change the NIDs, and if I don’t remove /mnt/mdt/CONFIGS/*-client, then I get the following when I try mounting a client (note that 10.2.9.1 is the OLD address):
>  
> mount.lustre: mount 10.2.9.1@o2ib:/hss2 at /mnt/lustre-hss2 failed: Cannot send after transport endpoint shutdown

Don't mount with the old address :)
That address does not come from the config log; it is the MGS address the client must contact in order to GET the config log, so it has to be the current IP of the MGS.  Maybe the old address is hard-coded in /etc/fstab, or perhaps DNS resolution of the MGS's hostname hasn't been updated.
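
One way to check the /etc/fstab theory is a small read-only scan. This is only a sketch: find_stale_mgs is a hypothetical helper name, and 10.2.9.1@o2ib is the old NID from the mount error above. It prints any lustre entries whose device field still mentions that NID and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print fstab-style lines of type "lustre" whose device
# field still mentions the old MGS NID. Read-only; nothing is modified.
find_stale_mgs() {
    old_nid="$1"                 # old MGS NID, e.g. 10.2.9.1@o2ib
    fstab="${2:-/etc/fstab}"     # file to scan; defaults to /etc/fstab
    # Field 1 is the device (MGSNID:/fsname), field 3 is the filesystem type.
    awk -v nid="$old_nid" '$3 == "lustre" && index($1, nid) > 0' "$fstab"
}
```

If find_stale_mgs 10.2.9.1@o2ib prints a line, that entry needs the MGS's current NID in its device field before the client mount can succeed. The same applies to a mount typed by hand: the part before ":/hss2" has to be the current MGS NID.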

>  
> dmesg shows:
>  
> Lustre: Request x1 sent from MGC10.2.9.1@o2ib to NID 10.2.9.1@o2ib 5s ago has timed out (limit 5s).
> LustreError: 15c-8: MGC10.2.9.1@o2ib: The configuration from log 'hss2-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> LustreError: 6285:0:(llite_lib.c:1065:ll_fill_super()) Unable to process log: -108
> Lustre: client ffff81007e98e800 umount complete
> LustreError: 6285:0:(obd_mount.c:1991:lustre_fill_super()) Unable to mount  (-108)
>  
> Am I missing a step?
>  
> -Roger
>  
> From: Nathan Rutman [mailto:nathan.rutman at oracle.com] 
> Sent: Tuesday, August 03, 2010 2:34 PM
> To: Roger Spellman
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Problem with write_conf
>  
>  
> On Aug 3, 2010, at 11:25 AM, Roger Spellman wrote:
> 
> 
> Nathan,
>  
> Thank you.   That works!
>  
> I found that if I change IP address, I also need to remove the file  /mnt/mdt/CONFIGS/*-client.
>  
> This is what tunefs.lustre --writeconf on the MDT does, when you first mount it after the writeconf.
> --writeconf on the MDT and all OSTs is the preferred way of changing a server nid.
>  
> 
>  
> The reason is that the OST mounts failed – the OST was still looking for the old IP Address.  I grepped for files with the old IP Address, and I found those client files.
> 
> Is that a safe thing to do?  Please note that my mdt and mgs are on the same LUN.
>  
> Thanks.
>  
> -Roger
>  
>  
> From: Nathan Rutman [mailto:nathan.rutman at oracle.com] 
> Sent: Tuesday, August 03, 2010 2:03 PM
> To: Roger Spellman
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Problem with write_conf
>  
> There's a 'failsafe' feature that  prevents filesystem name changes:
>> LustreError: 157-3: Trying to start OBD AFTER-MDT0000_UUID using the wrong disk BEFORE-MDT0000_UUID. Were the /dev/ assignments rearranged?
>> 
> You'll have to delete the last_rcvd file from the disk on every server in the filesystem, and also run tunefs --writeconf on all of them with the AFTER name.
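
As a rough illustration of those two steps for a single target, the sketch below only prints the commands and runs nothing. The assumptions are mine: the target is unmounted, the backing filesystem is ldiskfs (as in the mkfs output further down), and /mnt/tmp-ldiskfs is a hypothetical scratch mount point. print_rename_plan is an illustrative name, not a Lustre tool. Deleting last_rcvd throws away whatever state the target keeps there, so back the file up before trying this on anything you care about.

```shell
#!/bin/sh
# Dry run only: prints the commands for one server target, executes nothing.
# Assumptions: the target is unmounted, its backing filesystem is ldiskfs,
# and /mnt/tmp-ldiskfs is a hypothetical scratch mount point.
print_rename_plan() {
    dev="$1"                      # block device of the target, e.g. /dev/mapper/map0
    new_fsname="$2"               # new filesystem name, e.g. AFTER
    mnt="${3:-/mnt/tmp-ldiskfs}"  # where to mount the backing filesystem temporarily
    echo "mount -t ldiskfs $dev $mnt"
    echo "rm $mnt/last_rcvd"
    echo "umount $mnt"
    echo "tunefs.lustre --writeconf --fsname=$new_fsname $dev"
}

print_rename_plan /dev/mapper/map0 AFTER
```

The same plan would be repeated for every server target in the filesystem, and the targets remounted only after all of them have been done.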
>  
> On Aug 2, 2010, at 6:08 PM, Roger Spellman wrote:
> 
> 
> 
>  
> Hi,
> I would like to be able to change a file system name.  Towards that end, I have run the following commands as an experiment:
> 
>   mkfs.lustre --reformat --fsname BEFORE  --device-size=10000 --mgs --mdt  --mgsnode=10.2.9.1@o2ib0 /dev/mapper/map0
>   dmesg -c
>   mount -t lustre /dev/mapper/map0 /mnt/mdt
>   dmesg -c
>   umount /mnt/mdt
>   dmesg -c
>   tunefs.lustre --writeconf --fsname=AFTER --mgs --mdt /dev/mapper/map0
>   dmesg -c
>   mount -t lustre /dev/mapper/map0 /mnt/mdt
>   dmesg -c
> 
> Unfortunately, this does not work.  Can someone please explain the correct sequence of commands to use?  The output of each command is as follows.
> 
> Thanks.
> 
> [root@ts-hss2-01 ~]# mkfs.lustre --reformat --fsname BEFORE  --device-size=10000 --mgs --mdt  --mgsnode=10.2.9.1@o2ib0 /dev/mapper/map0
> 
>    Permanent disk data:
> Target:     BEFORE-MDTffff
> Index:      unassigned
> Lustre FS:  BEFORE
> Mount type: ldiskfs
> Flags:      0x75
>               (MDT MGS needs_index first_time update )
> Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
> Parameters: mgsnode=10.2.9.1@o2ib mdt.group_upcall=/usr/sbin/l_getgroups
> 
> device size = 1632256MB
> 2 6 18
> formatting backing filesystem ldiskfs on /dev/mapper/map0
>         target name  BEFORE-MDTffff
>         4k blocks     2500
>         options        -i 4096 -I 512 -q -O dir_index,extents,uninit_groups -F
> mkfs_cmd = mke2fs -j -b 4096 -L BEFORE-MDTffff  -i 4096 -I 512 -q -O dir_index,extents,uninit_groups -F /dev/mapper/map0 2500
> Writing CONFIGS/mountdata
> [root@ts-hss2-01 ~]# dmesg -c
> LDISKFS-fs: barriers enabled
> kjournald2 starting: pid 1388, dev dm-4:8, commit interval 5 seconds
> LDISKFS FS on dm-4, internal journal on dm-4:8
> LDISKFS-fs: delayed allocation enabled
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
> LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
> LDISKFS-fs: mballoc: 1 extents scanned, 0 goal hits, 1 2^N hits, 0 breaks, 0 lost
> LDISKFS-fs: mballoc: 1 generated and it took 2142
> LDISKFS-fs: mballoc: 512 preallocated, 0 discarded
> 
> 
> [root@ts-hss2-01 ~]# mount -t lustre /dev/mapper/map0 /mnt/mdt
> [root@ts-hss2-01 ~]# dmesg -c
> LDISKFS-fs: barriers enabled
> kjournald2 starting: pid 1406, dev dm-4:8, commit interval 5 seconds
> LDISKFS FS on dm-4, internal journal on dm-4:8
> LDISKFS-fs: delayed allocation enabled
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
> LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
> LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
> LDISKFS-fs: mballoc: 0 generated and it took 0
> LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
> LDISKFS-fs: barriers enabled
> kjournald2 starting: pid 1410, dev dm-4:8, commit interval 5 seconds
> LDISKFS FS on dm-4, internal journal on dm-4:8
> LDISKFS-fs: delayed allocation enabled
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
> Lustre: MGS MGS started
> Lustre: MGC10.2.9.1@o2ib: Reactivating import
> Lustre: Setting parameter BEFORE-MDT0000.mdt.group_upcall in log BEFORE-MDT0000
> Lustre: Enabling user_xattr
> Lustre: BEFORE-MDT0000: new disk, initializing
> Lustre: BEFORE-MDT0000: Now serving BEFORE-MDT0000 on /dev/mapper/map0 with recovery enabled
> Lustre: 1503:0:(lproc_mds.c:271:lprocfs_wr_group_upcall()) BEFORE-MDT0000: group upcall set to /usr/sbin/l_getgroups
> Lustre: BEFORE-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups
> 
> 
> [root@ts-hss2-01 ~]# umount /mnt/mdt
> [root@ts-hss2-01 ~]# dmesg -c
> Lustre: Failing over BEFORE-MDT0000
> Lustre: Skipped 1 previous similar message
> Lustre: *** setting obd BEFORE-MDT0000 device 'dm-4' read-only ***
> Turning device dm-4 (0xfd00004) read-only
> Lustre: BEFORE-MDT0000: shutting down for failover; client state will be preserved.
> Lustre: MDT BEFORE-MDT0000 has stopped.
> LustreError: 1517:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
> LustreError: 1517:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
> Lustre: MGS has stopped.
> LDISKFS-fs: mballoc: 3 blocks 3 reqs (0 success)
> LDISKFS-fs: mballoc: 8 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
> LDISKFS-fs: mballoc: 1 generated and it took 2598
> LDISKFS-fs: mballoc: 1145 preallocated, 0 discarded
> Removing read-only on unknown block (0xfd00004)
> Lustre: server umount BEFORE-MDT0000 complete
> 
> 
> [root@ts-hss2-01 ~]# tunefs.lustre --writeconf --fsname=AFTER --mgs --mdt /dev/mapper/map0
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
> 
>    Read previous values:
> Target:     BEFORE-MDT0000
> Index:      0
> Lustre FS:  BEFORE
> Mount type: ldiskfs
> Flags:      0x5
>               (MDT MGS )
> Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
> Parameters: mgsnode=10.2.9.1@o2ib mdt.group_upcall=/usr/sbin/l_getgroups
> 
> 
>    Permanent disk data:
> Target:     AFTER-MDT0000
> Index:      0
> Lustre FS:  AFTER
> Mount type: ldiskfs
> Flags:      0x105
>               (MDT MGS writeconf )
> Persistent mount opts: iopen_nopriv,user_xattr,errors=remount-ro
> Parameters: mgsnode=10.2.9.1@o2ib mdt.group_upcall=/usr/sbin/l_getgroups
> 
> Writing CONFIGS/mountdata
> [root@ts-hss2-01 ~]# dmesg -c
> LDISKFS-fs: barriers enabled
> kjournald2 starting: pid 1539, dev dm-4:8, commit interval 5 seconds
> LDISKFS FS on dm-4, internal journal on dm-4:8
> LDISKFS-fs: delayed allocation enabled
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> LDISKFS-fs: recovery complete.
> LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
> LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
> LDISKFS-fs: mballoc: 6 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
> LDISKFS-fs: mballoc: 1 generated and it took 2905
> LDISKFS-fs: mballoc: 506 preallocated, 0 discarded
> 
> 
> [root@ts-hss2-01 ~]# mount -t lustre /dev/mapper/map0 /mnt/mdt
> mount.lustre: mount /dev/mapper/map0 at /mnt/mdt failed: Invalid argument
> This may have multiple causes.
> Are the mount options correct?
> Check the syslog for more info.
> [root@ts-hss2-01 ~]# dmesg -c
> LDISKFS-fs: barriers enabled
> kjournald2 starting: pid 1567, dev dm-4:8, commit interval 5 seconds
> LDISKFS FS on dm-4, internal journal on dm-4:8
> LDISKFS-fs: delayed allocation enabled
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
> LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
> LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
> LDISKFS-fs: mballoc: 0 generated and it took 0
> LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
> LDISKFS-fs: barriers enabled
> kjournald2 starting: pid 1575, dev dm-4:8, commit interval 5 seconds
> LDISKFS FS on dm-4, internal journal on dm-4:8
> LDISKFS-fs: delayed allocation enabled
> LDISKFS-fs: file extents enabled
> LDISKFS-fs: mballoc enabled
> LDISKFS-fs: mounted filesystem dm-4 with ordered data mode
> Lustre: MGS MGS started
> Lustre: MGC10.2.9.1@o2ib: Reactivating import
> Lustre: MGS: Logs for fs AFTER were removed by user request.  All servers must be restarted in order to regenerate the logs.
> Lustre: Setting parameter AFTER-MDT0000.mdt.group_upcall in log AFTER-MDT0000
> Lustre: Enabling user_xattr
> LustreError: 157-3: Trying to start OBD AFTER-MDT0000_UUID using the wrong disk BEFORE-MDT0000_UUID. Were the /dev/ assignments rearranged?
> LustreError: 1665:0:(mds_fs.c:828:mds_fs_setup()) cannot read last_rcvd: rc = -22
> LustreError: 1665:0:(handler.c:2007:mds_setup()) AFTER-MDT0000: MDS filesystem method init failed: rc = -22
> LustreError: 1665:0:(obd_config.c:372:class_setup()) setup AFTER-MDT0000 failed (-22)
> LustreError: 1665:0:(obd_config.c:1199:class_config_llog_handler()) Err -22 on cfg command:
> Lustre:    cmd=cf003 0:AFTER-MDT0000  1:AFTER-MDT0000_UUID  2:0  3:AFTER-MDT0000 
> LustreError: 15b-f: MGC10.2.9.1@o2ib: The configuration from log 'AFTER-MDT0000' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
> LustreError: 15c-8: MGC10.2.9.1@o2ib: The configuration from log 'AFTER-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> LustreError: 1566:0:(obd_mount.c:1124:server_start_targets()) failed to start server AFTER-MDT0000: -22
> LustreError: 1566:0:(obd_mount.c:1653:server_fill_super()) Unable to start targets: -22
> LustreError: 1566:0:(obd_config.c:443:class_cleanup()) Device 4 not setup
> LustreError: 1566:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
> LustreError: 1566:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
> Lustre: MGS has stopped.
> LDISKFS-fs: mballoc: 1 blocks 1 reqs (0 success)
> LDISKFS-fs: mballoc: 6 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
> LDISKFS-fs: mballoc: 1 generated and it took 2883
> LDISKFS-fs: mballoc: 503 preallocated, 0 discarded
> Lustre: 1566:0:(obd_mount.c:1473:server_put_super()) Cleaning orphaned obd AFTER-mdtlov
> Lustre: server umount AFTER-MDT0000 complete
> LustreError: 1566:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount  (-22)
> 
> Roger Spellman
> Staff Engineer
> Terascala, Inc.
> 508-588-1501
> www.terascala.com <http://www.terascala.com/>
> 
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>  
>  
