[Lustre-discuss] 1.6.6 to 1.8.3 upgrade, OSS with wrong "Target" value

Roger Sersted rs1 at aps.anl.gov
Thu Jul 15 07:55:51 PDT 2010


OK.  This looks bad.  It appears that I should have upgraded ext3 to ext4. I 
found instructions for that:

	tune2fs -O extents,uninit_bg,dir_index /dev/XXX
	fsck -pf /dev/XXX
	
Is the above correct?  I'd like to move our systems to ext4. I didn't know those 
steps were necessary.
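
Before converting, it may help to see which of those features a device already has. A minimal sketch, assuming the feature list is taken from the "Filesystem features:" line of `dumpe2fs -h`. The helper name has_feature is made up for illustration, and this check does not settle whether the conversion is required for 1.8.3:

```shell
# has_feature FEATURE
# Reads `dumpe2fs -h` output on stdin and succeeds (exit 0) only if FEATURE
# appears on the "Filesystem features:" line. Hypothetical helper name.
has_feature() {
    awk -v want="$1" '
        /^Filesystem features:/ {
            # Fields 1 and 2 are the label; the features start at field 3.
            for (i = 3; i <= NF; i++)
                if ($i == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (run as root on the OSS; /dev/XXX is a placeholder):
#   for f in extents uninit_bg dir_index; do
#       dumpe2fs -h /dev/XXX 2>/dev/null | has_feature "$f" \
#           && echo "$f: enabled" || echo "$f: not enabled"
#   done
```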

Other answers listed below.

Wojciech Turek wrote:
> Hi Roger,
> 
> Sorry for the delay. From the ldiskfs messages it seems to me that you are 
> using ext4 ldiskfs
> (Jun 26 17:54:30 puppy7 kernel: ldiskfs created from ext4-2.6-rhel5).
> If you are upgrading from 1.6.6, your ldiskfs is ext3 based, so I think that in 
> lustre-1.8.3 you should use the ext3 based ldiskfs rpm.
> 
> Can you also tell us a bit more about your setup? From what you wrote 
> so far I understand you have 2 OSS servers and each server has one OST 
> device. In addition to that you have a third server which acts as a 
> MGS/MDS, is that right?
> 
> The logs you provided seem to be only from one server called puppy7 so 
> it does not give a whole picture of the situation. The timeout messages 
> may indicate a problem with communication between the servers but it is 
> really difficult to say without seeing the whole picture or at least 
> more elements of it.
> 
> To check if you have correct rpms installed can you please run 'rpm -qa 
> | grep lustre' on both OSS servers and the MDS?
> 
> Also, please provide the output of 'lctl list_nids' run on both 
> OSS servers, the MDS and a client.

puppy5 (MDS/MGS)
172.17.2.5@o2ib
172.16.2.5@tcp

puppy6 (OSS)
172.17.2.6@o2ib
172.16.2.6@tcp

puppy7 (OSS)
172.17.2.7@o2ib
172.16.2.7@tcp
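
Lustre writes a NID as address@network. One quick consistency check is whether the mgsnode value stored on an OST (the "Parameters: mgsnode=..." line printed by tunefs.lustre) is among the NIDs the MGS reports. A minimal sketch; the helper name nid_listed is hypothetical:

```shell
# nid_listed NID
# Reads `lctl list_nids` output on stdin and succeeds (exit 0) only if NID
# matches one of the lines exactly, ignoring surrounding whitespace.
# Hypothetical helper name.
nid_listed() {
    awk -v nid="$1" '
        { gsub(/^[ \t]+|[ \t]+$/, "") }   # trim leading/trailing blanks
        $0 == nid { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example (on the MGS, checking the value an OST has stored):
#   lctl list_nids | nid_listed 172.17.2.5@o2ib && echo "mgsnode NID is present"
```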


> 
> In addition to the above, please run the following command on all Lustre targets 
> (OSTs and MDT) to display your current Lustre configuration:
> 
>  tunefs.lustre --dryrun --print /dev/<ost_device>

puppy5 (MDS/MGS)
    Read previous values:
Target:     lustre1-MDT0000
Index:      0
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x405
               (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: lov.stripesize=125K lov.stripecount=2 mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE mdt.group_upcall=NONE


    Permanent disk data:
Target:     lustre1-MDT0000
Index:      0
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x405
               (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: lov.stripesize=125K lov.stripecount=2 mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE mdt.group_upcall=NONE

exiting before disk write.
----------------------------------------------------
puppy6
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

    Read previous values:
Target:     lustre1-OST0000
Index:      0
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x2
               (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.2.5@o2ib


    Permanent disk data:
Target:     lustre1-OST0000
Index:      0
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x2
               (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.2.5@o2ib
--------------------------------------------------
puppy7 (this is the broken OSS. The "Target" should be "lustre1-OST0001")
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

    Read previous values:
Target:     lustre1-OST0000
Index:      0
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x2
               (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.2.5@o2ib


    Permanent disk data:
Target:     lustre1-OST0000
Index:      0
Lustre FS:  lustre1
Mount type: ldiskfs
Flags:      0x2
               (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=172.17.2.5@o2ib

exiting before disk write.
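
The conflict is easier to spot mechanically than by eye: puppy6 and puppy7 both report Target lustre1-OST0000. A minimal sketch, assuming the output of `tunefs.lustre --dryrun --print` from each device has been saved to one file per server. The function name dup_targets and the file names are made up for illustration, and the sketch only detects the clash, it does not repair it:

```shell
# dup_targets FILE...
# Each FILE holds saved `tunefs.lustre --dryrun --print` output for one device.
# Prints every Target name that appears under "Permanent disk data:" in more
# than one file, followed by the files that claim it. Hypothetical helper name.
dup_targets() {
    awk '
        FNR == 1 { in_perm = 0 }                  # new file: reset state
        /Permanent disk data:/ { in_perm = 1; next }
        in_perm && $1 == "Target:" {
            owners[$2] = owners[$2] " " FILENAME  # remember who claims it
            count[$2]++
            in_perm = 0                           # one Target per file
        }
        END {
            for (t in count)
                if (count[t] > 1)
                    printf "%s:%s\n", t, owners[t]
        }
    ' "$@"
}

# Example:
#   dup_targets puppy5.txt puppy6.txt puppy7.txt
```

With the three outputs above saved as puppy5.txt, puppy6.txt and puppy7.txt, this prints "lustre1-OST0000: puppy6.txt puppy7.txt"; no output means every target name is unique.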


> 
> If possible please attach syslog from each machine from the time you 
> mounted lustre targets (OST and MDT).
> 
> Best regards,
> 
> Wojciech
> 
> On 14 July 2010 20:46, Roger Sersted <rs1 at aps.anl.gov> wrote:
> 
> 
>     Any additional info?
> 
>     Thanks,
> 
>     Roger S.
> 
> 
> 
> 
> -- 
> --
> Wojciech Turek
> 
> 


