[Lustre-discuss] 1.6.6 to 1.8.3 upgrade, OSS with wrong "Target" value

Wojciech Turek wjt27 at cam.ac.uk
Thu Jul 15 10:56:00 PDT 2010


Hi Roger,

Lustre 1.8.3 for RHEL5 has two sets of RPMs: one set for the old-style
ext3-based ldiskfs and one set for the ext4-based ldiskfs. When upgrading
from 1.6.6 to 1.8.3 I think you should not try to use the ext4-based
packages. Can you let us know which RPMs you used?



On 15 July 2010 16:14, Roger Sersted <rs1 at aps.anl.gov> wrote:

>
>
> Wojciech Turek wrote:
>
>> can you also please post output of  'rpm -qa | grep lustre' run on
>> puppy5-7 ?
>>
>
>
> [root at puppy5 log]# rpm -qa |grep -i lustre
> kernel-2.6.18-164.11.1.el5_lustre.1.8.3
> lustre-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
> lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.3
> mft-2.6.0-2.6.18_164.11.1.el5_lustre.1.8.3
> lustre-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
>
> [root at puppy6 log]# rpm -qa | grep -i lustre
> kernel-2.6.18-164.11.1.el5_lustre.1.8.3
> lustre-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
> lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.3
> mft-2.6.0-2.6.18_164.11.1.el5_lustre.1.8.3
> lustre-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
>
> [root at puppy7 CONFIGS]# rpm -qa | grep -i lustre
> kernel-2.6.18-164.11.1.el5_lustre.1.8.3
> lustre-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
> lustre-ldiskfs-3.0.9-2.6.18_164.11.1.el5_lustre.1.8.3
> mft-2.6.0-2.6.18_164.11.1.el5_lustre.1.8.3
> lustre-modules-1.8.3-2.6.18_164.11.1.el5_lustre.1.8.3
>
> Thanks,
>
> Roger S.
>
>
>> On 15 July 2010 15:55, Roger Sersted <rs1 at aps.anl.gov> wrote:
>>
>>
>>    OK.  This looks bad.  It appears that I should have upgraded ext3 to
>>    ext4.  I found instructions for that:
>>
>>           tune2fs -O extents,uninit_bg,dir_index /dev/XXX
>>           fsck -pf /dev/XXX
>>
>>    Is the above correct?  I'd like to move our systems to ext4.  I
>>    didn't know those steps were necessary.
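The steps above can be rehearsed on a throwaway file-backed filesystem before touching a real OST device. A sketch, with the caveat that the scratch image is made up for illustration; on a real target you would substitute the actual block device and make sure it is unmounted first:

```shell
# Dry run of the ext3 -> ext4-feature upgrade on a scratch image file,
# not a real OST.  On a real target, unmount it first and replace
# "$img" with the actual block device.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1M count=16 2>/dev/null
mke2fs -q -F -t ext3 "$img"                    # stand-in for the 1.6.6-era ext3 OST
tune2fs -O extents,uninit_bg,dir_index "$img"  # enable the ext4-style features
fsck -pf "$img"                                # uninit_bg requires a forced fsck pass
tune2fs -l "$img" | grep 'Filesystem features' # should now list "extent" and "uninit_bg"
rm -f "$img"
```

Note that fsck may exit non-zero here (errors corrected) because turning on uninit_bg forces a recomputation of group checksums; that is expected.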
>>
>>    Other answers listed below.
>>
>>
>>    Wojciech Turek wrote:
>>
>>        Hi Roger,
>>
>>        Sorry for the delay. From the ldiskfs messages it seems to me
>>        that you are using the ext4-based ldiskfs
>>        (Jun 26 17:54:30 puppy7 kernel: ldiskfs created from
>>        ext4-2.6-rhel5).
>>        If you are upgrading from 1.6.6, your ldiskfs is ext3 based, so
>>        I think that in lustre-1.8.3 you should use the ext3-based
>>        ldiskfs RPM.
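A quick way to confirm which flavour a server is actually running is to filter the kernel log for the line quoted above. Shown here against the logged string itself, since the exact dmesg contents vary from host to host:

```shell
# The ldiskfs module logs its origin when it loads; on a live server
# you would pipe dmesg into the same grep.  Here we filter the exact
# line quoted from puppy7's log.
echo 'Jun 26 17:54:30 puppy7 kernel: ldiskfs created from ext4-2.6-rhel5' |
  grep -o 'created from [^ ]*'
# prints: created from ext4-2.6-rhel5
```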
>>
>>        Can you also  tell us a bit more about your setup? From what you
>>        wrote so far I understand you have 2 OSS servers and each server
>>        has one OST device. In addition to that you have a third server
>>        which acts as an MGS/MDS, is that right?
>>
>>        The logs you provided seem to be only from one server, puppy7,
>>        so they do not give the whole picture of the situation. The
>>        timeout messages may indicate a problem with communication
>>        between the servers, but it is really difficult to say without
>>        seeing the whole picture or at least more elements of it.
>>
>>        To check whether you have the correct RPMs installed, can you
>>        please run 'rpm -qa | grep lustre' on both OSS servers and the
>>        MDS?
>>
>>        Also, please provide the output of 'lctl list_nids' run on
>>        both OSS servers, the MDS, and a client.
>>
>>
>>    puppy5 (MDS/MGS)
>>
>>    172.17.2.5 at o2ib
>>    172.16.2.5 at tcp
>>
>>    puppy6 (OSS)
>>    172.17.2.6 at o2ib
>>    172.16.2.6 at tcp
>>
>>    puppy7 (OSS)
>>    172.17.2.7 at o2ib
>>    172.16.2.7 at tcp
>>
>>
>>
>>
>>        In addition to the above, please run the following command on
>>        all Lustre targets (OSTs and MDT) to display your current
>>        Lustre configuration:
>>
>>         tunefs.lustre --dryrun --print /dev/<ost_device>
>>
>>
>>    puppy5 (MDS/MGS)
>>      Read previous values:
>>    Target:     lustre1-MDT0000
>>    Index:      0
>>    Lustre FS:  lustre1
>>    Mount type: ldiskfs
>>    Flags:      0x405
>>                 (MDT MGS )
>>    Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>    Parameters: lov.stripesize=125K lov.stripecount=2
>>    mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE
>>    mdt.group_upcall=NONE
>>
>>
>>      Permanent disk data:
>>    Target:     lustre1-MDT0000
>>    Index:      0
>>    Lustre FS:  lustre1
>>    Mount type: ldiskfs
>>    Flags:      0x405
>>                 (MDT MGS )
>>    Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
>>    Parameters: lov.stripesize=125K lov.stripecount=2
>>    mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE
>>    mdt.group_upcall=NONE
>>
>>    exiting before disk write.
>>    ----------------------------------------------------
>>    puppy6
>>    checking for existing Lustre data: found CONFIGS/mountdata
>>    Reading CONFIGS/mountdata
>>
>>      Read previous values:
>>    Target:     lustre1-OST0000
>>    Index:      0
>>    Lustre FS:  lustre1
>>    Mount type: ldiskfs
>>    Flags:      0x2
>>                 (OST )
>>    Persistent mount opts: errors=remount-ro,extents,mballoc
>>    Parameters: mgsnode=172.17.2.5 at o2ib
>>
>>
>>      Permanent disk data:
>>    Target:     lustre1-OST0000
>>    Index:      0
>>    Lustre FS:  lustre1
>>    Mount type: ldiskfs
>>    Flags:      0x2
>>                 (OST )
>>    Persistent mount opts: errors=remount-ro,extents,mballoc
>>    Parameters: mgsnode=172.17.2.5 at o2ib
>>    --------------------------------------------------
>>    puppy7 (this is the broken OSS. The "Target" should be
>>    "lustre1-OST0001")
>>    checking for existing Lustre data: found CONFIGS/mountdata
>>    Reading CONFIGS/mountdata
>>
>>      Read previous values:
>>    Target:     lustre1-OST0000
>>    Index:      0
>>    Lustre FS:  lustre1
>>    Mount type: ldiskfs
>>    Flags:      0x2
>>                 (OST )
>>    Persistent mount opts: errors=remount-ro,extents,mballoc
>>    Parameters: mgsnode=172.17.2.5 at o2ib
>>
>>
>>      Permanent disk data:
>>    Target:     lustre1-OST0000
>>    Index:      0
>>    Lustre FS:  lustre1
>>    Mount type: ldiskfs
>>    Flags:      0x2
>>                 (OST )
>>    Persistent mount opts: errors=remount-ro,extents,mballoc
>>    Parameters: mgsnode=172.17.2.5 at o2ib
>>
>>    exiting before disk write.
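Put side by side, the two OST labels make the problem plain. A small sketch, using only the values quoted above (the temporary file names are made up for illustration):

```shell
# Reproduce the symptom with the values quoted above: both OSTs carry
# the same Target/Index in CONFIGS/mountdata, so puppy7 presents itself
# as a duplicate of puppy6 instead of as lustre1-OST0001.
cat > puppy6.txt <<'EOF'
Target:     lustre1-OST0000
Index:      0
EOF
cat > puppy7.txt <<'EOF'
Target:     lustre1-OST0000
Index:      0
EOF
for f in puppy6.txt puppy7.txt; do
  awk -v host="${f%.txt}" '/^Target:/ {t=$2} /^Index:/ {i=$2} END {print host, t, i}' "$f"
done
# prints:
# puppy6 lustre1-OST0000 0
# puppy7 lustre1-OST0000 0
rm -f puppy6.txt puppy7.txt
```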
>>
>>
>>
>>        If possible please attach syslog from each machine from the time
>>        you mounted lustre targets (OST and MDT).
>>
>>        Best regards,
>>
>>        Wojciech
>>
>>        On 14 July 2010 20:46, Roger Sersted <rs1 at aps.anl.gov> wrote:
>>
>>
>>           Any additional info?
>>
>>           Thanks,
>>
>>           Roger S.
>>
>>
>>
>>
>>        --
>>        Wojciech Turek
>>
>>
>>
>>
>>
>> --
>> Wojciech Turek
>>
>> Assistant System Manager
>> 517
>>
>