[Lustre-discuss] 1.6.6 to 1.8.3 upgrade, OSS with wrong "Target" value

Wojciech Turek wjt27 at cam.ac.uk
Thu Jul 15 08:01:44 PDT 2010


Can you also please post the output of 'rpm -qa | grep lustre' run on
puppy5-7?

On 15 July 2010 15:55, Roger Sersted <rs1 at aps.anl.gov> wrote:

>
> OK.  This looks bad.  It appears that I should have upgraded ext3 to ext4;
> I found instructions for that:
>
>        tune2fs -O extents,uninit_bg,dir_index /dev/XXX
>        fsck -pf /dev/XXX
>
> Is the above correct?  I'd like to move our systems to ext4. I didn't know
> those steps were necessary.
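The conversion steps above can be rehearsed safely before touching a real OST. This is a sketch only: it assumes e2fsprogs is installed, and uses a hypothetical scratch image `/tmp/ost.img` in place of the real block device `/dev/XXX`:

```shell
# Sketch: rehearse the ext3 -> ext4 feature upgrade on a throwaway image
# (hypothetical /tmp/ost.img stands in for the real OST block device).
export PATH="$PATH:/sbin:/usr/sbin"   # e2fsprogs tools often live in sbin

dd if=/dev/zero of=/tmp/ost.img bs=1M count=64 2>/dev/null
mkfs.ext3 -F -q /tmp/ost.img

# The upgrade steps quoted in the message:
tune2fs -O extents,uninit_bg,dir_index /tmp/ost.img
# fsck exits 1 when it (expectedly) rewrites group checksums for uninit_bg
fsck -pf /tmp/ost.img || true

# Confirm the new features are present before doing this on real hardware
dumpe2fs -h /tmp/ost.img | grep -i 'features'
```

If the final line lists `extent` and `uninit_bg` among the filesystem features, the same two commands should behave the same way on the real device.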
>
> Other answers listed below.
>
>
> Wojciech Turek wrote:
>
>> Hi Roger,
>>
>> Sorry for the delay. From the ldiskfs messages it seems to me that you are
>> using the ext4-based ldiskfs
>> (Jun 26 17:54:30 puppy7 kernel: ldiskfs created from ext4-2.6-rhel5).
>> If you are upgrading from 1.6.6, your ldiskfs is ext3 based, so I think that
>> in lustre-1.8.3 you should use the ext3-based ldiskfs rpm.
>>
>> Can you also tell us a bit more about your setup? From what you wrote so
>> far I understand you have 2 OSS servers and each server has one OST device.
>> In addition to that you have a third server which acts as a MGS/MDS, is that
>> right?
>>
>> The logs you provided seem to be only from one server, puppy7, so they do
>> not give the whole picture. The timeout messages may indicate a problem
>> with communication between the servers, but it is really difficult to say
>> without seeing the whole picture, or at least more elements of it.
>>
>> To check whether you have the correct rpms installed, can you please run
>> 'rpm -qa | grep lustre' on both OSS servers and the MDS?
>>
>> Also, please provide the output of 'lctl list_nids' run on both OSS
>> servers, the MDS, and a client.
>>
>
> puppy5 (MDS/MGS)
>
> 172.17.2.5@o2ib
> 172.16.2.5@tcp
>
> puppy6 (OSS)
> 172.17.2.6@o2ib
> 172.16.2.6@tcp
>
> puppy7 (OSS)
> 172.17.2.7@o2ib
> 172.16.2.7@tcp
>
>
>
>
>> In addition to the above, please run the following command on all Lustre
>> targets (OSTs and MDT) to display your current configuration:
>>
>>  tunefs.lustre --dryrun --print /dev/<ost_device>
>>
>
> puppy5 (MDS/MGS)
>   Read previous values:
> Target:     lustre1-MDT0000
> Index:      0
> Lustre FS:  lustre1
> Mount type: ldiskfs
> Flags:      0x405
>              (MDT MGS )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters: lov.stripesize=125K lov.stripecount=2
> mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE
> mdt.group_upcall=NONE
>
>
>   Permanent disk data:
> Target:     lustre1-MDT0000
> Index:      0
> Lustre FS:  lustre1
> Mount type: ldiskfs
> Flags:      0x405
>              (MDT MGS )
> Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> Parameters: lov.stripesize=125K lov.stripecount=2
> mdt.group_upcall=/usr/sbin/l_getgroups mdt.group_upcall=NONE
> mdt.group_upcall=NONE
>
> exiting before disk write.
> ----------------------------------------------------
> puppy6
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
>   Read previous values:
> Target:     lustre1-OST0000
> Index:      0
> Lustre FS:  lustre1
> Mount type: ldiskfs
> Flags:      0x2
>              (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=172.17.2.5@o2ib
>
>
>   Permanent disk data:
> Target:     lustre1-OST0000
> Index:      0
> Lustre FS:  lustre1
> Mount type: ldiskfs
> Flags:      0x2
>              (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=172.17.2.5@o2ib
> --------------------------------------------------
> puppy7 (this is the broken OSS. The "Target" should be "lustre1-OST0001")
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
>   Read previous values:
> Target:     lustre1-OST0000
> Index:      0
> Lustre FS:  lustre1
> Mount type: ldiskfs
> Flags:      0x2
>              (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=172.17.2.5@o2ib
>
>
>   Permanent disk data:
> Target:     lustre1-OST0000
> Index:      0
> Lustre FS:  lustre1
> Mount type: ldiskfs
> Flags:      0x2
>              (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=172.17.2.5@o2ib
>
> exiting before disk write.
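When comparing several targets, it can help to reduce each `tunefs.lustre --dryrun --print` dump to just its Target/Index pair so mismatches stand out at a glance. A small sketch: the here-doc below inlines the puppy7 values from this thread purely for illustration; normally you would pipe in the real command's output instead.

```shell
# Sketch: condense saved "tunefs.lustre --dryrun --print" output to the
# Target/Index pair.  The here-doc reuses the puppy7 values quoted above;
# a correctly labelled second OST would print "lustre1-OST0001 index=1".
awk '/^Target:/ {t=$2} /^Index:/ {print t, "index=" $2; exit}' <<'EOF'
   Read previous values:
Target:     lustre1-OST0000
Index:      0
Lustre FS:  lustre1
Mount type: ldiskfs
EOF
```

Run against the real dumps from puppy6 and puppy7, both would print `lustre1-OST0000 index=0`, which makes the duplicate-target problem immediately visible.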
>
>
>
>> If possible please attach syslog from each machine from the time you
>> mounted lustre targets (OST and MDT).
>>
>> Best regards,
>>
>> Wojciech
>>
>> On 14 July 2010 20:46, Roger Sersted <rs1 at aps.anl.gov <mailto:
>> rs1 at aps.anl.gov>> wrote:
>>
>>
>>    Any additional info?
>>
>>    Thanks,
>>
>>    Roger S.
>>
>>
>>
>>
>> --
>> --
>> Wojciech Turek
>>
>>
>>


-- 
--
Wojciech Turek

Assistant System Manager

High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517

