[Lustre-discuss] Fwd: lustre OSS IP change

Wojciech Turek wjt27 at cam.ac.uk
Thu Jan 13 10:02:10 PST 2011


Hi Brendon,

So it looks like your Lustre was just stuck in the recovery process after all.
It is a bit concerning that you had kernel panics on the MDS during recovery,
though. Which Lustre version are you running? Do you have stack traces from
the kernel panics?
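
If it helps, on most releases the running version can be read with something
like:

# lctl get_param version
(or: cat /proc/fs/lustre/version)

and the oops traces should end up on the console or in /var/log/messages on
the MDS; a serial console or netconsole makes them much easier to capture
when the node panics.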

Wojciech


On 13 January 2011 17:41, Brendon <b at brendon.com> wrote:

> On Tue, Jan 11, 2011 at 3:35 PM, Wojciech Turek <wjt27 at cam.ac.uk> wrote:
> > Hi Brendon,
> >
> > Can you please provide the following:
> > 1) output of ifconfig run on each OSS, the MDS, and at least one client
> > 2) output of lctl list_nids run on each OSS, the MDS, and at least one client
> > 3) output of tunefs.lustre --print --dryrun /dev/<OST_block_device> from
> > each OSS
> >
> > Wojciech
>
> After someone looked at the emails I sent out, they grabbed me on IRC.
> We had a discussion, and their read was that everything should be
> working; I just needed to wait for recovery to run and complete. What I
> then learned is that, first, a client has to connect before recovery
> will start. Secondly, the code isn't perfect: the MDS kernel oops'ed
> twice before recovery finally completed. I was in the process of
> disabling panic-on-oops when it went through, and once it did, I got a
> clean bill of health.
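>
> For anyone who hits the same thing, recovery progress can be watched
> and panic-on-oops disabled with something along these lines (exact
> parameter names vary between Lustre versions, so treat this as a
> sketch rather than gospel):
>
> # on the MDS, watch the recovery status of the metadata target
> lctl get_param mds.*.recovery_status
> # keep the node from rebooting on an oops so the trace stays on screen
> sysctl -w kernel.panic_on_oops=0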
>
> Just to complete this discussion, I have listed the requested output.
> I might still learn something :)
>
> ...Looks like I did learn something. OSS0 has an issue with the root
> FS and was remounted read-only, which I discovered when running
> tunefs.lustre --print --dryrun /dev/sda5.
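>
> In case anyone else trips over this, the reason for the read-only
> remount should be visible in the kernel log, e.g.:
>
> dmesg | grep -i remount
>
> and the usual fix is an fsck of the root device during a maintenance
> window before remounting it read-write.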
>
> The fun never ends :)
> -Brendon
>
> 1) ifconfig info
> MDS: # ifconfig
> eth0      Link encap:Ethernet  HWaddr 00:15:17:5E:46:64
>          inet addr:10.1.1.1  Bcast:10.1.1.255  Mask:255.255.255.0
>          inet6 addr: fe80::215:17ff:fe5e:4664/64 Scope:Link
>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>          RX packets:49140546 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:63644404 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:1000
>          RX bytes:18963170801 (17.6 GiB)  TX bytes:65261762295 (60.7 GiB)
>          Base address:0xcc00 Memory:f58e0000-f5900000
>
> eth1      Link encap:Ethernet  HWaddr 00:15:17:5E:46:65
>          inet addr:192.168.0.181  Bcast:192.168.0.255  Mask:255.255.255.0
>          inet6 addr: fe80::215:17ff:fe5e:4665/64 Scope:Link
>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>          RX packets:236738842 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:458503163 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:100
>          RX bytes:15562858193 (14.4 GiB)  TX bytes:686167422947 (639.0 GiB)
>          Base address:0xc880 Memory:f5880000-f58a0000
>
> OSS : # ifconfig
> eth0      Link encap:Ethernet  HWaddr 00:1D:60:E0:5B:B2
>          inet addr:10.1.1.2  Bcast:10.1.1.255  Mask:255.255.255.0
>          inet6 addr: fe80::21d:60ff:fee0:5bb2/64 Scope:Link
>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>          RX packets:3092588 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:3547204 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:1000
>          RX bytes:1320521551 (1.2 GiB)  TX bytes:2670089148 (2.4 GiB)
>          Interrupt:233
>
> client: # ifconfig
> eth0      Link encap:Ethernet  HWaddr 00:1E:8C:39:E4:69
>          inet addr:10.1.1.5  Bcast:10.1.1.255  Mask:255.255.255.0
>          inet6 addr: fe80::21e:8cff:fe39:e469/64 Scope:Link
>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>          RX packets:727922 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:884188 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:1000
>          RX bytes:433349006 (413.2 MiB)  TX bytes:231985578 (221.2 MiB)
>          Interrupt:50
>
>
>
> 2) lctl list_nids
>
> client: lctl list_nids
> 10.1.1.5@tcp
>
> MDS: lctl list_nids
> 10.1.1.1@tcp
>
> OSS: lctl list_nids
> 10.1.1.2@tcp
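>
> (Those NIDs just reflect the LNET configuration on each node;
> presumably either the default, or something along the lines of
>
> options lnet networks=tcp0(eth0)
>
> in /etc/modprobe.conf, which is what maps the 10.1.1.x interfaces
> above to the tcp NIDs.)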
>
> 3) tunefs.lustre --print --dryrun /dev/sda5
> OSS0: # tunefs.lustre --print --dryrun /dev/sda5
> checking for existing Lustre data: found CONFIGS/mountdata
> tunefs.lustre: Can't create temporary directory /tmp/dirCZXt3k:
> Read-only file system
>
> tunefs.lustre FATAL: Failed to read previous Lustre data from /dev/sda5
> (30)
> tunefs.lustre: exiting with 30 (Read-only file system)
>
> OSS1: # tunefs.lustre --print --dryrun /dev/sda5
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
>   Read previous values:
> Target:     mylustre-OST0001
> Index:      1
> Lustre FS:  mylustre
> Mount type: ldiskfs
> Flags:      0x2
>              (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=10.1.1.1@tcp
>
>
>   Permanent disk data:
> Target:     mylustre-OST0001
> Index:      1
> Lustre FS:  mylustre
> Mount type: ldiskfs
> Flags:      0x2
>              (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=10.1.1.1@tcp
>
> exiting before disk write.
>
>
> OSS2: # tunefs.lustre --print --dryrun /dev/sda5
> checking for existing Lustre data: found CONFIGS/mountdata
> Reading CONFIGS/mountdata
>
>   Read previous values:
> Target:     mylustre-OST0002
> Index:      2
> Lustre FS:  mylustre
> Mount type: ldiskfs
> Flags:      0x2
>              (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=10.1.1.1@tcp
>
>
>   Permanent disk data:
> Target:     mylustre-OST0002
> Index:      2
> Lustre FS:  mylustre
> Mount type: ldiskfs
> Flags:      0x2
>              (OST )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=10.1.1.1@tcp
>
> exiting before disk write.
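>
> Side note, since the subject line was about changing server IPs: the
> mgsnode parameter shown above is what would have to be rewritten if
> the MGS NID ever changed, roughly along the lines of
>
> tunefs.lustre --erase-params --mgsnode=<new_mgs_ip>@tcp --writeconf /dev/sda5
>
> on each OST (plus a writeconf on the MDT); check the manual for your
> release before trying that on a live filesystem.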
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>