[Lustre-discuss] Kernel Panic error while running lustre 2.0 with infiniband

CHU, STEPHEN H (ATTSI) sc1680 at att.com
Mon Feb 21 11:23:56 PST 2011


Try adding the alias line below to modprobe.conf, ahead of the
"options lnet..." line, so that it takes effect at the next reboot:

alias ib0 ib_ipoib
options lnet networks="o2ib0(ib0)"

On the running node, run: modprobe ib_ipoib
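
Since the dmesg output quoted below shows o2iblnd giving up because ib0
is down, it may help to confirm the interface state before loading lnet.
The following is only a minimal sketch, assuming a Linux sysfs layout
that exposes operstate; the function name iface_is_up and the SYSFS_NET
override are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the interface reports operstate "up".
# SYSFS_NET is an assumption, added so the check can be pointed elsewhere.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

iface_is_up() {
    iface="$1"
    state_file="$SYSFS_NET/$iface/operstate"
    # No state file means the interface does not exist yet
    # (for example, ib_ipoib has not been loaded).
    [ -r "$state_file" ] || return 1
    [ "$(cat "$state_file")" = "up" ]
}

if iface_is_up ib0; then
    echo "ib0 is up"
else
    echo "ib0 is down or missing; load ib_ipoib and bring it up before loading lnet"
fi
```

If the check fails after modprobe ib_ipoib, the interface probably still
needs an address and needs to be brought up; "ifconfig ib0" will show
its current state.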

I've run Lustre 2.0 with InfiniBand without any problems. For RHEL 5.4
and beyond, I had to modify the stock openibd init script to unload
Lustre before stopping the network, or else it would hang.
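
The ordering could be sketched roughly as follows. This is not the
actual openibd change, just an illustration of the sequence; it assumes
the usual umount, lustre_rmmod and "service openibd stop" commands are
available on the node, and the DRY_RUN switch and function names are
hypothetical.

```shell
#!/bin/sh
# Sketch only: tear Lustre down before the InfiniBand stack goes away.
# DRY_RUN (hypothetical) defaults to 1, so commands are printed, not run.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

stop_lustre_then_ib() {
    run umount -a -t lustre     # unmount all Lustre targets and clients
    run lustre_rmmod            # unload the Lustre and LNET modules
    run service openibd stop    # only now stop the IB stack
}

stop_lustre_then_ib
```

Set DRY_RUN=0 only after checking that the printed sequence matches what
the node actually needs.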

Steve

> -----Original Message-----
> From: Albert Everett [mailto:aeeverett at ualr.edu]
> Sent: Monday, February 21, 2011 11:01 AM
> To: Arya Mazaheri
> Cc: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Kernel Panic error while running lustre
> 2.0 with infiniband
> 
> What's output of
> 
> # ifconfig ib0
> 
> Albert
> 
> On Feb 21, 2011, at 6:27 AM, Arya Mazaheri wrote:
> 
> > Hi there,
> > I have configured and run lustre 2.0 with tcp (OSS and MDS on the
> > same server) without problems. Now I am trying to run lustre with
> > infiniband support, but whenever I mount the MDT storage on the
> > server, the process ends with the following error:
> > kernel panic - not syncing: fatal exception
> >
> > my /etc/modprobe.conf is:
> > options lnet networks="o2ib0(ib0)"
> >
> > last lines of dmesg:
> > ----------------------------------------------------
> > kjournald starting.  Commit interval 5 seconds
> > LDISKFS-fs warning: maximal mount count reached, running e2fsck is
> > recommended
> > LDISKFS FS on sda2, internal journal
> > LDISKFS-fs: recovery complete.
> > LDISKFS-fs: mounted filesystem with ordered data mode.
> > kjournald starting.  Commit interval 5 seconds
> > LDISKFS-fs warning: maximal mount count reached, running e2fsck is
> > recommended
> > LDISKFS FS on sda2, internal journal
> > LDISKFS-fs: mounted filesystem with ordered data mode.
> > LustreError: 6041:0:(o2iblnd.c:2501:kiblnd_startup()) Can't query
> > IPoIB interface ib0: it's down
> > LustreError: 6041:0:(o2iblnd.c:2501:kiblnd_startup()) Skipped 1
> > previous similar message
> > eth0: no IPv6 routers present
> > LustreError: 105-4: Error -100 starting up LNI o2ib
> > LustreError: Skipped 1 previous similar message
> > LustreError: 6041:0:(events.c:731:ptlrpc_init_portals()) network
> > initialisation failed
> > LustreError: 158-c: Can't load module 'mgs'
> > LustreError: 6035:0:(genops.c:286:class_newdev()) OBD: unknown type:
> > mgs
> > LustreError: 6035:0:(obd_config.c:300:class_attach()) Cannot create
> > device MGS of type mgs : -19
> > LustreError: 6035:0:(obd_mount.c:502:lustre_start_simple()) MGS
> > attach error -19
> > LustreError: 15e-a: Failed to start MGS 'MGS' (-19). Is the 'mgs'
> > module loaded?
> > LustreError: 6035:0:(obd_mount.c:1492:server_put_super()) no obd
> > lustre-MDTffff
> > LustreError: 6035:0:(obd_mount.c:137:server_deregister_mount())
> > lustre-MDTffff not registered
> > Lustre: server umount lustre-MDTffff complete
> > LustreError: 6035:0:(obd_mount.c:2136:lustre_fill_super()) Unable to
> > mount  (-19)
> > kjournald starting.  Commit interval 5 seconds
> > LDISKFS-fs warning: maximal mount count reached, running e2fsck is
> > recommended
> > LDISKFS FS on sdb1, internal journal
> > LDISKFS-fs: mounted filesystem with ordered data mode.
> > kjournald starting.  Commit interval 5 seconds
> > LDISKFS-fs warning: maximal mount count reached, running e2fsck is
> > recommended
> > LDISKFS FS on sdb1, internal journal
> > LDISKFS-fs: mounted filesystem with ordered data mode.
> > LDISKFS-fs: file extents enabled
> > LDISKFS-fs: mballoc enabled
> > LustreError: 6117:0:(events.c:731:ptlrpc_init_portals()) network
> > initialisation failed
> > LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
> > LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0
> > breaks, 0 lost
> > LDISKFS-fs: mballoc: 0 generated and it took 0
> > LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
> > kjournald starting.  Commit interval 5 seconds
> > LDISKFS-fs warning: maximal mount count reached, running e2fsck is
> > recommended
> > LDISKFS FS on sdb2, internal journal
> > LDISKFS-fs: mounted filesystem with ordered data mode.
> > kjournald starting.  Commit interval 5 seconds
> > LDISKFS-fs warning: maximal mount count reached, running e2fsck is
> > recommended
> > LDISKFS FS on sdb2, internal journal
> > LDISKFS-fs: mounted filesystem with ordered data mode.
> > LDISKFS-fs: file extents enabled
> > LDISKFS-fs: mballoc enabled
> > LustreError: 6193:0:(events.c:731:ptlrpc_init_portals()) network
> > initialisation failed
> > LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
> > LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0
> > breaks, 0 lost
> > LDISKFS-fs: mballoc: 0 generated and it took 0
> > LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
> > kjournald starting.  Commit interval 5 seconds
> > LDISKFS-fs warning: maximal mount count reached, running e2fsck is
> > recommended
> > LDISKFS FS on sdb3, internal journal
> > LDISKFS-fs: mounted filesystem with ordered data mode.
> > kjournald starting.  Commit interval 5 seconds
> > LDISKFS-fs warning: maximal mount count reached, running e2fsck is
> > recommended
> > LDISKFS FS on sdb3, internal journal
> > LDISKFS-fs: mounted filesystem with ordered data mode.
> > LDISKFS-fs: file extents enabled
> > LDISKFS-fs: mballoc enabled
> > LustreError: 6269:0:(o2iblnd.c:2501:kiblnd_startup()) Can't query
> > IPoIB interface ib0: it's down
> > LustreError: 6269:0:(o2iblnd.c:2501:kiblnd_startup()) Skipped 2
> > previous similar messages
> > LustreError: 105-4: Error -100 starting up LNI o2ib
> > LustreError: Skipped 2 previous similar messages
> > LustreError: 6269:0:(events.c:731:ptlrpc_init_portals()) network
> > initialisation failed
> > LustreError: 158-c: Can't load module 'mgc'
> > LustreError: Skipped 2 previous similar messages
> > LustreError: 6263:0:(genops.c:286:class_newdev()) OBD: unknown type:
> > mgc
> > LustreError: 6263:0:(genops.c:286:class_newdev()) Skipped 2 previous
> > similar messages
> > LustreError: 6263:0:(obd_config.c:300:class_attach()) Cannot create
> > device MGC0@lo of type mgc : -19
> > LustreError: 6263:0:(obd_config.c:300:class_attach()) Skipped 2
> > previous similar messages
> > LustreError: 6263:0:(obd_mount.c:502:lustre_start_simple()) MGC0@lo
> > attach error -19
> > LustreError: 6263:0:(obd_mount.c:502:lustre_start_simple()) Skipped
> > 2 previous similar messages
> > LustreError: 6263:0:(obd_mount.c:1492:server_put_super()) no obd
> > lustre-OST0002
> > LustreError: 6263:0:(obd_mount.c:1492:server_put_super()) Skipped 2
> > previous similar messages
> > LustreError: 6263:0:(obd_mount.c:137:server_deregister_mount())
> > lustre-OST0002 not registered
> > LustreError: 6263:0:(obd_mount.c:137:server_deregister_mount())
> > Skipped 2 previous similar messages
> > LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
> > LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0
> > breaks, 0 lost
> > LDISKFS-fs: mballoc: 0 generated and it took 0
> > LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
> > Lustre: server umount lustre-OST0002 complete
> > Lustre: Skipped 2 previous similar messages
> > LustreError: 6263:0:(obd_mount.c:2136:lustre_fill_super()) Unable to
> > mount  (-19)
> > LustreError: 6263:0:(obd_mount.c:2136:lustre_fill_super()) Skipped 2
> > previous similar messages
> > kjournald starting.  Commit interval 5 seconds
> > LDISKFS-fs warning: maximal mount count reached, running e2fsck is
> > recommended
> > LDISKFS FS on sdb4, internal journal
> > LDISKFS-fs: mounted filesystem with ordered data mode.
> > kjournald starting.  Commit interval 5 seconds
> > LDISKFS-fs warning: maximal mount count reached, running e2fsck is
> > recommended
> > LDISKFS FS on sdb4, internal journal
> > LDISKFS-fs: mounted filesystem with ordered data mode.
> > LDISKFS-fs: file extents enabled
> > LDISKFS-fs: mballoc enabled
> > LustreError: 6345:0:(events.c:731:ptlrpc_init_portals()) network
> > initialisation failed
> > LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
> > LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0
> > breaks, 0 lost
> > LDISKFS-fs: mballoc: 0 generated and it took 0
> > LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
> >
> > ----------------------------------------------------
> >
> > Any ideas?
> >
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 



