[lustre-discuss] Errors when starting Lustre on CentOS 6.5

Andreas Dilger adilger at whamcloud.com
Wed Nov 28 13:54:53 PST 2018


I would strongly suggest upgrading to something newer than 2.7.0-RC4.

That release is 3.5 years old, and many bugs have been fixed since then.  Also, searching https://jira.whamcloud.com/ shows this bug has already been fixed.
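As a quick sanity check, the build string in your log can be compared numerically against whatever newer release you pick. A minimal Python sketch (the helper name and the 2.10.0 comparison target are purely illustrative, not part of any Lustre tooling):

```python
import re

def parse_lustre_version(build):
    """Extract the numeric (major, minor, patch) tuple from a Lustre
    build string such as '2.7.0-RC4--PRISTINE-...'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", build)
    if m is None:
        raise ValueError("unrecognized build string: %r" % build)
    return tuple(int(x) for x in m.groups())

build = "2.7.0-RC4--PRISTINE-2.6.32-504.8.1.el6_lustre.x86_64"
print(parse_lustre_version(build))            # (2, 7, 0)
print(parse_lustre_version(build) < (2, 10, 0))  # True: older than 2.10
```

Tuple comparison handles the two-digit minor correctly (2.7 sorts before 2.10), which naive string comparison would get wrong.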

Cheers, Andreas

> On Nov 28, 2018, at 07:37, Guillaume Postic <guillaume.postic at univ-paris-diderot.fr> wrote:
> 
> Hello,
> 
> When running 'mount.lustre /dev/sdb /mdt', I got the following errors:
> 
> --------------------------------------------------------------------------------
> Nov 28 10:52:27 localhost kernel: LNet: HW CPU cores: 32, npartitions: 4
> Nov 28 10:52:27 localhost kernel: alg: No test for adler32 (adler32-zlib)
> Nov 28 10:52:27 localhost kernel: alg: No test for crc32 (crc32-table)
> Nov 28 10:52:27 localhost kernel: alg: No test for crc32 (crc32-pclmul)
> Nov 28 10:52:35 localhost kernel: Lustre: Lustre: Build Version: 2.7.0-RC4--PRISTINE-2.6.32-504.8.1.el6_lustre.x86_64
> Nov 28 10:52:35 localhost kernel: LNet: Added LNI 10.0.1.60@tcp [8/256/0/180]
> Nov 28 10:52:35 localhost kernel: LNet: Added LNI 172.27.7.38@tcp1 [8/256/0/180]
> Nov 28 10:52:35 localhost kernel: LNet: Accept secure, port 988
> Nov 28 10:52:37 localhost kernel: LDISKFS-fs (sdb): recovery complete
> Nov 28 10:52:37 localhost kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=on. Opts:
> Nov 28 10:52:47 localhost kernel: Lustre: lustre-MDD0000: changelog on
> Nov 28 10:52:47 localhost kernel: Lustre: lustre-MDT0000: Will be in recovery for at least 5:00, or until 112 clients reconnect
> Nov 28 10:52:49 localhost kernel: Lustre: lustre-MDT0000: Client 5800a16f-8e18-e4f3-32a0-041e00a27e97 (at 10.0.1.102@tcp) reconnecting, waiting for 112 clients in recovery for 4:57
> Nov 28 10:52:49 localhost kernel: Lustre: 8210:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1543398764/real 1543398764] req@ffff88081ef2a080 x1618370892922924/t0(0) o8->lustre-OST0002-osc-MDT0000@10.0.1.63@tcp:28/4 lens 400/544 e 0 to 1 dl 1543398769 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
> Nov 28 10:52:49 localhost kernel: LustreError: 8355:0:(osd_handler.c:1017:osd_trans_start()) ASSERTION( get_current()->journal_info == ((void *)0) ) failed:
> Nov 28 10:52:49 localhost kernel: LustreError: 8355:0:(osd_handler.c:1017:osd_trans_start()) LBUG
> Nov 28 10:52:49 localhost kernel: Pid: 8355, comm: mdt03_003
> Nov 28 10:52:49 localhost kernel:
> Nov 28 10:52:49 localhost kernel: Call Trace:
> Nov 28 10:52:49 localhost kernel: [<ffffffffa031b895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa031be97>] lbug_with_loc+0x47/0xb0 [libcfs]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0be424d>] osd_trans_start+0x25d/0x660 [osd_ldiskfs]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0434b4a>] llog_osd_destroy+0x42a/0xd40 [obdclass]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa042dedc>] llog_cat_new_log+0x1ec/0x710 [obdclass]
> 
> Message from syslogd@localhost at Nov 28 10:52:49 ...
>  kernel:LustreError: 8355:0:(osd_handler.c:1017:osd_trans_start()) ASSERTION( get_current()->journal_info == ((void *)0) ) failed:
> 
> Message from syslogd@localhost at Nov 28 10:52:49 ...
>  kernel:LustreError: 8355:0:(osd_handler.c:1017:osd_trans_start()) LBUG
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0eab54d>] ? lod_xattr_set_internal+0x1bd/0x420 [lod]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa042e50a>] llog_cat_add_rec+0x10a/0x450 [obdclass]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa04261e9>] llog_add+0x89/0x1c0 [obdclass]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0f084e2>] mdd_changelog_store+0x122/0x290 [mdd]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0f08825>] mdd_changelog_ns_store+0x1d5/0x610 [mdd]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0f0c2c2>] ? mdd_links_rename+0x2f2/0x530 [mdd]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0f0d76a>] ? __mdd_index_insert+0x5a/0x160 [mdd]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0f173c8>] mdd_create+0x12b8/0x1730 [mdd]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0de1cb8>] mdo_create+0x18/0x50 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0debe6f>] mdt_reint_open+0x1f8f/0x2c70 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffff8109eefc>] ? remove_wait_queue+0x3c/0x50
> Nov 28 10:52:49 localhost kernel: [<ffffffffa033883c>] ? upcall_cache_get_entry+0x29c/0x880 [libcfs]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0dd30cd>] mdt_reint_rec+0x5d/0x200 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0db723b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0db7706>] mdt_intent_reint+0x1f6/0x430 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0db5cf4>] mdt_intent_policy+0x494/0xce0 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa063f4f9>] ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa066b46b>] ldlm_handle_enqueue0+0x51b/0x13f0 [ptlrpc]
> --------------------------------------------------------------------------------
> 
> Does anyone know how to solve this problem?
> 
> Build version: 2.7.0-RC4--PRISTINE-2.6.32-504.8.1.el6_lustre.x86_64
> 
> Thanks a lot,
> Guillaume Postic
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

---
Andreas Dilger
CTO Whamcloud





