[Lustre-discuss] an MDT back-up clarification

Ms. Megan Larko dobsonunit at gmail.com
Tue Mar 24 13:01:48 PDT 2009


Hello,

Following the procedure in Chapter 15 of the Lustre Manual on backup and
restore, I have tried two approaches to backing up a single MDT disk.
The first was getfattr followed by tar -cvf to capture the data, then
tar -xvf and setfattr to restore it.  The second was rsync -aXS from a
mounted MDT to a recipient disk.  I tried each method twice: once
keeping OBJECTS/* and CATALOGS, and once removing them.  In every case
I can mount the recipient disk but cannot actually use it.  The error
from the rsync -aXS method is included below; the errors from the other
attempts were similar.

Am I getting these errors because I have not gone to the OSS machine
and, for each OST, run
tunefs.lustre --erase-params {params} --writeconf /dev/sdX
so that the OSTs will accept the new MDT disk?
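
For readers following along, the getfattr/tar approach described above can be sketched roughly as below. This is only an illustration under stated assumptions: the device names, mount point, and backup directory are hypothetical placeholders (none come from this message), the new device is assumed to be formatted for Lustre already, and the script prints the commands instead of running them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: file-level backup and restore of an MDT mounted as ldiskfs.
# Every device name, mount point, and path here is a hypothetical placeholder.
OLD_DEV=${OLD_DEV:-/dev/OLD_MDT}
NEW_DEV=${NEW_DEV:-/dev/NEW_MDT}
MNT=${MNT:-/mnt/mdt}
BACKUP_DIR=${BACKUP_DIR:-/backup}
DRY_RUN=${DRY_RUN:-1}

# In dry-run mode print the command; otherwise hand it to the shell.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

backup_mdt() {
    run "mount -t ldiskfs $OLD_DEV $MNT" &&
    # Dump every extended attribute; tar alone does not carry them here.
    run "cd $MNT && getfattr -R -d -m '.*' -e hex -P . > $BACKUP_DIR/ea.bak" &&
    run "cd $MNT && tar -cvf $BACKUP_DIR/mdt.tar --sparse ." &&
    run "umount $MNT"
}

restore_mdt() {
    # Assumes NEW_DEV has already been formatted for Lustre.
    run "mount -t ldiskfs $NEW_DEV $MNT" &&
    run "cd $MNT && tar -xvpf $BACKUP_DIR/mdt.tar" &&
    run "cd $MNT && setfattr --restore=$BACKUP_DIR/ea.bak" &&
    # The step that was tried both with and without in this message.
    run "cd $MNT && rm -f OBJECTS/* CATALOGS" &&
    run "umount $MNT"
}

backup_mdt
restore_mdt
```

Run as-is, the script only lists the steps in order, which makes it easy to compare against what was actually typed.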

I found this suggestion in the lustre-discuss archive under the title
"problem moving mdt to a new node".  Chapter 15 does not mention running
tunefs.lustre on the OSTs as part of the MDT restore procedure.

Thanks,
megan

------------------------------------------------------------------------
23 March 2009

This is the LBUG I got every time I mounted the restored LVM disk for
crew8-MDT0000.  The initial mount command returned to the shell prompt
quickly, but then error messages started piling up in
/var/log/messages.

From /var/log/messages:
Mar 20 10:53:57 mds1 kernel: kjournald starting.  Commit interval 5 seconds
Mar 20 10:53:57 mds1 kernel: LDISKFS FS on dm-1, internal journal
Mar 20 10:53:57 mds1 kernel: LDISKFS-fs: mounted filesystem with
ordered data mode.
Mar 20 10:53:57 mds1 kernel: kjournald starting.  Commit interval 5 seconds
Mar 20 10:53:57 mds1 kernel: LDISKFS FS on dm-1, internal journal
Mar 20 10:53:57 mds1 kernel: LDISKFS-fs: mounted filesystem with
ordered data mode.
Mar 20 10:53:57 mds1 kernel: Lustre: Enabling user_xattr
Mar 20 10:53:57 mds1 kernel: Lustre: MDT crew8-MDT0000 now serving dev
(f8a0e9b5-c2f1-8297-4ead-e34c9680b3cf) with recovery enabled
Mar 20 10:53:57 mds1 kernel: Lustre: Server crew8-MDT0000 on device
/dev/METADATA2/LV2 has started
Mar 20 10:53:57 mds1 kernel: LustreError:
23339:0:(llog_lvfs.c:597:llog_lvfs_create()) error looking up logfile
0xa65662:0x9c30d2f6: rc -2
Mar 20 10:53:57 mds1 kernel: LustreError:
23339:0:(osc_request.c:3446:osc_llog_init()) failed
LLOG_MDS_OST_ORIG_CTXT
Mar 20 10:53:57 mds1 kernel: LustreError:
23339:0:(osc_request.c:3457:osc_llog_init()) osc 'crew8-OST0000-osc'
tgt 'crew8-MDT0000' cnt 1 catid ffffc2000510c000 rc=-2
Mar 20 10:53:57 mds1 kernel: LustreError:
23339:0:(osc_request.c:3459:osc_llog_init()) logid 0xa65662:0x9c30d2f6
Mar 20 10:53:57 mds1 kernel: LustreError:
23339:0:(lov_log.c:214:lov_llog_init()) error osc_llog_init idx 0 osc
'crew8-OST0000-osc' tgt 'crew8-MDT0000' (rc=-2)
Mar 20 10:53:57 mds1 kernel: LustreError:
23339:0:(mds_log.c:207:mds_llog_init()) lov_llog_init err -2
Mar 20 10:53:57 mds1 kernel: LustreError:
23339:0:(llog_obd.c:392:llog_cat_initialize()) rc: -2
Mar 20 10:53:57 mds1 kernel: LustreError:
23341:0:(lustre_log.h:316:llog_get_context())
ASSERTION(atomic_read(&ctxt_->loc_refcount) > 0) failed
Mar 20 10:53:57 mds1 kernel: LustreError:
23341:0:(tracefile.c:431:libcfs_assertion_failed()) LBUG
Mar 20 10:53:57 mds1 kernel: Lustre:
23341:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for
process 23341
Mar 20 10:53:57 mds1 kernel: ll_sync_01    R  running task       0
23341      1         23343 23339 (L-TLB)
Mar 20 10:53:57 mds1 kernel:  0000000000000001 ffffffff800c76d5
80000000c0ffeeaa 0000000000000180
Mar 20 10:53:57 mds1 kernel:  ffff810004d1bb48 0000000000000000
000000000000000c ffff810025176178
Mar 20 10:53:57 mds1 kernel:  0000000000000180 ffffc2000510c000
ffff81004ff35d20 ffffffff883a7dc5
Mar 20 10:53:57 mds1 kernel: Call Trace:
Mar 20 10:53:57 mds1 kernel:  [<ffffffff800c76d5>]
__vmalloc_area_node+0x12b/0x153
Mar 20 10:53:57 mds1 kernel:  [<ffffffff883a7dc5>]
:obdclass:llog_cat_initialize+0x3b5/0x670
Mar 20 10:53:57 mds1 kernel:  [<ffffffff88834d57>] :lov:lov_get_info+0xa57/0xb20
Mar 20 10:53:57 mds1 kernel:  [<ffffffff887b162a>]
:mds:mds_lov_update_desc+0xc3a/0xe20
Mar 20 10:53:57 mds1 kernel:  [<ffffffff887b1cee>]
:mds:__mds_lov_synchronize+0x4de/0x2060
Mar 20 10:53:57 mds1 kernel:  [<ffffffff8000cead>] dput+0x23/0x10a
Mar 20 10:53:57 mds1 kernel:  [<ffffffff887b4858>]
:mds:mds_lov_synchronize+0x38/0xb0
Mar 20 10:53:57 mds1 kernel:  [<ffffffff8000cead>] dput+0x23/0x10a
Mar 20 10:53:57 mds1 kernel:  [<ffffffff887b4858>]
:mds:mds_lov_synchronize+0x38/0xb0
Mar 20 10:53:57 mds1 kernel:  [<ffffffff800b296c>]
audit_syscall_exit+0x2fb/0x319
Mar 20 10:53:57 mds1 kernel:  [<ffffffff8005bfb1>] child_rip+0xa/0x11
Mar 20 10:53:57 mds1 kernel:  [<ffffffff887b4820>]
:mds:mds_lov_synchronize+0x0/0xb0
Mar 20 10:53:57 mds1 kernel:  [<ffffffff8005bfa7>] child_rip+0x0/0x11
Mar 20 10:53:57 mds1 kernel:
Mar 20 10:53:57 mds1 kernel: LustreError: dumping log to
/tmp/lustre-log.1237560837.23341
Mar 20 10:54:07 mds1 kernel: BUG: soft lockup detected on CPU#0!
Mar 20 10:54:07 mds1 kernel:
Mar 20 10:54:07 mds1 kernel: Call Trace:
Mar 20 10:54:07 mds1 kernel:  <IRQ>  [<ffffffff800b4f75>]
softlockup_tick+0xdb/0xed
Mar 20 10:54:07 mds1 kernel:  [<ffffffff8009306a>]
update_process_times+0x42/0x68
Mar 20 10:54:07 mds1 kernel:  [<ffffffff8007464a>]
smp_local_timer_interrupt+0x2c/0x61
Mar 20 10:54:07 mds1 kernel:  [<ffffffff80074d12>]
smp_apic_timer_interrupt+0x41/0x47
Mar 20 10:54:07 mds1 kernel:  [<ffffffff8005bc8e>]
apic_timer_interrupt+0x66/0x6c
Mar 20 10:54:07 mds1 kernel:  <EOI>  [<ffffffff88874f10>]
:osc:osc_setinfo_mds_conn_interpret+0x0/0x3f0
Mar 20 10:54:07 mds1 kernel:  [<ffffffff80062b1c>] .text.lock.spinlock+0x2/0x30
Mar 20 10:54:07 mds1 kernel:  [<ffffffff88874ff4>]
:osc:osc_setinfo_mds_conn_interpret+0xe4/0x3f0
Mar 20 10:54:07 mds1 kernel:  [<ffffffff886b136a>]
:ptlrpc:ptlrpc_check_set+0x9aa/0xb60
Mar 20 10:54:07 mds1 kernel:  [<ffffffff80048b8a>]
try_to_del_timer_sync+0x51/0x5a
Mar 20 10:54:07 mds1 kernel:  [<ffffffff886b3fea>]
:ptlrpc:ptlrpc_set_wait+0x36a/0x520
Mar 20 10:54:07 mds1 kernel:  [<ffffffff88316fe8>] :libcfs:cfs_alloc+0x28/0x60
Mar 20 10:54:07 mds1 kernel:  [<ffffffff80088431>] default_wake_function+0x0/0xe
Mar 20 10:54:07 mds1 kernel:  [<ffffffff8882869e>]
:lov:lov_set_info_async+0x5ae/0x660
Mar 20 10:54:07 mds1 kernel:  [<ffffffff887b2927>]
:mds:__mds_lov_synchronize+0x1117/0x2060
Mar 20 10:54:07 mds1 kernel:  [<ffffffff8000cead>] dput+0x23/0x10a
Mar 20 10:54:07 mds1 kernel:  [<ffffffff887b4858>]
:mds:mds_lov_synchronize+0x38/0xb0
Mar 20 10:54:07 mds1 kernel:  [<ffffffff800b296c>]
audit_syscall_exit+0x2fb/0x319
Mar 20 10:54:07 mds1 kernel:  [<ffffffff8005bfb1>] child_rip+0xa/0x11
Mar 20 10:54:07 mds1 kernel:  [<ffffffff887b4820>]
:mds:mds_lov_synchronize+0x0/0xb0
Mar 20 10:54:07 mds1 kernel:  [<ffffffff8005bfa7>] child_rip+0x0/0x11
Mar 20 10:54:07 mds1 kernel:
Mar 20 11:02:01 mds1 kernel: Lustre: Failing over crew8-MDT0000
Mar 20 11:02:01 mds1 kernel: Lustre: Skipped 13 previous similar messages
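
When reading a log like the one above, it can help to pull out the rc values and translate them. The sketch below assumes the values are negated Linux errno numbers (so rc -2 would be ENOENT, "No such file or directory"); the helper names rc_name and extract_rcs are made up for illustration, and only a handful of common codes are mapped.

```shell
#!/bin/sh
# Sketch only: translate "rc" values from LustreError lines into errno names.
# Assumes rc values are negated Linux errno numbers; helper names are made up.

# Map a (possibly negative) return code to a short errno name.
rc_name() {
    case "${1#-}" in
        2)   echo ENOENT ;;
        5)   echo EIO ;;
        12)  echo ENOMEM ;;
        17)  echo EEXIST ;;
        19)  echo ENODEV ;;
        22)  echo EINVAL ;;
        28)  echo ENOSPC ;;
        110) echo ETIMEDOUT ;;
        *)   echo "errno ${1#-}" ;;
    esac
}

# Read log text on stdin and list each distinct negative rc value once.
# Handles the "rc -2", "rc=-2", and "rc: -2" spellings seen in the log.
extract_rcs() {
    grep -oE 'rc[:=]? ?-[0-9]+' | grep -oE -- '-[0-9]+' | sort -u
}

# Example with one line taken from the log above.
printf '%s\n' "error looking up logfile 0xa65662:0x9c30d2f6: rc -2" |
    extract_rcs |
    while read -r rc; do
        echo "$rc $(rc_name "$rc")"   # prints: -2 ENOENT
    done
```

Since syslog output pasted into mail is often wrapped mid-line, something like extract_rcs would be more reliable against the original /var/log/messages than against a copy from an email.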


