[Lustre-discuss] failover of OSTs: llogs for setup
Erich Focht
efocht at hpce.nec.com
Thu Jan 29 10:46:45 PST 2009
Hello,
we have a problem in a test setup: clients don't recover after a failover
of the OSS. Looking at the llog entries on the MGS, I see:
#25 (224)marker 10 (flags=0x01, v1.6.5.1) lustre-OST0001 'add osc' Thu Nov 6 17:56:23 2008-
#26 (080)add_uuid nid=10.3.0.229@o2ib(0x500000a0300e5) 0: 1:10.3.0.229@o2ib
#27 (080)add_uuid nid=192.168.50.129@tcp(0x20000c0a83281) 0: 1:10.3.0.229@o2ib
#28 (128)attach 0:lustre-OST0001-osc 1:osc 2:lustre-clilov_UUID
#29 (136)setup 0:lustre-OST0001-osc 1:lustre-OST0001_UUID 2:10.3.0.229@o2ib
#30 (080)add_uuid nid=10.3.0.229@o2ib(0x500000a0300e5) 0: 1:10.3.0.229@o2ib
#31 (080)add_uuid nid=192.168.50.129@tcp(0x20000c0a83281) 0: 1:10.3.0.229@o2ib
#32 (104)add_conn 0:lustre-OST0001-osc 1:10.3.0.229@o2ib
#33 (128)lov_modify_tgts add 0:lustre-clilov 1:lustre-OST0001_UUID 2:1 3:1
#34 (224)marker 10 (flags=0x02, v1.6.5.1) lustre-OST0001 'add osc' Thu Nov 6 17:56:23 2008-
If I understand this correctly, the client "knows" from these entries
where to connect to reach an OST. But these entries only show one of the
two OSSes (10.3.0.229@o2ib, 192.168.50.129@tcp). It is possible that
there was a mistake when the OST was mounted for the first time, i.e.
that it was mounted on the wrong OSS (the failover node). Could this
lead to such an issue?
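The reading of the entries above can be checked mechanically. The following is a minimal Python sketch that assumes each `add_uuid` record maps a NID to a connection UUID, the `setup` record names an OSC device's first connection UUID, and each `add_conn` record appends a further connection UUID for that device (e.g. a failover server). It only parses the records quoted above. `RECORDS` and `connections` are hypothetical names for illustration and are not part of any Lustre tool.

```python
import re
from collections import defaultdict

# Records copied from the llog dump above, one record per line.
RECORDS = """\
#26 (080)add_uuid nid=10.3.0.229@o2ib(0x500000a0300e5) 0: 1:10.3.0.229@o2ib
#27 (080)add_uuid nid=192.168.50.129@tcp(0x20000c0a83281) 0: 1:10.3.0.229@o2ib
#28 (128)attach 0:lustre-OST0001-osc 1:osc 2:lustre-clilov_UUID
#29 (136)setup 0:lustre-OST0001-osc 1:lustre-OST0001_UUID 2:10.3.0.229@o2ib
#30 (080)add_uuid nid=10.3.0.229@o2ib(0x500000a0300e5) 0: 1:10.3.0.229@o2ib
#31 (080)add_uuid nid=192.168.50.129@tcp(0x20000c0a83281) 0: 1:10.3.0.229@o2ib
#32 (104)add_conn 0:lustre-OST0001-osc 1:10.3.0.229@o2ib
"""

# add_uuid nid=<NID>(<hex>) 0: 1:<connection UUID>
ADD_UUID = re.compile(r"add_uuid nid=(\S+?)\(0x[0-9a-f]+\)\s+0:\s+1:(\S+)")
# setup 0:<device> 1:<target UUID> 2:<first connection UUID>
SETUP = re.compile(r"setup 0:(\S+) 1:(\S+) 2:(\S+)")
# add_conn 0:<device> 1:<additional connection UUID>
ADD_CONN = re.compile(r"add_conn 0:(\S+) 1:(\S+)")


def connections(text):
    """Return {device: [(conn_uuid, sorted NIDs), ...]} in log order."""
    nids_by_uuid = defaultdict(set)  # connection UUID -> NIDs mapped to it
    conns = defaultdict(list)        # OSC device -> connection UUIDs, in order
    for line in text.splitlines():
        if m := ADD_UUID.search(line):
            nids_by_uuid[m.group(2)].add(m.group(1))
        elif m := SETUP.search(line):
            conns[m.group(1)].append(m.group(3))
        elif m := ADD_CONN.search(line):
            conns[m.group(1)].append(m.group(2))
    return {dev: [(u, sorted(nids_by_uuid[u])) for u in uuids]
            for dev, uuids in conns.items()}


if __name__ == "__main__":
    for dev, entries in connections(RECORDS).items():
        for uuid, nids in entries:
            print(f"{dev}: conn {uuid} -> {', '.join(nids)}")
```

On the records above, it reports two connection entries for lustre-OST0001-osc. Both entries use the UUID 10.3.0.229@o2ib, and each is backed by the NIDs 10.3.0.229@o2ib and 192.168.50.129@tcp.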
Can this be corrected by re-registering the OST with the MGS (i.e. doing
the "first mount" again)? What do I need to do on the MGS and the OST
for this (tunefs...?)?
Thanks & best regards,
Erich