[Lustre-discuss] failover of OSTs: llogs for setup

Erich Focht efocht at hpce.nec.com
Thu Jan 29 10:46:45 PST 2009


Hello,

We have a problem in a test setup: clients do not recover after a
failover of the OSS. Looking at the llog entries on the MGS, I see:

#25 (224)marker  10 (flags=0x01, v1.6.5.1) lustre-OST0001  'add osc' Thu Nov  6 17:56:23 2008-
#26 (080)add_uuid  nid=10.3.0.229@o2ib(0x500000a0300e5)  0:  1:10.3.0.229@o2ib
#27 (080)add_uuid  nid=192.168.50.129@tcp(0x20000c0a83281)  0:  1:10.3.0.229@o2ib
#28 (128)attach    0:lustre-OST0001-osc  1:osc  2:lustre-clilov_UUID
#29 (136)setup     0:lustre-OST0001-osc  1:lustre-OST0001_UUID  2:10.3.0.229@o2ib
#30 (080)add_uuid  nid=10.3.0.229@o2ib(0x500000a0300e5)  0:  1:10.3.0.229@o2ib
#31 (080)add_uuid  nid=192.168.50.129@tcp(0x20000c0a83281)  0:  1:10.3.0.229@o2ib
#32 (104)add_conn  0:lustre-OST0001-osc  1:10.3.0.229@o2ib
#33 (128)lov_modify_tgts add 0:lustre-clilov  1:lustre-OST0001_UUID  2:1  3:1
#34 (224)marker  10 (flags=0x02, v1.6.5.1) lustre-OST0001  'add osc' Thu Nov  6 17:56:23 2008-


If I understand this correctly, the client learns from these entries
where to connect in order to access an OST. But they list only one of
the two OSSes (the one with NIDs 10.3.0.229@o2ib and
192.168.50.129@tcp). It is possible that a mistake was made when the OST
was mounted for the first time, and that it was mounted on the wrong OSS
(the failover node). Could that lead to this problem?
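
To make this reading of the records concrete, here is a minimal Python
sketch that pulls the connection targets out of such a dump. It is not a
Lustre tool: the function name and the regular expressions are
hypothetical, the sample is the excerpt above with each record on one
line, and it rests on my assumption that "setup" and "add_conn" name the
connection UUIDs an OSC may use while "add_uuid" maps NIDs to such a
UUID. Under that assumption, a failover partner would have to show up as
a second connection UUID for the OSC.

```python
import re
from collections import defaultdict

# Excerpt of the dump above, one record per line (unwrapped by hand).
SAMPLE_LLOG = """\
#26 (080)add_uuid  nid=10.3.0.229@o2ib(0x500000a0300e5)  0:  1:10.3.0.229@o2ib
#27 (080)add_uuid  nid=192.168.50.129@tcp(0x20000c0a83281)  0:  1:10.3.0.229@o2ib
#28 (128)attach    0:lustre-OST0001-osc  1:osc  2:lustre-clilov_UUID
#29 (136)setup     0:lustre-OST0001-osc  1:lustre-OST0001_UUID  2:10.3.0.229@o2ib
#30 (080)add_uuid  nid=10.3.0.229@o2ib(0x500000a0300e5)  0:  1:10.3.0.229@o2ib
#31 (080)add_uuid  nid=192.168.50.129@tcp(0x20000c0a83281)  0:  1:10.3.0.229@o2ib
#32 (104)add_conn  0:lustre-OST0001-osc  1:10.3.0.229@o2ib
"""

# Assumed record layout: "#<index> (<length>)<operation>  <arguments>"
RECORD = re.compile(r"^#\d+\s+\(\d+\)(\w+)\s+(.*)$")
# Arguments look like "0:value  1:value"; a value may be empty.
FIELD = re.compile(r"(?<!\S)(\d+):(\S*)")
NID = re.compile(r"nid=([^\s(]+)")


def summarize_llog(llog_text):
    """Return (connections per OSC device, NIDs per connection UUID)."""
    # Mail archives rewrite "@" as " at "; undo that before parsing.
    text = llog_text.replace(" at ", "@")
    conns = defaultdict(list)   # OSC device -> connection UUIDs
    nids = defaultdict(list)    # connection UUID -> NIDs
    for line in text.splitlines():
        match = RECORD.match(line.strip())
        if not match:
            continue
        op, args = match.groups()
        fields = dict(FIELD.findall(args))
        if op == "add_uuid":
            nid = NID.search(args)
            uuid = fields.get("1")
            if nid and uuid and nid.group(1) not in nids[uuid]:
                nids[uuid].append(nid.group(1))
        elif op in ("setup", "add_conn"):
            # Assumption: setup carries the UUID in field 2, add_conn in field 1.
            uuid = fields.get("2" if op == "setup" else "1")
            device = fields.get("0")
            if device and uuid and uuid not in conns[device]:
                conns[device].append(uuid)
    return dict(conns), dict(nids)


if __name__ == "__main__":
    connections, uuid_nids = summarize_llog(SAMPLE_LLOG)
    for device, uuids in connections.items():
        print(device, "->", uuids)
        for uuid in uuids:
            print("   ", uuid, "has NIDs", uuid_nids.get(uuid, []))
```

For the excerpt this yields a single connection UUID for
lustre-OST0001-osc, with both NIDs attached to it, which is how I arrive
at the conclusion that only one OSS is recorded.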

Can this be corrected by re-registering the OST with the MGS (i.e. doing
the "first mount" again)? If so, what do I need to do on the MGS and on
the OST for this (tunefs...?)?

Thanks & best regards,
Erich


