[Lustre-discuss] Disappearing OSTs

jrs botemout at gmail.com
Mon May 5 13:05:06 PDT 2008


Well, things have changed again as I'm trying to get back to something
that works, but below is /proc/partitions from one of the MDSs.  Below
that I have the output of 'multipath -l'.  I have dual-port HBAs and
multiple paths to the backend storage, so it looks a little complex.
I've modified the /etc/multipathd.conf file to give the logical names
you see, e.g., ost_lustre03-04_04_oss01_dm_7_mds01.

Even though it looks a little scary, remember that things work fine and
can even survive a random number of reboots before an OST disappears.
Since the last time I posted, I had an MDT go away too.

Does anyone think that I might have better luck running Red Hat?
I've looked through the /etc/init.d/* files but can't see anything
that might be destroying the partition.

Thanks
John


$ cat /proc/partitions
major minor  #blocks  name

  104     0   71652960 cciss/c0d0
  104     1    2104483 cciss/c0d0p1
  104     2   69545385 cciss/c0d0p2
    8     0 5860157184 sda
    8    16 5860157184 sdb
    8    32 5860230912 sdc
    8    48 5860156250 sdd
    8    64 5860156250 sde
    8    80 5860156250 sdf
    8    96 5860156250 sdg
    8   112 5860156250 sdh
    8   128 5860156250 sdi
    8   144 5860157184 sdj
    8   160 5860157184 sdk
    8   176 5860230912 sdl
    8   192 5860156250 sdm
    8   193 5860156216 sdm1
    8   208 5860156250 sdn
    8   224 5860156250 sdo
    8   240 5860157184 sdp
   65     0 5860157184 sdq
   65    16 5860230912 sdr
   65    32 5860156250 sds
   65    33 5860156216 sds1
   65    48 5860156250 sdt
   65    64 5860156250 sdu
   65    80 5860157184 sdv
   65    96 5860157184 sdw
   65   112 5860230912 sdx
   65   128 5860156250 sdy
   65   129 5860156216 sdy1
   65   144 5860156250 sdz
   65   160 5860156250 sdaa
   65   176 5860157184 sdab
   65   192 5860157184 sdac
   65   208 5860230912 sdad
   65   224 5860156250 sdae
   65   225 5860156216 sdae1
   65   240 5860156250 sdaf
   66     0 5860156250 sdag
   66    16 5860157184 sdah
   66    32 5860157184 sdai
   66    48 5860230912 sdaj
   66    64 5860157184 sdak
   66    80 5860157184 sdal
   66    96 5860230912 sdam
   66   112 5860156250 sdan
   66   128 5860156250 sdao
   66   144 5860156250 sdap
   66   160 5860156250 sdaq
   66   176 5860156250 sdar
   66   192 5860156250 sdas
   66   208 5860157184 sdat
   66   224 5860157184 sdau
   66   240 5860230912 sdav
  253     0 5860156250 dm-0
  253     1 5860157184 dm-1
  253     2 5860157184 dm-2
  253     3 5860230912 dm-3
  253     4 5860156250 dm-4
  253     5 5860156250 dm-5
  253     6 5860157184 dm-6
  253     7 5860157184 dm-7
  253     8 5860230912 dm-8
  253     9 5860156250 dm-9
  253    10 5860156250 dm-10
  253    11 5860156250 dm-11
  253    12 5860156216 dm-12
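
A listing this long is easier to check mechanically. The following is only a rough Python sketch, assuming the four-column major/minor/#blocks/name layout shown above; the function names parse_partitions and sd_partitions are made up for illustration. It reports which sd disks have partition entries, which in the listing above are sdm, sds, sdy and sdae.

```python
import re


def parse_partitions(text):
    """Return {name: blocks} from /proc/partitions-style text."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines have four fields: major, minor, #blocks, name.
        # The header line also has four fields but does not start with a digit.
        if len(fields) == 4 and fields[0].isdigit():
            sizes[fields[3]] = int(fields[2])
    return sizes


def sd_partitions(sizes):
    """Map each sd disk to the partition entries listed for it."""
    found = {}
    for name in sizes:
        m = re.fullmatch(r"(sd[a-z]+)(\d+)", name)
        # Only count it when the parent disk is also in the listing.
        if m and m.group(1) in sizes:
            found.setdefault(m.group(1), []).append(name)
    return found
```

To use it, read /proc/partitions into a string and pass it through both functions. Comparing the result before and after a reboot might show whether a partition entry has appeared or vanished on one of the paths.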


$ multipath -l
ost_lustre03-04_04_oss01_dm_7_mds01 (36000402001fc308260c0ace100000000) dm-7 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:3:4 sdai 66:32  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:7:4 sdau 66:224 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:3:4 sdk  8:160  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:7:4 sdw  65:96  [active][undef]
ost_lustre01-02_04_oss01_dm_5_mds01 (36000402001fc14596ef496fd00000000) dm-5 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:2:1 sdaf 65:240 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:4:1 sdn  8:208  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:6:1 sdt  65:48  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:0:1 sdz  65:144 [active][undef]
ost_lustre03-04_02_oss01_dm_3_mds01 (36000402001fc308260c0af3700000000) dm-3 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:1:2 sdad 65:208 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:4:2 sdam 66:96  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:0:2 sdc  8:32   [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:5:2 sdr  65:16  [active][undef]
ost_lustre01-02_02_oss01_dm_11_mds01 (36000402001fc14596ef497ee00000000) dm-11 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:5:5 sdap 66:144 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:6:5 sdas 66:192 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:1:5 sdf  8:80   [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:2:5 sdi  8:128  [active][undef]
ost_lustre01-02_05_oss02_dm_0_mds01 (36000402001fc14596ef4970e00000000) dm-0 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:0:2 sdaa 65:160 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:2:2 sdag 66:0   [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:4:2 sdo  8:224  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:6:2 sdu  65:64  [active][undef]
ost_lustre01-02_01_oss02_dm_10_mds01 (36000402001fc14596ef497dc00000000) dm-10 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:5:4 sdao 66:128 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:6:4 sdar 66:176 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:1:4 sde  8:64   [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:2:4 sdh  8:112  [active][undef]
mdt_lustre03-04_00_dm_8_mds01 (36000402001fc308260c0ac9e00000000) dm-8 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:3:5 sdaj 66:48  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:7:5 sdav 66:240 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:3:5 sdl  8:176  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:7:5 sdx  65:112 [active][undef]
ost_lustre03-04_03_oss02_dm_6_mds01 (36000402001fc308260c0acc200000000) dm-6 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:3:3 sdah 66:16  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:7:3 sdat 66:208 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:3:3 sdj  8:144  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:7:3 sdv  65:80  [active][undef]
ost_lustre01-02_03_oss02_dm_4_mds01 (36000402001fc14596ef496ed00000000) dm-4 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:2:0 sdae 65:224 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:4:0 sdm  8:192  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:6:0 sds  65:32  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:0:0 sdy  65:128 [active][undef]
ost_lustre03-04_01_oss02_dm_2_mds01 (36000402001fc308260c0af1600000000) dm-2 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:1:1 sdac 65:192 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:4:1 sdal 66:80  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:0:1 sdb  8:16   [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:5:1 sdq  65:0   [active][undef]
ost_lustre01-02_00_oss01_dm_9_mds01 (36000402001fc14596ef497cc00000000) dm-9 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:5:3 sdan 66:112 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:6:3 sdaq 66:160 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:1:3 sdd  8:48   [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:2:3 sdg  8:96   [active][undef]
ost_lustre03-04_00_oss01_dm_1_mds01 (36000402001fc308260c0af5b00000000) dm-1 NEXSAN,SATABeast
[size=5.5T][features=0][hwhandler=0]
\_ round-robin 0 [prio=0][active]
  \_ 1:0:1:0 sdab 65:176 [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 1:0:4:0 sdak 66:64  [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:0:0 sda  8:0    [active][undef]
\_ round-robin 0 [prio=0][enabled]
  \_ 0:0:5:0 sdp  8:240  [active][undef]



Bernd Schubert wrote:
 > On Mon, May 05, 2008 at 12:30:23PM -0400, jrs wrote:
 >> I wonder if I'd have better luck, with the disappearing OST bug, if
 >> I actually explicitly partitioned the device and then used, to take
 >> the example above
 >>
 >>     /dev/mapper/ost_oss01_lustre0304_02-part1
 >>
 >> rather than the whole disk.
 >>
 >
 > What does /proc/partitions say?


