[Lustre-discuss] OST: inactive device, but why?
Sebastian Reitenbach
sebastia at l00-bugdead-prods.de
Mon Feb 1 04:32:11 PST 2010
Hi,
I am running Lustre in my test setup, and I saw an inactive OST device when I did
what I describe below.
I run lustre-1.8.1.1 on SLES11 x86_64 on both servers and clients.
On the MGS/MDS server (10.0.0.81), I created one combined MGS/MDT partition and one
separate MDT partition:
mkfs.lustre --mgs --mdt --fsname=foo --reformat --mkfsoptions="-N 500000" /dev/xvdb1
mkfs.lustre --mdt --fsname=bar --mgsnode=10.0.0.81@tcp --reformat /dev/xvdb2
mount -t lustre /dev/xvdb1 /lustre/foo-mgs-mdt
mount -t lustre /dev/xvdb2 /lustre/bar-mdt
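As a side note, I assume one can sanity-check this step with lctl dl on the MDS;
it should list the MGS and both MDTs as UP (I did not capture that output here):
MDS# lctl dl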
On the two OSS hosts, I created the following storage:
On OST1 (10.0.0.85):
mke2fs -O journal_dev -b 4096 /dev/xvdd
mke2fs -O journal_dev -b 4096 /dev/xvde
mkfs.lustre --fsname=foo --param="failover.node=10.0.0.86@tcp" \
  --mgsnode=10.0.0.81@tcp --ost --reformat \
  --mkfsoptions="-j -J device=/dev/xvdd -E stride=32" /dev/xvdb1
mkfs.lustre --fsname=bar --param="failover.node=10.0.0.86@tcp" \
  --mgsnode=10.0.0.81@tcp --ost --reformat \
  --mkfsoptions="-j -J device=/dev/xvde -E stride=32" /dev/xvdb2
On OST2 (10.0.0.86):
mke2fs -O journal_dev -b 4096 /dev/xvdf
mke2fs -O journal_dev -b 4096 /dev/xvdg
mkfs.lustre --fsname=foo --param="failover.node=10.0.0.85@tcp" \
  --mgsnode=10.0.0.81@tcp --ost --reformat \
  --mkfsoptions="-j -J device=/dev/xvdf -E stride=32" /dev/xvdc1
mkfs.lustre --fsname=bar --param="failover.node=10.0.0.85@tcp" \
  --mgsnode=10.0.0.81@tcp --ost --reformat \
  --mkfsoptions="-j -J device=/dev/xvdg -E stride=32" /dev/xvdc2
The devices are shared storage via SAN. I configured pacemaker, using the
lustre resource script, to mount and unmount the partitions on the OST hosts.
Initially I had both OST hosts in standby mode. I put OST2 (10.0.0.86) into
active mode, and all four partitions were successfully mounted on OST2. Then I
put OST1 into active mode as well. The targets that belong to OST1 were
automatically unmounted from OST2 and successfully mounted on OST1, due to the
location constraints configured in pacemaker (sketched below).
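For illustration, the kind of constraint I mean looks roughly like this in the
crm shell (resource and node names here are made up; my real configuration
differs):
primitive ost-foo0 ocf:heartbeat:Filesystem \
  params device="/dev/xvdb1" directory="/lustre/foo-ost0" fstype="lustre"
location ost-foo0-prefers-ost1 ost-foo0 100: ost1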
Then I mounted the client filesystems:
mount -t lustre 10.0.0.81@tcp:/foo /lustre/foo
mount -t lustre 10.0.0.81@tcp:/bar /lustre/bar
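As an aside, I assume basic LNET connectivity between client and servers can be
verified at any time with lctl ping, e.g.:
LUSTRE-CLIENT:~ # lctl ping 10.0.0.81@tcp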
LUSTRE-CLIENT:/lustre # lfs df -h
UUID                   bytes    Used  Available  Use%  Mounted on
foo-MDT0000_UUID      265.5M   19.1M     220.9M    7%  foo[MDT:0]
foo-OST0000_UUID      : inactive device
foo-OST0001_UUID        9.8G   22.7M       9.3G    0%  foo[OST:1]
filesystem summary:     9.8G   22.7M       9.3G    0%  foo
LUSTRE-CLIENT:/lustre # cat /proc/fs/lustre/osc/foo-OST0000-osc-ffff88003e695800/ost_conn_uuid
10.0.0.86@tcp
LUSTRE-CLIENT:/lustre # cat /proc/fs/lustre/osc/foo-OST0001-osc-ffff88003e695800/ost_conn_uuid
10.0.0.86@tcp
And that is where I got the "foo-OST0000_UUID : inactive device".
I unmounted the filesystem from the client, stopped the OSTs and the MDT/MGS,
and remounted everything in order: first the MGS, then the MDTs, the OSTs, and
finally the client. However, I still had the inactive device.
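I considered reactivating the OSC by hand on the client, something like the
following (the device number would come from lctl dl), but I was not sure
whether that is the right approach:
LUSTRE-CLIENT:~ # lctl dl | grep foo-OST0000
LUSTRE-CLIENT:~ # lctl --device <devno> activate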
Therefore I reformatted everything on the servers as above, then mounted the
MGS/MDTs and then the OSTs. The only difference was how I started the OSTs:
instead of bringing the nodes from standby to online, both were already online,
and I only started the lustre resources one after the other. This time the
lustre filesystems were mounted on the servers where they are intended to run.
LUSTRE-CLIENT:/lustre # lfs df -h
UUID                   bytes    Used  Available  Use%  Mounted on
foo-MDT0000_UUID      265.5M   19.1M     220.9M    7%  foo[MDT:0]
foo-OST0000_UUID        9.8G   22.7M       9.3G    0%  foo[OST:0]
foo-OST0001_UUID        9.8G   22.7M       9.3G    0%  foo[OST:1]
filesystem summary:    19.7G   45.4M      18.6G    0%  foo
LUSTRE-CLIENT:~ # cat /proc/fs/lustre/osc/foo-OST0000-osc-ffff88003e5ca000/ost_conn_uuid
10.0.0.85@tcp
LUSTRE-CLIENT:~ # cat /proc/fs/lustre/osc/foo-OST0001-osc-ffff88003e5ca000/ost_conn_uuid
10.0.0.86@tcp
LUSTRE-CLIENT:~ #
Afterwards, I can also put one of the OST servers into standby mode and bring
it back online without problems, e.g. after putting OST1 into standby mode:
LUSTRE-CLIENT:/lustre/foo # lfs df -h
UUID                   bytes    Used  Available  Use%  Mounted on
foo-MDT0000_UUID      265.5M   19.1M     220.9M    7%  /lustre/foo[MDT:0]
foo-OST0000_UUID        9.8G   22.7M       9.3G    0%  /lustre/foo[OST:0]
foo-OST0001_UUID        9.8G   22.7M       9.3G    0%  /lustre/foo[OST:1]
filesystem summary:    19.7G   45.4M      18.6G    0%  /lustre/foo
LUSTRE-CLIENT:/lustre/foo # cat /proc/fs/lustre/osc/foo-OST0001-osc-ffff88003e5ca000/ost_conn_uuid
10.0.0.86@tcp
LUSTRE-CLIENT:/lustre/foo # cat /proc/fs/lustre/osc/foo-OST0000-osc-ffff88003e5ca000/ost_conn_uuid
10.0.0.86@tcp
and after taking it back online:
LUSTRE-CLIENT:/lustre/foo # lfs df -h
UUID                   bytes    Used  Available  Use%  Mounted on
foo-MDT0000_UUID      265.5M   19.1M     220.9M    7%  /lustre/foo[MDT:0]
foo-OST0001_UUID        9.8G   22.7M       9.3G    0%  /lustre/foo[OST:1]
filesystem summary:     9.8G   22.7M       9.3G    0%  /lustre/foo
LUSTRE-CLIENT:/lustre/foo # cat /proc/fs/lustre/osc/foo-OST0001-osc-ffff88003e5ca000/ost_conn_uuid
10.0.0.86@tcp
LUSTRE-CLIENT:/lustre/foo # cat /proc/fs/lustre/osc/foo-OST0000-osc-ffff88003e5ca000/ost_conn_uuid
10.0.0.85@tcp
This raised some questions:
Is this expected behaviour when the OST filesystems are mounted for the first
time, but unfortunately on the wrong (failover) server?
How can I make an inactive OST active again when this happens, without
reformatting everything?
Is there a way to prevent this from happening accidentally?
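The only non-destructive fix I found in the manual so far is regenerating the
configuration logs with --writeconf, roughly like this (with all clients and
targets unmounted first, then remounting the MGS/MDTs before the OSTs); I have
not dared to try it yet:
MDS#  tunefs.lustre --writeconf /dev/xvdb1
MDS#  tunefs.lustre --writeconf /dev/xvdb2
OST1# tunefs.lustre --writeconf /dev/xvdb1
OST1# tunefs.lustre --writeconf /dev/xvdb2
OST2# tunefs.lustre --writeconf /dev/xvdc1
OST2# tunefs.lustre --writeconf /dev/xvdc2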
regards,
Sebastian