[Lustre-discuss] fstab mount fails often

Arne Brutschy arne.brutschy at ulb.ac.be
Tue Nov 16 08:11:45 PST 2010


Hi,

> So I was incorrect to assume you are mounting this for the first time? 
> Sounded like other clients have successfully mounted and have written
> data to the OSTs? Status -113 = no route to host so you might want
> to check your "attempt mount" client connectivity to both the MDS/MGS
> and most importantly to the OSS.

Yes, sorry if this wasn't clear before. I have been running this system in
production for 8 months (~40 users, ~600GB of data). The whole thing goes
down from time to time, most probably due to flaky networking. I can't
really predict when it happens, or what triggers it.

The mount problem I described here is always present and seems to be
connected to the main problem: random nodes can't mount on the first try
(even on nodes where it has worked before). Retrying usually works.
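
Next time it happens I can check LNET connectivity from the failing
client, roughly like this (a sketch only; the MGS NID is the one from my
setup, the OSS NIDs are placeholders for whatever oss0/oss1 actually use):

        # on a client that just failed to mount
        lctl list_nids                  # NIDs configured on this client
        lctl ping 10.1.1.1@tcp0         # MGS/MDS
        lctl ping <oss0-nid>@tcp0       # placeholder NID for oss0
        lctl ping <oss1-nid>@tcp0       # placeholder NID for oss1

If any of these pings fail, the -113 ("no route to host") would make
sense and the problem is on the network side rather than in the Lustre
configuration.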

> I have encountered the exact same error before, when my mgsnode=xxxx@o2ib0,yyyy@o2ib1 was missing the zzzz@tcp0 entry on the OSTs. I tried using "tunefs.lustre --erase-param --mgsnode=....." to avoid reformatting, but in the end decided to wipe it out and start from scratch. My clients only use tcp to the MDS/MGS/OSS via two separate networks, so once I appended the missing piece to the mgsnode= parameter it mounted immediately.

This is what I did according to my notes:

        mds
        	mkfs.lustre --fsname=lustre --mgs --mdt /dev/sda6
        	mount -t lustre /dev/sda6 /mdt0
        oss0
        	mkfs.lustre --ost --fsname=lustre --mgsnode=10.1.1.1@tcp0 /dev/sda3
        	mkfs.lustre --ost --fsname=lustre --mgsnode=10.1.1.1@tcp0 /dev/sdb3
        	mount -t lustre /dev/sda3 /ost0
        	mount -t lustre /dev/sdb3 /ost1
        oss1
        	mkfs.lustre --ost --fsname=lustre --mgsnode=10.1.1.1@tcp0 /dev/sda3
        	mkfs.lustre --ost --fsname=lustre --mgsnode=10.1.1.1@tcp0 /dev/sdb3
        	mount -t lustre /dev/sda3 /ost0
        	mount -t lustre /dev/sdb3 /ost1
        client
        	mount -t lustre 10.1.1.1@tcp0:/lustre /lustre
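
For reference, the clients mount via fstab with an entry along these
lines (typed from memory, so the exact options may not match what is on
the nodes; _netdev is what I use to delay the mount until the network
is up):

        10.1.1.1@tcp0:/lustre  /lustre  lustre  defaults,_netdev  0 0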
        
I can try to rewrite the parameters on the file systems and power-cycle
the whole cluster, but I guess I will have to wait for a quieter period
to do that. Reformatting the OSTs would be a major hassle, as I would
need to back up all the data first.
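
If I understand the suggestion correctly, rewriting the parameters
without reformatting would look roughly like this (completely untested
on my side, and I would double-check the --writeconf procedure in the
manual before running it):

        # everything unmounted first: clients, then OSTs, then the MDT
        mds
        	tunefs.lustre --writeconf /dev/sda6
        oss0 and oss1
        	tunefs.lustre --writeconf --erase-params --mgsnode=10.1.1.1@tcp0 /dev/sda3
        	tunefs.lustre --writeconf --erase-params --mgsnode=10.1.1.1@tcp0 /dev/sdb3
        # then remount in the usual order: MDT first, then OSTs, clients last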

Arne




