[Lustre-discuss] Replace OST

Lee, Brett brett.lee at intel.com
Wed Mar 19 08:35:36 PDT 2014


Hi Jon,

Looks like you are jumping in with both feet (new to Lustre and trying to replace an OST).  Pretty ambitious... ;)

From what I can tell, you are following section 14.8.3 in the latest Lustre manual:

14.8.3.  Removing an OST from the File System
http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml

I'm not familiar with the process (I have not needed to do it myself), but I believe that section describes how to remove an *active* OST from the file system.

Here's what I'd suggest:

1.  Verify that each step in the *latest* manual has been executed successfully.

2.  If the "lfs find" process has not completed yet, run "strace -p <pid>" on the parent process and send the output to the list, so we have more detail to work with.

3.  Tell the list which version of Lustre you are running.

4.  Send the list the relevant syslog messages from the client, MDS, MGS, and the OSS with the deactivated OST.
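
The information asked for in steps 2 through 4 could be gathered on each node (client, MDS/MGS, OSS) roughly as follows. This is only a sketch under some assumptions, not a tested procedure: the function name collect_lustre_diag, the output directory, and the 10-second strace window are placeholders of mine, and I am assuming syslog is written to /var/log/messages (adjust for your distribution). Commands that are not installed on a node are skipped.

```shell
#!/bin/sh
# Sketch: collect Lustre-related diagnostics from one node into a directory.
# collect_lustre_diag, the default path, and the 10 s window are placeholders.

collect_lustre_diag() {
    outdir=${1:-/tmp/lustre-diag}   # where to write the report files
    pid=${2:-}                      # optional: PID of the hung "lfs find"
    mkdir -p "$outdir" || return 1

    # Node name and kernel release, so each report can be told apart.
    { uname -n; uname -r; } > "$outdir/node.txt" 2>&1

    # Lustre version and the local device list (shows device status).
    if command -v lctl >/dev/null 2>&1; then
        lctl get_param -n version > "$outdir/lustre_version.txt" 2>&1
        lctl dl > "$outdir/devices.txt" 2>&1
    else
        echo "lctl not found on this node" > "$outdir/lustre_version.txt"
    fi

    # Last 200 Lustre/LNet lines from syslog (log path is an assumption).
    if [ -r /var/log/messages ]; then
        grep -i -E 'lustre|lnet' /var/log/messages | tail -n 200 \
            > "$outdir/syslog.txt"
    else
        : > "$outdir/syslog.txt"
    fi

    # Attach strace to the hung process for a short window, if a PID is given.
    if [ -n "$pid" ] && command -v strace >/dev/null 2>&1 \
        && command -v timeout >/dev/null 2>&1; then
        timeout 10 strace -f -p "$pid" -o "$outdir/strace.txt" || true
    fi
    return 0
}
```

For example, on the client you might run "collect_lustre_diag /tmp/lustre-diag <pid>", where <pid> is the hung "lfs find" process, and then attach the resulting files to your reply. On the servers the PID argument can simply be left out.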

Note that for replacing a *fully* failed OST, I believe you would reformat the OST and then follow 14.8.5 (in the latest manual).

Dr. Brett Lee, Solutions Architect
High Performance Data Division, Intel 
+1.303.625.3595





> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Jon Tegner
> Sent: Monday, March 17, 2014 10:28 AM
> To: lustre-discuss at lists.lustre.org
> Subject: [Lustre-discuss] Replace OST
> 
> Hi,
> 
> I'm new to Lustre, so please excuse what are probably some stupid questions.
> 
> I have set up a small test system, consisting of
> 
> * 1 MGS/MDT
> * 2 OSS/OSTs
> * 6 clients on infiniband and one on gigabit.
> 
> I have verified the scaling effect (increased performance with two OSTs
> compared to one). I further wanted to gain some experience when
> components fail, and in a first test I wanted to replace one of my OSS/OSTs.
> Did the following (trying to follow chapter 14.7 in the manual):
> 
> 1. Unmounted the OST on one of my two OSS/OSTs (simulating a crash).
> 
> 2. Deactivated this OST on the MGS/MDT.
> 
> 3. Tried to remove the files located on this OST with the command:
> 
> lfs find --obd lustre-OST0002 -print0 /home | tee /tmp/files_to_restore | xargs -0 -n 1 unlink
> 
> The last command was issued from one of the clients, but it just hangs.
> 
> Is there something wrong with the way I'm trying to do this? Any help
> would be greatly appreciated!
> 
> Thanks!
> 
> /jon
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss


