[Lustre-discuss] Replace OST

Jon Tegner tegner at foi.se
Thu Mar 20 02:56:17 PDT 2014


Thanks a lot!

Much appreciated!

I'm using Lustre 2.4.2 on CentOS 6.5. It does indeed seem related to the
issue Andreas links to. I did some further tests: if I run the "find"
command (or something like "ls -R /lustreMountPoint") BEFORE I unmount
the OST, the "find" command also works after the OST has been unmounted.
Could this be related to some kind of caching?

I tried "lfs getstripe" instead (thanks!), with the command:

"lfs getstripe --ost lustre-OST0001 -v -r /home | grep home | xargs -n 1 
unlink"

(where /home is my Lustre mount point).

By then following the procedure in the manual (thanks for pointing me
to the latest!), I managed to bring the "new" (in my case
"reformatted") OST back into the file system.

At the moment I'm just testing, and the amount of data on the system is 
very small, so I have no clue whether the way I combine "getstripe" 
with "xargs" and "unlink" is an efficient way of doing it.
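
A slightly more defensive variant of the pipeline above might look like
the sketch below. It assumes that the verbose "lfs getstripe" output
prints each file path on a line of its own, starting with the mount
point, and that no file name contains a newline. The helper name
unlink_listed and the list file /tmp/files_to_restore are only
illustrative; none of this is taken from the manual.

```shell
#!/bin/sh
# Sketch only: keep the lines of a listing that are paths under a given
# mount point, save them for a later restore, and unlink them one by one.
# unlink_listed is a hypothetical helper name, not part of Lustre.
unlink_listed() {
    prefix=$1      # mount point, e.g. /home
    savefile=$2    # file that records the paths being removed

    # Keep only lines that start with "<prefix>/", so that other output
    # lines which merely contain the same word are not treated as paths.
    while IFS= read -r line; do
        case $line in
            "$prefix"/*) printf '%s\n' "$line" ;;
        esac
    done > "$savefile"

    # Read whole lines so that names containing spaces stay intact.
    while IFS= read -r path; do
        unlink "$path" || echo "could not unlink: $path" >&2
    done < "$savefile"
}

# Intended use on a client (assumed, based on the command above):
#   lfs getstripe --ost lustre-OST0001 -v -r /home |
#       unlink_listed /home /tmp/files_to_restore
```

This still starts one "unlink" process per file, so it is unlikely to be
faster; the difference is that it matches only real path lines, copes
with spaces in names, and leaves a list of what was removed.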

Another thing I didn't understand: after "reformatting" the OST and
mounting it, "df" indicated that the file system was growing. Before
mounting the "failed" OST, about half of the space appeared to be gone
(reasonable, since I'm playing around with two OSTs). However, after a
while the file system seemed to be automagically repopulated, though
not with ALL of the files (when compared with my "backup"). But maybe
this could be attributed to confusion on my part...

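One way to pin down which files came back and which did not is to
compare the two trees by file name. The sketch below assumes that the
backup is an ordinary directory tree mirroring the mount point; the
helper name missing_files and the path /backup/home are hypothetical.

```shell
#!/bin/sh
# Sketch only: print files that exist under a backup tree but are
# missing from the live tree. missing_files is a hypothetical helper.
missing_files() {
    backup=$1
    live=$2
    tmp_b=$(mktemp) || return 1
    tmp_l=$(mktemp) || return 1

    # List paths relative to each root, sorted so comm can compare them.
    (cd "$backup" && find . -type f | LC_ALL=C sort) > "$tmp_b"
    (cd "$live" && find . -type f | LC_ALL=C sort) > "$tmp_l"

    # -23 keeps only the lines unique to the first list (the backup).
    LC_ALL=C comm -23 "$tmp_b" "$tmp_l"
    rm -f "$tmp_b" "$tmp_l"
}

# Intended use (paths are assumptions):
#   missing_files /backup/home /home
```

This compares names only; it does not check file sizes or contents.
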
Anyway, thanks again for your help (knowing that I can get feedback
from this list makes it less scary to move over to Lustre)!

Regards,

/jon

On 03/19/2014 08:51 PM, Dilger, Andreas wrote:
> On 2014/03/19, 9:35 AM, "Lee, Brett" <brett.lee at intel.com> wrote:
>
>> Hi Jon,
>>
>> Looks like you are jumping in with both feet (new to Lustre and trying to
>> replace an OST).  Pretty ambitious... ;)
>>
>> From what I can tell you are following section 14.8.3 in the latest
>> Lustre manual:
>>
>> 14.8.3.  Removing an OST from the File System
>> http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml
>>
>> I'm not familiar with the process (having not needed to do that), but I
>> believe that step describes how to remove an *active* OST from the file
>> system.
>>
>> Here's what I'd suggest:
>>
>> 1.  Verify that each step in the *latest* manual has been executed
>> successfully.
>>
>> 2.  If the "lfs find" process has not completed yet, run "strace -p
>> <pid>" on the parent process to provide the list more detail.
> Looks like https://jira.hpdd.intel.com/browse/LU-1738 which hasn't been
> fixed yet.  In the meantime, you can use "lfs getstripe" instead.
>
>> 3.  Tell the list which version of Lustre you are running.
>>
>> 4.  Provide the list relevant syslog messages from the client, MDS, MGS,
>> and the OSS with the deactivated OST.
>>
>> Note that for replacing a *fully* failed OST, I believe you would
>> reformat the OST and then follow 14.8.5 (in the latest manual).
>>
>> Dr. Brett Lee, Solutions Architect
>> High Performance Data Division, Intel
>> +1.303.625.3595
>>
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-
>>> bounces at lists.lustre.org] On Behalf Of Jon Tegner
>>> Sent: Monday, March 17, 2014 10:28 AM
>>> To: lustre-discuss at lists.lustre.org
>>> Subject: [Lustre-discuss] Replace OST
>>>
>>> Hi,
>>>
>>> I'm new to Lustre, so please excuse what are probably some stupid
>>> questions.
>>>
>>> I have set up a small test system, consisting of
>>>
>>> * 1 MGS/MDT
>>> * 2 OSS/OSTs
>>> * 6 clients on infiniband and one on gigabit.
>>>
>>> I have verified the scaling effect (increased performance with two OSTs
>>> compared to one). I further wanted to gain some experience when
>>> components fail, and in a first test I wanted to replace one of my
>>> OSS/OSTs.
>>> Did the following (trying to follow chapter 14.7 in the manual):
>>>
>>> 1. Unmounted the OST on one of my two OSS/OSTs (simulating a crash).
>>>
>>> 2. Deactivated this OST on the MGS/MDT.
>>>
>>> 3. Tried to remove the files located on this OST with the command:
>>>
>>> lfs find --obd lustre-OST0002 -print0 /home |  tee /tmp/files_to_restore
>>> | xargs -0 -n 1 unlink
>>>
>>> The last command was issued from one of the clients, but it just hangs.
>>>
>>> Is there something wrong with the way I'm trying to do this? Any help
>>> would be greatly appreciated!
>>>
>>> Thanks!
>>>
>>> /jon
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
> Cheers, Andreas



