[Lustre-discuss] Recovery Problem

Andreas Dilger andreas.dilger at oracle.com
Fri May 21 07:56:01 PDT 2010


On 2010-05-21, at 5:49, Stefano Elmopi <stefano.elmopi at sociale.it> wrote:
>
> I realized that the system time differed widely across the machines;
> there were at least a few hours of difference.
> I am still running tests and had not paid attention to time
> synchronization, but I have now aligned the time on all servers and
> configured the ntpd service, and the problem no longer occurs.
> I suspect that the cause of the problem was simply the time
> misalignment.

The client and server clocks should have nothing to do with the
functioning of Lustre, so it is surprising that this would be the cause.


> On 20 May 2010, at 13:28, Johann Lombardi wrote:
>
>> On Thu, May 20, 2010 at 12:29:41PM +0200, Stefano Elmopi wrote:
>>> Hi Andreas,
>>> My version of Lustre is 1.8.3.
>>> Sorry for my bad English; "crash" was not the right word.
>>> Let me try to explain better: I start copying a large file onto the
>>> file system and, while the copy is in progress, I reboot the OSS
>>> server, and the copy process enters the "- stalled -" state.
>>> I expected the copy process to resume normally and complete the copy
>>> of the file once the server was back online; instead, the copy
>>> process fails.
>>> So it is the copy process that goes wrong; Lustre itself continues to
>>> work fine.
>>
>> May 19 13:46:31 mdt01prdpom kernel: LustreError: 167-0: This client was evicted by lustre01-OST0000; in progress operations using this service will fail.
>>
>> The cp process failed because the client got evicted by the OSS.
>> We need to look at the OSS logs to figure out the root cause of
>> the eviction.
>>
>> Johann
>
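
As a rough aid for the log check Johann suggests, here is a minimal Python sketch that pulls eviction messages out of a syslog file. It assumes each message sits on a single line in the same syslog layout as the client message quoted above. The function names, the file path passed to scan_file, and the loose "lustre" plus "evict" filter are illustrative assumptions and not part of Lustre; the wording of the messages on the OSS side may well differ from the client-side one.

```python
import re

# Client-side eviction message, as quoted above, on one syslog line.
EVICTED = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"LustreError: 167-0: This client was evicted by (?P<target>[^;\s]+);"
)


def find_evictions(lines):
    """Return (timestamp, host, target) for each client eviction message."""
    hits = []
    for line in lines:
        match = EVICTED.search(line)
        if match:
            hits.append(
                (match.group("stamp"), match.group("host"), match.group("target"))
            )
    return hits


def eviction_related(lines):
    """Loose filter: every line that mentions both Lustre and eviction."""
    return [
        line.rstrip("\n")
        for line in lines
        if "lustre" in line.lower() and "evict" in line.lower()
    ]


def scan_file(path):
    """Read a syslog file (path is hypothetical) and return both views."""
    with open(path, errors="replace") as handle:
        lines = handle.readlines()
    return find_evictions(lines), eviction_related(lines)
```

Running scan_file on the client log gives the time and target of each eviction; running it on the OSS log for the same period, using the loose filter, should narrow down the lines worth reading around the reboot.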
