[Lustre-discuss] Recovery without end

Brock Palen brockp at umich.edu
Wed Feb 25 08:26:09 PST 2009


We used to do something similar and still had issues.

Upgrading all servers (2 OSSs, 7 OSTs each) and all 800 clients to
1.6.6 fixed everything for us.  We run the default timeout and default
everything else, really, and have had no issues.
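
For anyone comparing settings, the tunables mentioned later in this
thread can be read straight off a node.  Below is a minimal Python
sketch, assuming a 1.6.x node where these /proc and /sys paths exist
(the paths are the ones quoted further down; adjust for your setup):

    # Minimal sketch: report the Lustre timeout tunables discussed in
    # this thread.  Assumes a 1.6.x node where these proc/sysfs paths
    # exist; paths that are absent are simply reported as missing.
    import os

    TUNABLES = [
        "/proc/sys/lustre/timeout",                      # obd timeout, seconds
        "/sys/module/ptlrpc/parameters/at_max",          # adaptive timeout cap
        "/sys/module/ptlrpc/parameters/at_history",
        "/sys/module/ptlrpc/parameters/at_early_margin",
        "/sys/module/ptlrpc/parameters/at_extra",
    ]

    for path in TUNABLES:
        if os.path.exists(path):
            with open(path) as f:
                print("%s = %s" % (path, f.read().strip()))
        else:
            print("%s: not present on this node" % path)

This only reads the values; what to actually write into
/proc/sys/lustre/timeout is exactly what the thread below is arguing
about, so setting it is deliberately left out of the sketch.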

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On Feb 25, 2009, at 11:22 AM, Charles Taylor wrote:

> I'm going to pipe in here.  We too use a very large (1000) timeout
> value.  We have two separate Lustre file systems: one consists of two
> rather beefy OSSs with 12 OSTs each (FalconIII FC-SATA RAID), the
> other of 8 OSSs with 3 OSTs each (Xyratex 4900FC).  We have about 500
> clients and support both tcp and o2ib NIDs.  We run Lustre 1.6.4.2 on
> a patched 2.6.18-8.1.14 CentOS/RH kernel.  It has worked *very* well
> for us for over a year now - very few problems and very good
> performance under very heavy loads.
>
> We've tried setting our timeout to lower values but settled on 1000
> (despite the long recovery periods) because if we don't, our Lustre
> connectivity starts to break down and our mounts come and go with
> errors like "transport endpoint failure" or "transport endpoint not
> connected" or some such (it's been a while now).  File system access
> comes and goes randomly on nodes.  We tried many tunings and looked
> for other sources of problems (underlying network issues).
> Ultimately, the only thing we found that fixed this was to extend the
> timeout value.
>
> I know you will be tempted to tell us that our network must be flaky,
> but it simply is not.  We'd love to understand why we need such a
> large timeout value and why, if we don't use one, we see these
> transport endpoint failures.  However, after spending several days
> trying to understand and resolve the issue, we finally just accepted
> the long timeout as a suitable workaround.
>
> I wonder if there are others who have silently done the same.  We'll
> be upgrading to 1.6.6 or 1.6.7 in the not-too-distant future.  Maybe
> then we'll be able to do away with the long timeout value, but until
> then, we need it.  :(
>
> Just my two cents,
>
> Charlie Taylor
> UF HPC Center
>
> On Feb 25, 2009, at 11:03 AM, Brian J. Murrell wrote:
>
>> On Wed, 2009-02-25 at 16:09 +0100, Thomas Roth wrote:
>>>
>>> Our /proc/sys/lustre/timeout is 1000
>>
>> That's way too high.  Long recoveries are exactly why you don't want
>> this number to be huge.
>>
>>> - there has been some debate on this large value here, but most
>>> other installations will not run in a network environment with a
>>> setup as crazy as ours.
>>
>> What's so crazy about your setup?  Unless your network is very flaky
>> and/or you have not tuned your OSSes properly, there should be no
>> need for such a high timeout, and if there is, you need to address
>> the problems that require it.
>>
>>> Putting the timeout to 100 immediately results in "Transport
>>> endpoint" errors; it is impossible to run Lustre like this.
>>
>> 300 is the max that we recommend and we have very large production
>> clusters that use such values successfully.
>>
>>> Since this is a 1.6.5.1 system, I activated the adaptive timeouts
>>> and set them to equally large values:
>>> /sys/module/ptlrpc/parameters/at_max = 6000
>>> /sys/module/ptlrpc/parameters/at_history = 6000
>>> /sys/module/ptlrpc/parameters/at_early_margin = 50
>>> /sys/module/ptlrpc/parameters/at_extra = 30
>>
>> This is likely not good either.  I will let somebody more
>> knowledgeable about AT comment in detail, though.  It's a new
>> feature that is not yet in wide use, so real-world experience with
>> it is still limited.
>>
>> b.
>>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>




More information about the lustre-discuss mailing list