[Lustre-discuss] Luster clients getting evicted

Andreas Dilger adilger@sun.com
Mon Feb 4 11:17:34 PST 2008


On Feb 04, 2008  13:17 -0500, Brock Palen wrote:
>> On Monday 04 February 2008 07:11:11 am Brock Palen wrote:
>>> on our cluster that has been running lustre for about 1 month. I have
>>> 1 MDT/MGS and 1 OSS with 2 OST's.
>>>
>>> Our cluster uses all Gige and has about 608 nodes 1854 cores.
>>
>> This seems to be a lot of clients for only one OSS (and thus for only
>> one GigE link to the OSS).
>
> It's more for evaluation; the 'real' file system is an NFS file system 
> provided by an OnStor Bobcat.  So anything is an improvement.  The cluster IS 
> too big, but there isn't a person at the university who is willing to pay 
> for anything other than more cluster nodes.  Enough with politics.

I'd suggest increasing the Lustre timeout to avoid evictions when the system
is overloaded:

Temporarily (on the MDS, OSS, and all client nodes):
	[root@mds]# sysctl -w lustre.timeout=300
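
With ~600 clients it is easiest to push the setting out with a parallel
shell.  A minimal sketch, assuming pdsh is installed and configured with
your full node list ("mgmt" is just a placeholder admin host):

	[root@mgmt]# pdsh -a 'sysctl -w lustre.timeout=300'
	# verify every node now reports 300 (the value lives in /proc/sys/lustre/timeout)
	[root@mgmt]# pdsh -a 'cat /proc/sys/lustre/timeout' | awk '{print $2}' | sort | uniq -c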

If this helps, you can set it permanently on the MGS (MDS) node:
	mgs> lctl conf_param testfs-MDT0000.sys.timeout=300

replacing "testfs" with the actual name of your filesystem.


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



