[Lustre-discuss] Clients frozen during pressure test

Lu Wang wanglu at ihep.ac.cn
Mon Aug 3 07:10:52 PDT 2009


Dear list,

I am running a stress test on a new Lustre file system with 10 OSSs and 70 client nodes. Each server has a 10Gb Ethernet connection and each client has a 1Gb Ethernet connection; each OSS serves 3 OSTs on 3 RAID6 volumes.

Each time, after about 4 hours, the clients begin to freeze one after another. The command "lfs check osts" shows that the frozen clients cannot access some OSTs:
		error: check 'testfs-OST0007-osc-c9b82800': Resource temporarily unavailable (11)
		error: check 'testfs-OST0008-osc-c9b82800': Resource temporarily unavailable (11)
		error: check 'testfs-OST0009-osc-c9b82800': Resource temporarily unavailable (11)

and the command "lctl ping server" shows "Input/output error".
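
To compare which OSTs fail on which of the 70 clients, the "lfs check osts" errors could be collected with a small script. This is only a minimal sketch in Python, assuming the error lines keep exactly the format shown above; the names ERROR_LINE and failed_osts are illustrative and not part of Lustre.

```python
import re

# Matches lines such as:
#   error: check 'testfs-OST0007-osc-c9b82800': Resource temporarily unavailable (11)
# Assumption: the format is the one shown in the output above.
ERROR_LINE = re.compile(
    r"error: check '(?P<device>[^']+)': (?P<message>.+) \((?P<errno>\d+)\)"
)


def failed_osts(output):
    """Return (device, message, errno) for each error line in the given text."""
    failures = []
    for line in output.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            failures.append(
                (match.group("device"), match.group("message"), int(match.group("errno")))
            )
    return failures


# Sample input copied from the errors reported above.
sample = """\
error: check 'testfs-OST0007-osc-c9b82800': Resource temporarily unavailable (11)
error: check 'testfs-OST0008-osc-c9b82800': Resource temporarily unavailable (11)
error: check 'testfs-OST0009-osc-c9b82800': Resource temporarily unavailable (11)
"""

for device, message, errno in failed_osts(sample):
    print(device, errno, message)
```

Running this on the saved output of each frozen client would show whether the same OSTs (here OST0007 to OST0009) are unreachable everywhere, which would point at one OSS rather than at the clients.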
				
However, the servers are not busy (util% < 10) when the clients freeze. My questions are:

1. Why can't the clients reconnect when the servers are not busy?
2. I have set timeout=1000; do I need to increase it further?
3. Are there any other variables that need to be tuned under heavy load?
            




Best Regards
Lu Wang
--------------------------------------------------------------	  
Computing Center
IHEP						
Beijing 100049, China
Email: Lu.Wang at ihep.ac.cn
--------------------------------------------------------------   				
                          





