[Lustre-discuss] Clients frozen during pressure test
Lu Wang
wanglu at ihep.ac.cn
Mon Aug 3 07:10:52 PDT 2009
Dear list,
I am running a pressure test on a new 10-OSS Lustre file system using 70 client nodes. Each server has a 10Gb Ethernet connection, each client has a 1Gb Ethernet connection, and each OSS serves 3 OSTs on 3 RAID6 volumes.
Each time, after about 4 hours, the clients begin to freeze one after another. The command "lfs check osts" shows that the frozen clients cannot access some OSTs:
error: check 'testfs-OST0007-osc-c9b82800': Resource temporarily unavailable (11)
error: check 'testfs-OST0008-osc-c9b82800': Resource temporarily unavailable (11)
error: check 'testfs-OST0009-osc-c9b82800': Resource temporarily unavailable (11)
and the command "lctl ping server" reports "Input/output error".
However, the servers are not busy (util% < 10) when the clients freeze. My questions are:
1. Why can't the clients reconnect when the servers are not busy?
2. I have set timeout=1000. Do I need to increase it further?
3. Are there any other variables that need to be tuned under heavy load?
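To make it easier to compare which OSTs each frozen client has lost, here is a minimal Python sketch that extracts the unreachable targets from saved "lfs check osts" output. It assumes the error lines have exactly the form quoted above. The function name failed_targets and the sample text are only illustrative and are not part of Lustre.

```python
import re

# Matches error lines in the form quoted above, e.g.
# error: check 'testfs-OST0007-osc-c9b82800': Resource temporarily unavailable (11)
ERROR_RE = re.compile(
    r"^error: check '(?P<device>[^']+)': (?P<reason>.+?) \((?P<errno>\d+)\)$"
)


def failed_targets(output):
    """Return {target: (reason, errno)} for every error line in the output.

    Assumption: the device name has the form "<fsname>-OSTxxxx-osc-<id>",
    so the target is everything before "-osc-".
    """
    failures = {}
    for line in output.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            target = match.group("device").split("-osc-")[0]
            failures[target] = (match.group("reason"), int(match.group("errno")))
    return failures


# Sample input copied from the output quoted above.
sample = """\
error: check 'testfs-OST0007-osc-c9b82800': Resource temporarily unavailable (11)
error: check 'testfs-OST0008-osc-c9b82800': Resource temporarily unavailable (11)
error: check 'testfs-OST0009-osc-c9b82800': Resource temporarily unavailable (11)
"""

for target, (reason, errno) in sorted(failed_targets(sample).items()):
    print(target, errno, reason)
```

Running this on the output saved from each frozen client would show whether the same OSTs (and therefore the same OSS) are affected everywhere.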
Best Regards
Lu Wang
--------------------------------------------------------------
Computing Center
IHEP
Beijing 100049,China Email: Lu.Wang at ihep.ac.cn
--------------------------------------------------------------