[Lustre-discuss] Desperate problems with Lustre 1.6.5.1

Jeremy Mann jeremy at biochem.uthscsa.edu
Fri Aug 1 10:13:25 PDT 2008


I got around the lustre-modules problem by removing the RPM and
reinstalling it. That worked, but now I'm at a loss as to what is going
on. So far I have one dedicated MGS/MDS node, one OST, and one client.

Creating the MGS/MDS node went fine, and so did the OST. The problem is
with the client, and I can't figure out why it's doing this.

On the client, df hangs and logs show me:

Lustre: 4569:0:(import.c:395:import_select_connection())
bcffs-OST0000-osc-ffff81007f203400: tried all connections, increasing
latency to 20s
Lustre: Request x22 sent from bcffs-OST0000-osc-ffff81007f203400 to NID
192.168.1.254@tcp 5s ago has timed out (limit 5s).
Lustre: 4569:0:(import.c:395:import_select_connection())
bcffs-OST0000-osc-ffff81007f203400: tried all connections, increasing
latency to 25s
Lustre: Request x25 sent from bcffs-OST0000-osc-ffff81007f203400 to NID
192.168.1.254@tcp 5s ago has timed out (limit 5s).
Lustre: 4569:0:(import.c:395:import_select_connection())
bcffs-OST0000-osc-ffff81007f203400: tried all connections, increasing
latency to 30s
LustreError: 4568:0:(events.c:55:request_out_callback()) @@@ type 4,
status -5  req@ffff81004eaefa00 x28/t0
o8->bcffs-OST0000_UUID@192.168.1.254@tcp:6/4 lens 240/400 e 0 to 5 dl
1217610639 ref 2 fl Rpc:/0/0 rc 0/0
Lustre: Request x28 sent from bcffs-OST0000-osc-ffff81007f203400 to NID
192.168.1.254@tcp 0s ago has timed out (limit 5s).
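
To make the pattern in these messages easier to see, here is a rough
sketch that pulls the timed-out requests out of a log like the one above.
The function name summarize_timeouts and the regular expression are my own
illustration, not part of Lustre, and the pattern is only known to fit the
message format shown here. It accepts both "@tcp" and the " at tcp" form
that mail archives sometimes produce.

```python
import re

# Pattern for the "Request xNN sent from ... has timed out" console message.
# Assumption: the message layout is exactly as in the log excerpt above.
TIMEOUT_RE = re.compile(
    r"Request x(?P<xid>\d+) sent from (?P<target>\S+) to NID "
    r"(?P<addr>[\d.]+)(?:@| at )(?P<net>\w+) (?P<age>\d+)s ago "
    r"has timed out \(limit (?P<limit>\d+)s\)"
)


def summarize_timeouts(log_text):
    """Return (request id, NID, age in s, limit in s) per timed-out request."""
    # The messages are wrapped across lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        (int(m["xid"]), f'{m["addr"]}@{m["net"]}', int(m["age"]), int(m["limit"]))
        for m in TIMEOUT_RE.finditer(flat)
    ]


sample = """\
Lustre: Request x22 sent from bcffs-OST0000-osc-ffff81007f203400 to NID
192.168.1.254 at tcp 5s ago has timed out (limit 5s).
Lustre: Request x28 sent from bcffs-OST0000-osc-ffff81007f203400 to NID
192.168.1.254@tcp 0s ago has timed out (limit 5s).
"""

for row in summarize_timeouts(sample):
    print(row)
```

In this excerpt every timed-out request is addressed to the same NID, the
one the client has for OST0000.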

Each machine (MGS/MDT, OST, and client) has gigE. The client is the front
end, which also serves NFS and Grid services; both of those work fine.
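
Since NFS working only shows that the network is up in general, one thing
that could be checked separately is whether the client can open a TCP
connection to the OST at all. Below is a minimal sketch, assuming the OST
listens on the default LNET acceptor port 988 and that 192.168.1.254 (the
NID from the log) is the right address. The helper name can_connect is
made up for this example. A successful connect only shows that TCP gets
through (no firewall or routing problem); it says nothing about whether
LNET itself is configured correctly.

```python
import socket

# Assumption: the default LNET TCP acceptor port has not been changed.
LUSTRE_ACCEPTOR_PORT = 988


def can_connect(host, port=LUSTRE_ACCEPTOR_PORT, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Hypothetical usage from the client, with the OST address from the log:
#   can_connect("192.168.1.254")
```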

What is the latency issue with Lustre 1.6.5.1?


-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672



