[Lustre-discuss] Desperate problems with Lustre 1.6.5.1

Jeremy Mann jeremy at biochem.uthscsa.edu
Fri Aug 1 11:16:03 PDT 2008


Any ideas on the latency issue? I've tried everything I can think of, and
it only happens on my frontend node. Other client nodes work.

Ideas?

Lustre: Client bcffs-client has started
Lustre: Request x42 sent from bcffs-OST0000-osc-ffff8101ff6b4000 to NID 192.168.1.254@tcp 5s ago has timed out (limit 5s).
Lustre: 5097:0:(import.c:395:import_select_connection()) bcffs-OST0000-osc-ffff8101ff6b4000: tried all connections, increasing latency to 5s
LustreError: 5096:0:(events.c:55:request_out_callback()) @@@ type 4, status -5  req@ffff81004e550200 x47/t0 o8->bcffs-OST0000_UUID@192.168.1.254@tcp:6/4 lens 240/400 e 0 to 5 dl 1217614425 ref 2 fl Rpc:/0/0 rc 0/0
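
For anyone comparing the frontend against the working clients, a minimal Python sketch along these lines could tally the timed-out requests per import and NID from a saved log. It is only an illustration: the function name summarize_timeouts, the regular expression, and the SAMPLE string are assumptions made here, not part of Lustre or of any tool it ships.

```python
import re
from collections import Counter

# Hypothetical pattern for the "Request xNN ... has timed out" messages above.
# It accepts both "192.168.1.254@tcp" and the archive-mangled "192.168.1.254 at tcp".
TIMEOUT_RE = re.compile(
    r"Request x(?P<xid>\d+) sent from (?P<imp>\S+) to NID "
    r"(?P<nid>\S+?)(?:@| at )(?P<net>\w+) (?P<age>\d+)s ago "
    r"has timed out \(limit (?P<limit>\d+)s\)"
)

# One message copied from the log above, wrapped the way mail clients wrap it.
SAMPLE = """Lustre: Request x42 sent from bcffs-OST0000-osc-ffff8101ff6b4000 to NID
192.168.1.254@tcp 5s ago has timed out (limit 5s)."""


def summarize_timeouts(log_text):
    """Count timed-out requests per (import name, NID) pair."""
    # Long log lines are often wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    counts = Counter()
    for match in TIMEOUT_RE.finditer(flat):
        nid = f"{match.group('nid')}@{match.group('net')}"
        counts[(match.group("imp"), nid)] += 1
    return counts
```

Calling summarize_timeouts() on the log text from each client would show whether the timeouts are confined to one import and NID pair, which is the comparison described above.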




Jeremy Mann wrote:
> I got around the lustre-modules problem by removing the RPM and
> reinstalling it. That worked, but now I'm at a loss as to what is going
> on here. So far I have 1 dedicated mgs/mds node, 1 ost and 1 client.
>
> Making the mgs/mds node went fine, same with the ost. The problem is
> with the client, and I can't figure out why it's doing this.
>
> On the client, df hangs and logs show me:
>
> Lustre: 4569:0:(import.c:395:import_select_connection()) bcffs-OST0000-osc-ffff81007f203400: tried all connections, increasing latency to 20s
> Lustre: Request x22 sent from bcffs-OST0000-osc-ffff81007f203400 to NID 192.168.1.254@tcp 5s ago has timed out (limit 5s).
> Lustre: 4569:0:(import.c:395:import_select_connection()) bcffs-OST0000-osc-ffff81007f203400: tried all connections, increasing latency to 25s
> Lustre: Request x25 sent from bcffs-OST0000-osc-ffff81007f203400 to NID 192.168.1.254@tcp 5s ago has timed out (limit 5s).
> Lustre: 4569:0:(import.c:395:import_select_connection()) bcffs-OST0000-osc-ffff81007f203400: tried all connections, increasing latency to 30s
> LustreError: 4568:0:(events.c:55:request_out_callback()) @@@ type 4, status -5  req@ffff81004eaefa00 x28/t0 o8->bcffs-OST0000_UUID@192.168.1.254@tcp:6/4 lens 240/400 e 0 to 5 dl 1217610639 ref 2 fl Rpc:/0/0 rc 0/0
> Lustre: Request x28 sent from bcffs-OST0000-osc-ffff81007f203400 to NID 192.168.1.254@tcp 0s ago has timed out (limit 5s).
>
> Each device (mgs/mdt, ost and client) has gigE. The client is the front
> end, which also serves NFS and Grid services; those work fine.
>
> What is the latency issue with Lustre 1.6.5.1?
>
>
> --
> Jeremy Mann
> jeremy at biochem.uthscsa.edu
>
> University of Texas Health Science Center
> Bioinformatics Core Facility
> http://www.bioinformatics.uthscsa.edu
> Phone: (210) 567-2672


-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672







