[Lustre-discuss] Desperate problems with Lustre 1.6.5.1

Andreas Dilger adilger at sun.com
Fri Aug 1 13:28:04 PDT 2008


On Aug 01, 2008  13:16 -0500, Jeremy Mann wrote:
> Any ideas on the latency issue? I've tried everything I can, and it only
> happens with my frontend node. If I try other client nodes, it works.

Do you have xinetd or selinux or other port filtering enabled?

Try running tcpdump on the server to see if you get any traffic at all.

You can also try "telnet {server} 988" (988 is the default Lustre acceptor
port) to see if you get any connection at all.  It won't print anything to
the screen, but after you type something and press enter it will disconnect.
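
If you would rather script the check than use telnet, the same TCP test can
be sketched in Python. This is only a rough sketch: the function name
can_connect and the 5 second timeout are illustrative choices, not anything
from Lustre. It only tells you whether a TCP connection to port 988 opens,
not whether Lustre itself is healthy.

```python
import socket
import sys

LUSTRE_PORT = 988  # the port used in the telnet check above


def can_connect(host, port=LUSTRE_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it does the same thing as the
    telnet test, i.e. it only checks that the TCP handshake succeeds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    server = sys.argv[1]
    state = "reachable" if can_connect(server) else "NOT reachable"
    print("%s port %d is %s" % (server, LUSTRE_PORT, state))
```

Run it from the frontend node and from a client that works, passing the
server address as the only argument. If the working client connects and the
frontend does not, that points at filtering on the path from the frontend.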

> Lustre: Client bcffs-client has started
> Lustre: Request x42 sent from bcffs-OST0000-osc-ffff8101ff6b4000 to NID
> 192.168.1.254@tcp 5s ago has timed out (limit 5s).
> Lustre: 5097:0:(import.c:395:import_select_connection())
> bcffs-OST0000-osc-ffff8101ff6b4000: tried all connections, increasing
> latency to 5s
> LustreError: 5096:0:(events.c:55:request_out_callback()) @@@ type 4,
> status -5  req@ffff81004e550200 x47/t0
> o8->bcffs-OST0000_UUID@192.168.1.254@tcp:6/4 lens 240/400 e 0 to 5 dl
> 1217614425 ref 2 fl Rpc:/0/0 rc 0/0
> 
> 
> 
> 
> Jeremy Mann wrote:
> > I got around the lustre-modules problem by removing the RPM and
> > reinstalling it. That worked, but now I'm at a loss as to what is going
> > on here. So far I have 1 dedicated mgs/mds node, 1 ost and 1 client.
> >
> > Making the mgs/mds node went fine, same with the ost. The problem is
> > with the client, and I can't figure out why it's doing this.
> >
> > On the client, df hangs and logs show me:
> >
> > Lustre: 4569:0:(import.c:395:import_select_connection())
> > bcffs-OST0000-osc-ffff81007f203400: tried all connections, increasing
> > latency to 20s
> > Lustre: Request x22 sent from bcffs-OST0000-osc-ffff81007f203400 to NID
> > 192.168.1.254@tcp 5s ago has timed out (limit 5s).
> > Lustre: 4569:0:(import.c:395:import_select_connection())
> > bcffs-OST0000-osc-ffff81007f203400: tried all connections, increasing
> > latency to 25s
> > Lustre: Request x25 sent from bcffs-OST0000-osc-ffff81007f203400 to NID
> > 192.168.1.254@tcp 5s ago has timed out (limit 5s).
> > Lustre: 4569:0:(import.c:395:import_select_connection())
> > bcffs-OST0000-osc-ffff81007f203400: tried all connections, increasing
> > latency to 30s
> > LustreError: 4568:0:(events.c:55:request_out_callback()) @@@ type 4,
> > status -5  req@ffff81004eaefa00 x28/t0
> > o8->bcffs-OST0000_UUID@192.168.1.254@tcp:6/4 lens 240/400 e 0 to 5 dl
> > 1217610639 ref 2 fl Rpc:/0/0 rc 0/0
> > Lustre: Request x28 sent from bcffs-OST0000-osc-ffff81007f203400 to NID
> > 192.168.1.254@tcp 0s ago has timed out (limit 5s).
> >
> > Each device (mgs/mdt, ost and client) has gigE. The client is the front
> > end that also serves NFS and Grid services, which work fine.
> >
> > What is the latency issue with Lustre 1.6.5.1?
> >
> >
> > --
> > Jeremy Mann
> > jeremy at biochem.uthscsa.edu
> >
> > University of Texas Health Science Center
> > Bioinformatics Core Facility
> > http://www.bioinformatics.uthscsa.edu
> > Phone: (210) 567-2672
> >
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
> >
> 
> 
> -- 
> Jeremy Mann
> jeremy at biochem.uthscsa.edu
> 
> University of Texas Health Science Center
> Bioinformatics Core Facility
> http://www.bioinformatics.uthscsa.edu
> Phone: (210) 567-2672

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



