[Lustre-discuss] networking problem with kernel-lustre-smp-2.6.9-55.0.9.EL_lustre.1.6.3(1.6.4)smp

Anatoly Oreshkin Anatoly.Oreshkin at pnpi.spb.ru
Fri Dec 14 07:28:29 PST 2007


Hello,

We have Scientific Linux SL release 4.4 (aka RHEL 4.4) with 
kernel 2.6.9-42.0.3.ELsmp installed on our cluster.

From the clusterfs site
http://www.clusterfs.com/downloads/public/Lustre/v1.6/Production/1.6.3/rhel-2.6-i686/

I downloaded these binary RPMs for RHEL-2.6-i686:

kernel-lustre-smp-2.6.9-55.0.9.EL_lustre.1.6.3.i686.rpm
kernel-lustre-source-2.6.9-55.0.9.EL_lustre.1.6.3.i686.rpm
lustre-ldiskfs-3.0.2-2.6.9_55.0.9.EL_lustre.1.6.3smp.i686.rpm
lustre-modules-1.6.3-2.6.9_55.0.9.EL_lustre.1.6.3smp.i686.rpm
lustre-1.6.3-2.6.9_55.0.9.EL_lustre.1.6.3smp.i686.rpm

and installed them on the head node and all client nodes.

First, I tried to test networking with this kernel over NFS, without
any Lustre file system involved.
The NFS server runs on the head node and exports a non-Lustre file system.
When I started reading from the NFS file system on the client nodes, I ran
into networking problems.

On one client with a Marvell 88E8050 Gigabit Ethernet card (sky2 driver),
the kernel logged "hw tcp v4 csum failed" error messages and
the read hung.

On another client node with an Intel 82566DC Gigabit Ethernet card (e1000
driver), dmesg showed

nfs_statfs: statfs error = 512
nfs_statfs: statfs error = 512
nfs_statfs: statfs error = 512
....

and the read also hung.
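
For comparing nodes, a minimal sketch of how the two messages quoted above
could be counted in saved dmesg output. The function name, the labels and
the sample text are illustrative assumptions, not part of the original test.

```python
import re
from collections import Counter

# Message signatures quoted in this post; the label names are hypothetical.
PATTERNS = {
    "csum_failed": re.compile(r"hw tcp v4 csum failed"),
    "nfs_statfs_512": re.compile(r"nfs_statfs: statfs error = 512"),
}


def tally_kernel_messages(log_text):
    """Count how many lines of log_text match each known signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Sample input built from the three lines shown in this post.
sample = "nfs_statfs: statfs error = 512\n" * 3
print(tally_kernel_messages(sample)["nfs_statfs_512"])  # prints 3
```

In practice the text would come from dmesg output saved to a file on each
client, so that the counts can be compared between the old and new kernels.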

With my old kernel from SL 4.4 there were no such problems.

Then I installed the binary RPMs for Lustre 1.6.4
from
http://www.clusterfs.com/downloads/public/Lustre/v1.6/Production/1.6.4

and repeated the same read test, but the result was the same.

What might be wrong?

Thank you.





