[Lustre-discuss] networking problem with kernel-lustre-smp-2.6.9-55.0.9.EL_lustre.1.6.3(1.6.4)smp
Anatoly Oreshkin
Anatoly.Oreshkin at pnpi.spb.ru
Fri Dec 14 07:28:29 PST 2007
Hello,
We have Scientific Linux SL release 4.4 (aka RHEL 4.4) with
kernel 2.6.9-42.0.3.ELsmp installed on our cluster.
From the clusterfs site
http://www.clusterfs.com/downloads/public/Lustre/v1.6/Production/1.6.3/rhel-2.6-i686/
I downloaded the binary RPMs for RHEL-2.6-i686:
kernel-lustre-smp-2.6.9-55.0.9.EL_lustre.1.6.3.i686.rpm
kernel-lustre-source-2.6.9-55.0.9.EL_lustre.1.6.3.i686.rpm
lustre-ldiskfs-3.0.2-2.6.9_55.0.9.EL_lustre.1.6.3smp.i686.rpm
lustre-modules-1.6.3-2.6.9_55.0.9.EL_lustre.1.6.3smp.i686.rpm
lustre-1.6.3-2.6.9_55.0.9.EL_lustre.1.6.3smp.i686.rpm
and installed them on the head node and all client nodes.
First I tried to test networking under this kernel over NFS,
without any Lustre file system involved.
The NFS server runs on the head node and exports a non-Lustre file system.
When I started reading from the NFS file system on the client nodes,
I ran into networking problems.
On one client with a Marvell 88E8050 Gigabit Ethernet card (sky2 driver),
the kernel printed "hw tcp v4 csum failed" error messages and
the read hung.
On another client node with an Intel 82566DC Gigabit Ethernet card (e1000
driver), dmesg showed
nfs_statfs: statfs error = 512
nfs_statfs: statfs error = 512
nfs_statfs: statfs error = 512
....
and the read also hung.
With my old kernel from SL 4.4 there were no such problems.
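A minimal sketch of how the two symptoms above could be tallied from saved
dmesg output when comparing nodes or kernels. The function name, the label
names, and the sample text are hypothetical; only the two message strings
come from the report.

```python
import re
from collections import Counter

# Message substrings quoted in the report; the label names are hypothetical.
PATTERNS = {
    "sky2_csum": re.compile(r"hw tcp v4 csum failed"),
    "nfs_statfs_512": re.compile(r"nfs_statfs: statfs error = 512"),
}

def tally_symptoms(log_text):
    """Count how many log lines match each known symptom."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts

# Sample built from the message quoted above, not from a real log capture.
sample = "nfs_statfs: statfs error = 512\n" * 3
print(dict(tally_symptoms(sample)))
```

The output of dmesg on each client could be saved to a file and passed in as
a string, which makes it easy to compare the stock SL 4.4 kernel against the
Lustre kernel on the same hardware.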
Then I installed the binary RPMs for Lustre 1.6.4
from
http://www.clusterfs.com/downloads/public/Lustre/v1.6/Production/1.6.4
and repeated the same read test, but the result was the same.
What might be wrong?
Thank you.