[Lustre-discuss] in the future

Papp Tamás tompos at martos.bme.hu
Thu Apr 17 08:41:47 PDT 2008


Dear All,

Apr 17 16:59:10 node1 kernel: Lustre: 7833:0:(router.c:167:lnet_notify()) Ignoring prediction from 10.1.1.11@tcp of 192.168.0.71@tcp down 542730734650 seconds in the future

What could cause this error message?

I couldn't find anything really useful by searching the web.

This message is not exactly my main problem, though; I'm investigating some 
strange behaviour.



We have a small cluster with 8 nodes and a Samba gateway for Windows 
clients. Linux clients can use the cluster without any problems, and the 
Samba machine can see the mount without any problem.

However, the Samba share freezes after some hours, or after 2-3 days at 
most. As far as I can see, with CentOS 5.1 it took only 4-6 hours, but last 
time, with Debian 4.0 (stock 2.6.18 kernel), it took 2-3 days.

As far as I can see, after a while some smbd processes appear that keep 
switching between state 'D' and 'S'. Once the share becomes unreachable, the 
smbd processes cannot be killed and the Lustre mount cannot be unmounted, but 
the mount itself is still fully usable without any problem, though of course 
only from Linux.
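
For anyone who wants to look for the same symptom, here is a minimal sketch of how smbd processes in state 'D' can be listed. It assumes a Linux /proc filesystem; the function names parse_stat and find_procs are only illustrative and are not part of Samba or Lustre.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, right = line.rsplit(")", 1)
    pid, comm = left.split(" (", 1)
    state = right.split()[0]
    return int(pid), comm, state


def find_procs(name="smbd", states=("D",), proc_root="/proc"):
    """List (pid, state) for processes called `name` in one of `states`."""
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or the entry is unreadable
        if comm == name and state in states:
            hits.append((pid, state))
    return hits


if __name__ == "__main__":
    for pid, state in find_procs():
        print(pid, state)
```

Running it repeatedly (for example once a minute) shows whether the same PIDs stay in 'D' or move back to 'S'.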

There is nothing special in the Samba logs, and nothing special in the 
kernel logs related to Lustre or anything else, except the above message on 
some of the nodes.

Lustre: 1.6.4.3
Samba: 3.0.28a-1 right now, but earlier it was CentOS 4.4 (I guess) with 
3.0.10, and that had the same problem.

Any ideas?

Thank you.

tamas




