[lustre-discuss] Filesystem hanging....
Phill Harvey-Smith
p.harvey-smith at warwick.ac.uk
Fri Aug 12 05:37:56 PDT 2016
On 11/08/2016 16:10, Colin Faber wrote:
>> First glance indicates you're having network connectivity problems,
>> (possibly driver issue with your NIC?)
I don't seem to have had any problems with any other services running on
the cluster, and there are no messages in the journal or the /var/log
files relating to network errors.
Oddly, though, when the /home filesystem hangs, the /storage and /scratch
filesystems, which are served by the same Lustre servers, continue to respond
without problems.
What does seem to have some bearing on it is that the first few writes
succeed and then it hangs. Although it was first noticed
through Samba, it also happens when logged in to the console
directly.
>> (Check MTU settings, etc?)
Pasting as a quotation, as that stops Thunderbird from wrapping the text:
> root@test-r710:~# ifconfig
> eno1 Link encap:Ethernet HWaddr 00:26:b9:84:c7:8d
> inet addr:192.168.1.80 Bcast:192.168.1.255 Mask:255.255.255.0
> inet6 addr: fe80::226:b9ff:fe84:c78d/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:8516 errors:0 dropped:0 overruns:0 frame:0
> TX packets:23199 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:5297958 (5.2 MB) TX bytes:3222616 (3.2 MB)
>
> eno2 Link encap:Ethernet HWaddr 00:26:b9:84:c7:8f
> inet addr:192.168.0.80 Bcast:192.168.0.255 Mask:255.255.255.0
> inet6 addr: fe80::226:b9ff:fe84:c78f/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:1374513 errors:0 dropped:0 overruns:0 frame:0
> TX packets:168485 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:2026863011 (2.0 GB) TX bytes:21861558 (21.8 MB)
>
> eno4 Link encap:Ethernet HWaddr 00:26:b9:84:c7:93
> inet addr:137.205.232.159 Bcast:137.205.232.255 Mask:255.255.255.128
> inet6 addr: fe80::226:b9ff:fe84:c793/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:11483 errors:0 dropped:0 overruns:0 frame:0
> TX packets:10560 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:3504764 (3.5 MB) TX bytes:5731764 (5.7 MB)
> root@test-r710:~# route -n
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 0.0.0.0         137.205.232.254 0.0.0.0         UG    0      0        0 eno4
> 137.205.232.128 0.0.0.0         255.255.255.128 U     0      0        0 eno4
> 192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eno2
> 192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eno1
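For anyone wanting to repeat the MTU and error-counter check on several clients, here is a minimal Python sketch that pulls those fields out of old-style ifconfig output like the above. The function name summarise and the embedded SAMPLE text are only illustrative (the sample is trimmed from the output pasted here), and the patterns assume the "Link encap:" / "MTU:" layout shown; newer net-tools versions print a different format.

```python
import re

# Trimmed copy of the ifconfig output quoted above (illustrative sample only).
SAMPLE = """\
> eno1 Link encap:Ethernet HWaddr 00:26:b9:84:c7:8d
> inet addr:192.168.1.80 Bcast:192.168.1.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:8516 errors:0 dropped:0 overruns:0 frame:0
> TX packets:23199 errors:0 dropped:0 overruns:0 carrier:0
>
> eno2 Link encap:Ethernet HWaddr 00:26:b9:84:c7:8f
> inet addr:192.168.0.80 Bcast:192.168.0.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:1374513 errors:0 dropped:0 overruns:0 frame:0
> TX packets:168485 errors:0 dropped:0 overruns:0 carrier:0
"""


def summarise(text):
    """Return {interface: {"mtu", "rx_errors", "rx_dropped", ...}}."""
    result = {}
    iface = None
    for raw in text.splitlines():
        # Drop any mail quoting prefix ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        start = re.match(r"(\S+)\s+Link encap:", line)
        if start:
            iface = start.group(1)
            result[iface] = {}
            continue
        if iface is None:
            continue
        mtu = re.search(r"MTU:(\d+)", line)
        if mtu:
            result[iface]["mtu"] = int(mtu.group(1))
        counters = re.match(r"(RX|TX) packets:\d+ errors:(\d+) dropped:(\d+)", line)
        if counters:
            direction = counters.group(1).lower()
            result[iface][direction + "_errors"] = int(counters.group(2))
            result[iface][direction + "_dropped"] = int(counters.group(3))
    return result


for name, stats in summarise(SAMPLE).items():
    print(name, stats)
```

Any interface reporting a non-zero error or dropped count, or an MTU that differs from the servers, would be the place to look first.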
Lustre mounts in fstab:
> # Lustre mounted
> 192.168.0.4@tcp0:/storage /storage lustre defaults,_netdev,flock 0 0
> 192.168.0.4@tcp0:/home /home lustre defaults,_netdev,flock 0 0
> 192.168.0.4@tcp0:/scratch /scratch lustre defaults,_netdev,flock 0 0
I've also tried compiling the latest source and installing those modules
(Lustre: Build Version: 2.8.56_26_g6fad3ab). This build does not seem to have
the problem with matlab (mentioned about a month or so ago), but it still
has the hanging problem.
The Lustre startup logs in the journal are here:
> Aug 12 12:57:10 test-r710 kernel: Lustre: Lustre: Build Version: 2.8.56_26_g6fad3ab
> Aug 12 12:57:10 test-r710 kernel: Lustre: Server MGS version (2.1.0.0) is much older than client. Consider upgrading server (2.8.56_26_g6fad3ab)
> Aug 12 12:57:10 test-r710 kernel: Lustre: Trying to mount a client with IR setting not compatible with current mgc. Force to use current mgc setting that is IR disabled.
> Aug 12 12:57:10 test-r710 kernel: Lustre: Mounted home-client
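To put a number on the client/server version gap reported in the log above, here is a small Python sketch that compares the two version strings. The helper version_tuple is hypothetical, and this is not how Lustre itself decides when to print the "much older" warning; it only extracts the leading dotted numeric part of each string.

```python
import re


def version_tuple(version):
    """Parse the leading dotted numeric part, e.g. '2.8.56_26_g6fad3ab' -> (2, 8, 56)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no numeric version prefix in %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


# Version strings taken from the kernel log lines quoted above.
client = version_tuple("2.8.56_26_g6fad3ab")
server = version_tuple("2.1.0.0")

# Difference in the second (minor) component between client and server.
minor_gap = client[1] - server[1]
print("client", client, "server", server, "minor gap", minor_gap)
```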
Cheers.
Phill.