[Lustre-devel] lustre client goes wacky?
Ron
ron.rex at gmail.com
Tue Feb 12 16:50:43 PST 2008
Hi,
I don't know if this is a bug, a misconfiguration, or something else.
What I have is:
server  = 1.6.4.1 + vanilla 2.6.18.8 (MGS + 2*OST + MDT, all on a single server)
clients = cvs.20080116 + 2.6.23.12
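For completeness, the clients mount the filesystem in the usual way; a rough sketch (the fsname "datafs" and the MGS NID are taken from the logs below, the mount point is just an example):

  mount -t lustre 192.168.241.247@tcp:/datafs /mnt/datafs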
I mounted the filesystem from several clients and several hours later
noticed the top display below. dmesg shows some Lustre errors (also
below). Can someone comment on what could be going on?
Thanks,
Ron
top - 18:28:09 up 5 days, 3:36, 1 user, load average: 12.00, 12.00, 11.94
Tasks: 168 total, 13 running, 136 sleeping, 0 stopped, 19 zombie
Cpu(s): 0.0% us, 37.5% sy, 0.0% ni, 62.5% id, 0.0% wa, 0.0% hi, 0.0% si
Mem:  16468196k total,   526828k used, 15941368k free,    42996k buffers
Swap:  4192924k total,        0k used,  4192924k free,   294916k cached
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 1533 root  20   0     0    0    0 R  100  0.0 308:54.05 ll_cfg_requeue
32071 root  20   0     0    0    0 R  100  0.0 308:15.95 socknal_reaper
32073 root  20   0     0    0    0 R  100  0.0 308:48.90 ptlrpcd
    1 root  20   0  4832  588  492 R    0  0.0   0:02.48 init
    2 root  15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
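Those three kernel threads (ll_cfg_requeue, socknal_reaper, ptlrpcd) have been
spinning at 100% CPU for hours. If stack traces for them would help, I can grab
them with sysrq (assuming sysrq is enabled in this kernel), roughly:

  # dump the state and stack of every task into the kernel log, then save it
  echo t > /proc/sysrq-trigger
  dmesg > /tmp/sysrq-t.txt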
Lustre: OBD class driver, info@clusterfs.com
Lustre Version: 1.6.4.50
Build Version: b1_6-20080210103536-CHANGED-.usr.src.linux-2.6.23.12-2.6.23.12
Lustre: Added LNI 192.168.241.42@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info@clusterfs.com
Lustre: Binding irq 17 to CPU 0 with cmd: echo 1 > /proc/irq/17/smp_affinity
Lustre: MGC192.168.241.247@tcp: Reactivating import
Lustre: setting import datafs-OST0002_UUID INACTIVE by administrator request
Lustre: datafs-OST0002-osc-ffff810241ad7800.osc: set parameter active=0
LustreError: 32181:0:(lov_obd.c:230:lov_connect_obd()) not connecting OSC datafs-OST0002_UUID; administratively disabled
Lustre: Client datafs-client has started
Lustre: Request x7684 sent from MGC192.168.241.247@tcp to NID 192.168.241.247@tcp 15s ago has timed out (limit 15s).
LustreError: 166-1: MGC192.168.241.247@tcp: Connection to service MGS via nid 192.168.241.247@tcp was lost; in progress operations using this service will fail.
LustreError: 32073:0:(import.c:212:ptlrpc_invalidate_import()) MGS: rc = -110 waiting for callback (1 != 0)
LustreError: 32073:0:(import.c:216:ptlrpc_invalidate_import()) @@@ still on sending list req@ffff81040fa14600 x7684/t0 o400->MGS@192.168.241.247@tcp:26/25 lens 128/256 e 0 to 11 dl 1202843837 ref 1 fl Rpc:EXN/0/0 rc -4/0
Lustre: Request x7685 sent from datafs-MDT0000-mdc-ffff810241ad7800 to NID 192.168.241.247@tcp 115s ago has timed out (limit 15s).
Lustre: datafs-MDT0000-mdc-ffff810241ad7800: Connection to service datafs-MDT0000 via nid 192.168.241.247@tcp was lost; in progress operations using this service will wait for recovery to complete.
Lustre: MGC192.168.241.247@tcp: Reactivating import
Lustre: MGC192.168.241.247@tcp: Connection restored to service MGS using nid 192.168.241.247@tcp.
LustreError: 32059:0:(events.c:116:reply_in_callback()) ASSERTION(ev->mlength == lustre_msg_early_size()) failed
LustreError: 32059:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG
Call Trace:
[<ffffffff88000b53>] :libcfs:lbug_with_loc+0x73/0xc0
[<ffffffff88007bd4>] :libcfs:libcfs_assertion_failed+0x54/0x60
[<ffffffff8815c746>] :ptlrpc:reply_in_callback+0x426/0x430
[<ffffffff88027f35>] :lnet:lnet_enq_event_locked+0xc5/0xf0
[<ffffffff88028475>] :lnet:lnet_finalize+0x1e5/0x270
[<ffffffff880625d9>] :ksocklnd:ksocknal_process_receive+0x469/0xab0
[<ffffffff88060350>] :ksocklnd:ksocknal_tx_done+0x80/0x1e0
[<ffffffff8806301c>] :ksocklnd:ksocknal_scheduler+0x12c/0x7e0
[<ffffffff8024e850>] autoremove_wake_function+0x0/0x30
[<ffffffff8024e850>] autoremove_wake_function+0x0/0x30
[<ffffffff8020c918>] child_rip+0xa/0x12
[<ffffffff88062ef0>] :ksocklnd:ksocknal_scheduler+0x0/0x7e0
[<ffffffff8020c90e>] child_rip+0x0/0x12
LustreError: dumping log to /tmp/lustre-log.1202843942.32059
Lustre: Request x7707 sent from MGC192.168.241.247@tcp to NID 192.168.241.247@tcp 15s ago has timed out (limit 15s).
Lustre: Skipped 2 previous similar messages
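If the binary debug dump mentioned above would be useful, I can convert it to text and post it; roughly:

  # decode the binary Lustre debug log into readable text (output path is arbitrary)
  lctl debug_file /tmp/lustre-log.1202843942.32059 /tmp/lustre-log.1202843942.txt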