[Lustre-discuss] o2ib module prevents shutdown

Michael Sternberg sternberg at anl.gov
Tue Apr 15 10:22:17 PDT 2008


On Apr 15, 2008, at 12:15, Brian J. Murrell wrote:
> On Tue, 2008-04-15 at 12:07 -0500, Michael Sternberg wrote:
>> Hello,
>>
>> Not sure if this is the right forum:  I'm encountering difficulties
>> with o2ib which prevents an LNET shutdown from proceeding:
>>
>> 	Unloading OpenIB kernel modules:NET: Unregistered protocol family 27
>> 	Failed to unload rdma_cm
>> 	Failed to unload rdma_cm
>> 	Failed to unload ib_cm
>> 	Failed to unload ib_sa
>> 	LustreError: 131-3: Received notification of device removal
>> 	Please shutdown LNET to allow this to proceed
>>
>> This happens on server and client nodes alike.  We run RHEL5.1 and
>> OFED 1.2, kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp from CFS/Sun.
>>
>> I narrowed it down to module ko2iblnd, which I attempt to remove
>> first (added to PRE_UNLOAD_MODULES in /etc/init.d/openibd), but it
>> doesn't work.  Strangely, in "lsmod" the use count of the module is
>> one, but I don't see where it's used.
>
> To ask what might sound like a stupid question, but you do have all of
> your lustre filesystems unmounted before you try to unload ko2iblnd,
> yes?  Can you show us what's in /proc/mounts when you try to unload
> ko2iblnd but it shows a refcount > 0?

No problem with the question - anything that helps:

# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,data=ordered 0 0
/dev /dev tmpfs rw 0 0
/proc /proc proc rw 0 0
/sys /sys sysfs rw 0 0
/proc/bus/usb /proc/bus/usb usbfs rw 0 0
devpts /dev/pts devpts rw 0 0
/dev/sda1 /boot ext3 rw,data=ordered 0 0
tmpfs /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
172.16.100.3:/drbd/exports/opt /opt nfs rw,vers=3,rsize=8192,wsize=8192,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=172.16.100.3 0 0
/etc/auto.misc /misc autofs rw,fd=6,pgrp=3689,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net autofs rw,fd=11,pgrp=3689,timeout=300,minproto=5,maxproto=5,indirect 0 0


This was the state even after taking the IB interface down:

	# ifconfig ib0 down

I also have:

	# grep lnet /etc/modprobe.conf
	options lnet networks="o2ib0,tcp0(eth0)" accept_port=6988

(The accept_port setting doesn't work on a tcp-only node either, but
I believe that's a separate issue.)
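
For reference, here is a minimal sketch of the two checks discussed above: whether any Lustre filesystem is still mounted, and what use count a module reports. It assumes the usual field layout of /proc/mounts (type in the third field) and /proc/modules (use count in the third field). The function names lustre_mounts and module_refcount are hypothetical helpers, not part of Lustre or OFED, and both read from stdin so they can also be run against saved output.

```shell
# Hypothetical helpers for illustration; not part of Lustre or OFED.

# Print the mount point of every entry whose filesystem type
# (3rd field of /proc/mounts) is "lustre". Prints nothing if none.
lustre_mounts() {
    awk '$3 == "lustre" { print $2 }'
}

# Print the use count (3rd field of /proc/modules) of the module
# named in $1. Prints nothing if the module is not loaded.
module_refcount() {
    awk -v mod="$1" '$1 == mod { print $3 }'
}

# Usage on a live node:
#   lustre_mounts < /proc/mounts
#   module_refcount ko2iblnd < /proc/modules
```

With the /proc/mounts listing above, lustre_mounts would print nothing, which matches what I see: no Lustre mounts, yet a non-zero use count on ko2iblnd.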


Regards, Michael