[Lustre-discuss] o2ib module prevents shutdown

Wojciech Turek wjt27 at cam.ac.uk
Tue Apr 15 10:33:07 PDT 2008


Hi,

This usually happens when you try to remove the IB card drivers before
stopping the Lustre network. What I do is, after a clean umount, run the
lustre_rmmod script, which removes all Lustre modules and stops the
Lustre network. Then you can safely remove the IB card driver and
nothing should get stuck.
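
Roughly, the order looks like the sketch below. This is only an
illustration, not a tested init script: the helper names, DRY_RUN and
MOUNTS_FILE are made up for the example, and I am assuming the IB stack
is stopped through the /etc/init.d/openibd script mentioned below. By
default it only prints what it would run.

```shell
#!/bin/sh
# Sketch: stop Lustre/LNET first, then the IB stack.
# DRY_RUN and MOUNTS_FILE are hypothetical knobs added for illustration.

MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = really run them

# Print the mount points of any Lustre filesystems that are still mounted.
lustre_mounts() {
    awk '$3 == "lustre" { print $2 }' "$MOUNTS_FILE"
}

# Run a command, or just echo it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

shutdown_lustre_then_ib() {
    # 1. Unmount all Lustre filesystems.
    run umount -a -t lustre || return 1
    # In a real run, refuse to continue while anything is still mounted.
    if [ "$DRY_RUN" != "1" ] && [ -n "$(lustre_mounts)" ]; then
        echo "Lustre still mounted: $(lustre_mounts)" >&2
        return 1
    fi
    # 2. Unload the Lustre modules, which also stops LNET.
    run lustre_rmmod || return 1
    # 3. Only now remove the IB drivers.
    run /etc/init.d/openibd stop
}
```

Source the file and call shutdown_lustre_then_ib to see the sequence;
set DRY_RUN=0 (as root) once the order looks right for your nodes.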

Cheers,

Wojciech

On 15 Apr 2008, at 18:22, Michael Sternberg wrote:

>
> On Apr 15, 2008, at 12:15, Brian J. Murrell wrote:
>> On Tue, 2008-04-15 at 12:07 -0500, Michael Sternberg wrote:
>>> Hello,
>>>
>>> Not sure if this is the right forum:  I'm encountering difficulties
>>> with o2ib which prevents an LNET shutdown from proceeding:
>>>
>>> 	Unloading OpenIB kernel modules:NET: Unregistered protocol family 27
>>> 	Failed to unload rdma_cm
>>> 	Failed to unload rdma_cm
>>> 	Failed to unload ib_cm
>>> 	Failed to unload ib_sa
>>> 	LustreError: 131-3: Received notification of device removal
>>> 	Please shutdown LNET to allow this to proceed
>>>
>>> This happens on server and client nodes alike.  We run RHEL5.1 and
>>> OFED 1.2, kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp from CFS/Sun.
>>>
>>> I narrowed it down to module ko2iblnd, which I attempt to remove
>>> first (added to PRE_UNLOAD_MODULES in /etc/init.d/openibd), but it
>>> doesn't work.  Strangely, in "lsmod" the use count of the module is
>>> one, but I don't see where it's used.
>>
>> To ask what might sound like a stupid question, but you do have all of
>> your lustre filesystems unmounted before you try to unload ko2iblnd,
>> yes?  Can you show us what's in /proc/mounts when you try to unload
>> ko2iblnd but it shows a refcount > 0?
>
> No problem with the question - anything that helps:
>
> # cat /proc/mounts
> rootfs / rootfs rw 0 0
> /dev/root / ext3 rw,data=ordered 0 0
> /dev /dev tmpfs rw 0 0
> /proc /proc proc rw 0 0
> /sys /sys sysfs rw 0 0
> /proc/bus/usb /proc/bus/usb usbfs rw 0 0
> devpts /dev/pts devpts rw 0 0
> /dev/sda1 /boot ext3 rw,data=ordered 0 0
> tmpfs /dev/shm tmpfs rw 0 0
> none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
> sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
> 172.16.100.3:/drbd/exports/opt /opt nfs rw,vers=3,rsize=8192,wsize=8192,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=172.16.100.3 0 0
> /etc/auto.misc /misc autofs rw,fd=6,pgrp=3689,timeout=300,minproto=5,maxproto=5,indirect 0 0
> -hosts /net autofs rw,fd=11,pgrp=3689,timeout=300,minproto=5,maxproto=5,indirect 0 0
>
>
> This was even after:
>
> 	# ifconfig ib0 down
>
> I also have:
>
> 	# grep lnet /etc/modprobe.conf
> 	options lnet networks="o2ib0,tcp0(eth0)" accept_port=6988
>
> (the accept_port spec doesn't work either on a tcp-only node, but
> that's a separate issue, or so I believe.)
>
>
> Regards, Michael



