[Lustre-discuss] o2ib module prevents shutdown

Michael Sternberg sternberg at anl.gov
Fri Apr 18 21:42:17 PDT 2008


Hello Wojciech,

Sorry for the delayed response.  lustre_rmmod worked in a manual test:
it removed the modules after ib0 was down.  I have yet to try this as
part of the init.d shutdown scripts; an alternative approach using a
script didn't quite work.

Thanks for the hint!


Regards, Michael
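
The order discussed in this thread (unmount Lustre, run lustre_rmmod, and only then unload the InfiniBand stack) could be sketched as a shell function along the following lines. This is a minimal sketch, not a tested init.d integration: the DRY_RUN variable and the run and lustre_ib_shutdown names are hypothetical, introduced only for illustration, and the script assumes the OFED init script lives at /etc/init.d/openibd as mentioned in the quoted message. It defaults to printing the commands rather than executing them.

```shell
#!/bin/sh
# Sketch of the shutdown order described in this thread.
# Assumptions (not from the original posts): the DRY_RUN switch and the
# function names run / lustre_ib_shutdown are hypothetical helpers.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

lustre_ib_shutdown() {
    # 1. Unmount all Lustre filesystems first.
    run umount -a -t lustre || return 1
    # 2. Remove the Lustre modules, which also stops LNET.
    run lustre_rmmod || return 1
    # 3. Only now unload the InfiniBand drivers.
    run /etc/init.d/openibd stop
}

lustre_ib_shutdown
```

Setting DRY_RUN=0 makes the function execute the three steps for real; each step aborts the sequence if the previous one failed, so the IB drivers are never unloaded while Lustre modules are still present.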

On Apr 15, 2008, at 12:33 , Wojciech Turek wrote:
> Hi,
>
> This usually happens when you try to remove the IB card drivers
> before stopping the Lustre network. What I do after a clean umount
> is run the lustre_rmmod script, which removes all Lustre modules and
> stops the Lustre network. Then you can safely remove the IB card
> driver and nothing should get stuck.
>
> Cheers,
>
> Wojciech
>
> On 15 Apr 2008, at 18:22, Michael Sternberg wrote:
>
>>
>> On Apr 15, 2008, at 12:15, Brian J. Murrell wrote:
>>> On Tue, 2008-04-15 at 12:07 -0500, Michael Sternberg wrote:
>>>> Hello,
>>>>
>>>> Not sure if this is the right forum:  I'm encountering a problem
>>>> with o2ib that prevents an LNET shutdown from proceeding:
>>>>
>>>> 	Unloading OpenIB kernel modules:NET: Unregistered protocol family 27
>>>> 	Failed to unload rdma_cm
>>>> 	Failed to unload rdma_cm
>>>> 	Failed to unload ib_cm
>>>> 	Failed to unload ib_sa
>>>> 	LustreError: 131-3: Received notification of device removal
>>>> 	Please shutdown LNET to allow this to proceed
>>>>
>>>> This happens on server and client nodes alike.  We run RHEL5.1 and
>>>> OFED 1.2, kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp from CFS/Sun.
>>>>
>>>> I narrowed it down to module ko2iblnd, which I attempt to remove
>>>> first (added to PRE_UNLOAD_MODULES in /etc/init.d/openibd), but it
>>>> doesn't work.  Strangely, in "lsmod" the use count of the module is
>>>> one, but I don't see where it's used.
>>>
>>> This might sound like a stupid question, but you do have all of
>>> your Lustre filesystems unmounted before you try to unload
>>> ko2iblnd, yes?  Can you show us what's in /proc/mounts when you try
>>> to unload ko2iblnd but it shows a refcount > 0?
>>
>> No problem with the question - anything that helps:
>>
>> # cat /proc/mounts
>> rootfs / rootfs rw 0 0
>> /dev/root / ext3 rw,data=ordered 0 0
>> /dev /dev tmpfs rw 0 0
>> /proc /proc proc rw 0 0
>> /sys /sys sysfs rw 0 0
>> /proc/bus/usb /proc/bus/usb usbfs rw 0 0
>> devpts /dev/pts devpts rw 0 0
>> /dev/sda1 /boot ext3 rw,data=ordered 0 0
>> tmpfs /dev/shm tmpfs rw 0 0
>> none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
>> sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
>> 172.16.100.3:/drbd/exports/opt /opt nfs rw,vers=3,rsize=8192,wsize=8192,hard,intr,proto=tcp,timeo=600,retrans=2,sec=sys,addr=172.16.100.3 0 0
>> /etc/auto.misc /misc autofs rw,fd=6,pgrp=3689,timeout=300,minproto=5,maxproto=5,indirect 0 0
>> -hosts /net autofs rw,fd=11,pgrp=3689,timeout=300,minproto=5,maxproto=5,indirect 0 0
>>
>>
>> This was even after:
>>
>> 	# ifconfig ib0 down
>>
>> I also have:
>>
>> 	# grep lnet /etc/modprobe.conf
>> 	options lnet networks="o2ib0,tcp0(eth0)" accept_port=6988
>>
>> (the accept_port spec doesn't work either on a tcp-only node, but
>> that's a separate issue, or so I believe.)
>>
>>
>> Regards, Michael
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>



