[Lustre-discuss] lustre + samba question

Jongwoo Han (한종우) jw.han at apexcns.com
Fri Sep 6 01:07:41 PDT 2013


Which Samba distribution did you use?

The stock Red Hat / CentOS Samba package does not work correctly.
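
You can check which Samba build is actually installed with the standard
RHEL/CentOS commands, for example:

    rpm -qa | grep -i samba    # list installed Samba packages and versions
    smbd -V                    # print the version of the smbd binary in use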


2013/9/4 Nikolay Kvetsinski <nkvecinski at gmail.com>

> I don't want to start a new thread, but something fishy is going on. The
> client I wrote to you about exports Lustre via Samba for some Windows users.
> The client has a 10 Gigabit Ethernet NIC. Usually it works fine, but sometimes
> this happens:
>
> Sep  4 16:07:22 cache kernel: LustreError: Skipped 1 previous similar
> message
> Sep  4 16:07:22 cache kernel: LustreError: 167-0:
> lustre0-MDT0000-mdc-ffff88062edb0c00: This client was evicted by
> lustre0-MDT0000; in progress operations using this service will fail.
> Sep  4 16:07:22 cache kernel: LustreError:
> 7411:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -5
> Sep  4 16:07:22 cache kernel: LustreError:
> 7411:0:(file.c:159:ll_close_inode_openhandle()) inode 144115380745406215
> mdc close failed: rc = -108
> Sep  4 16:07:22 cache smbd[7580]: [2013/09/04 16:07:22.451276,  0]
> smbd/process.c:2440(keepalive_fn)
> Sep  4 16:07:22 cache kernel: LustreError:
> 7411:0:(file.c:159:ll_close_inode_openhandle()) inode 144115307009623849
> mdc close failed: rc = -108
> Sep  4 16:07:22 cache kernel: LustreError:
> 7411:0:(file.c:159:ll_close_inode_openhandle()) Skipped 1 previous similar
> message
> Sep  4 16:07:22 cache kernel: LustreError:
> 6553:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -108
> Sep  4 16:07:22 cache kernel: LustreError:
> 6553:0:(mdc_locks.c:840:mdc_enqueue()) Skipped 21 previous similar messages
> Sep  4 16:07:22 cache kernel: LustreError:
> 6517:0:(vvp_io.c:1228:vvp_io_init()) lustre0: refresh file layout
> [0x200002b8a:0x119dd:0x0] error -108.
> Sep  4 16:07:22 cache kernel: LustreError:
> 6517:0:(vvp_io.c:1228:vvp_io_init()) lustre0: refresh file layout
> [0x200002b8a:0x119dd:0x0] error -108.
> Sep  4 16:07:22 cache kernel: LustreError:
> 6519:0:(dir.c:389:ll_get_dir_page()) lock enqueue: [0x200000007:0x1:0x0] at
> 0: rc -108
> Sep  4 16:07:22 cache kernel: LustreError:
> 6519:0:(dir.c:595:ll_dir_read()) error reading dir [0x200000007:0x1:0x0] at
> 0: rc -108
> Sep  4 16:07:22 cache kernel: LustreError:
> 6553:0:(dir.c:389:ll_get_dir_page()) lock enqueue: [0x200000007:0x1:0x0] at
> 0: rc -108
> Sep  4 16:07:22 cache kernel: LustreError:
> 6553:0:(dir.c:595:ll_dir_read()) error reading dir [0x200000007:0x1:0x0] at
> 0: rc -108
> Sep  4 16:07:22 cache smbd[6517]: [2013/09/04 16:07:22.469729,  0]
> smbd/dfree.c:137(sys_disk_free)
> Sep  4 16:07:22 cache smbd[6517]:   disk_free: sys_fsusage() failed. Error
> was : Cannot send after transport endpoint shutdown
> Sep  4 16:07:22 cache kernel: LustreError:
> 6517:0:(lmv_obd.c:1289:lmv_statfs()) can't stat MDS #0
> (lustre0-MDT0000-mdc-ffff88062edb0c00), error -108
> Sep  4 16:07:22 cache kernel: LustreError:
> 6517:0:(llite_lib.c:1610:ll_statfs_internal()) md_statfs fails: rc = -108
> Sep  4 16:07:22 cache smbd[6517]: [2013/09/04 16:07:22.470531,  0]
> smbd/dfree.c:137(sys_disk_free)
> Sep  4 16:07:22 cache smbd[6517]:   disk_free: sys_fsusage() failed. Error
> was : Cannot send after transport endpoint shutdown
> Sep  4 16:07:22 cache smbd[6517]: [2013/09/04 16:07:22.470922,  0]
> smbd/dfree.c:137(sys_disk_free)
> Sep  4 16:07:22 cache smbd[6517]:   disk_free: sys_fsusage() failed. Error
> was : Cannot send after transport endpoint shutdown
> Sep  4 16:07:22 cache kernel: LustreError:
> 6517:0:(lmv_obd.c:1289:lmv_statfs()) can't stat MDS #0
> (lustre0-MDT0000-mdc-ffff88062edb0c00), error -108
> Sep  4 16:07:22 cache kernel: LustreError:
> 6553:0:(statahead.c:1397:is_first_dirent()) error reading dir
> [0x200000007:0x1:0x0] at 0: [rc -108] [parent 6553]
> Sep  4 16:07:22 cache kernel: LustreError:
> 6553:0:(statahead.c:1397:is_first_dirent()) error reading dir
> [0x200000007:0x1:0x0] at 0: [rc -108] [parent 6553]
> Sep  4 16:07:22 cache smbd[8958]: [2013/09/04 16:07:22.517112,  0]
> smbd/dfree.c:137(sys_disk_free)
> Sep  4 16:07:22 cache smbd[8958]:   disk_free: sys_fsusage() failed. Error
> was : Cannot send after transport endpoint shutdown
> Sep  4 16:07:22 cache smbd[8958]: [2013/09/04 16:07:22.517496,  0]
> smbd/dfree.c:137(sys_disk_free)
>
> Eventually it connects again to the MDS:
>
> Sep  4 16:07:31 cache kernel: LustreError:
> 6582:0:(ldlm_resource.c:811:ldlm_resource_complain()) Resource:
> ffff880a125f7e40 (8589945618/117308/0/0) (rc: 1)
> Sep  4 16:07:31 cache kernel: LustreError:
> 6582:0:(ldlm_resource.c:1423:ldlm_resource_dump()) --- Resource:
> ffff880a125f7e40 (8589945618/117308/0/0) (rc: 2)
> Sep  4 16:07:31 cache kernel: Lustre:
> lustre0-MDT0000-mdc-ffff88062edb0c00: Connection restored to
> lustre0-MDT0000 (at 192.168.11.23 at tcp)
>
> But now the load average goes through the roof ... 25 or even 50 on a
> 16-CPU machine. And shortly after that:
>
> Sep  4 16:08:14 cache kernel: INFO: task ptlrpcd_6:2425 blocked for more
> than 120 seconds.
> Sep  4 16:08:14 cache kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep  4 16:08:14 cache kernel: ptlrpcd_6     D 000000000000000c     0  2425
>      2 0x00000080
> Sep  4 16:08:14 cache kernel: ffff880c34ab7a10 0000000000000046
> 0000000000000000 ffffffffa0a01736
> Sep  4 16:08:14 cache kernel: ffff880c34ab79d0 ffffffffa09fc199
> ffff88062e274000 ffff880c34cf8000
> Sep  4 16:08:14 cache kernel: ffff880c34aae638 ffff880c34ab7fd8
> 000000000000fb88 ffff880c34aae638
> Sep  4 16:08:14 cache kernel: Call Trace:
> Sep  4 16:08:14 cache kernel: [<ffffffffa0a01736>] ?
> ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
> Sep  4 16:08:14 cache kernel: [<ffffffffa09fc199>] ?
> ksocknal_find_conn_locked+0x159/0x290 [ksocklnd]
> Sep  4 16:08:14 cache kernel: [<ffffffff8150f1ee>]
> __mutex_lock_slowpath+0x13e/0x180
> Sep  4 16:08:14 cache kernel: [<ffffffff8150f08b>] mutex_lock+0x2b/0x50
> Sep  4 16:08:14 cache kernel: [<ffffffffa0751ecf>]
> cl_lock_mutex_get+0x6f/0xd0 [obdclass]
> Sep  4 16:08:14 cache kernel: [<ffffffffa0b5b469>]
> lovsub_parent_lock+0x49/0x120 [lov]
> Sep  4 16:08:14 cache kernel: [<ffffffffa0b5c60f>]
> lovsub_lock_modify+0x7f/0x1e0 [lov]
> Sep  4 16:08:14 cache kernel: [<ffffffffa07514d8>]
> cl_lock_modify+0x98/0x310 [obdclass]
> Sep  4 16:08:14 cache kernel: [<ffffffffa0748dae>] ?
> cl_object_attr_unlock+0xe/0x20 [obdclass]
> Sep  4 16:08:14 cache kernel: [<ffffffffa0ac1e52>] ?
> osc_lock_lvb_update+0x1a2/0x470 [osc]
> Sep  4 16:08:14 cache kernel: [<ffffffffa0ac2302>]
> osc_lock_granted+0x1e2/0x2b0 [osc]
> Sep  4 16:08:14 cache kernel: [<ffffffffa0ac30b0>]
> osc_lock_upcall+0x3f0/0x5e0 [osc]
> Sep  4 16:08:14 cache kernel: [<ffffffffa0ac2cc0>] ?
> osc_lock_upcall+0x0/0x5e0 [osc]
> Sep  4 16:08:14 cache kernel: [<ffffffffa0aa3876>]
> osc_enqueue_fini+0x106/0x240 [osc]
> Sep  4 16:08:14 cache kernel: [<ffffffffa0aa82c2>]
> osc_enqueue_interpret+0xe2/0x1e0 [osc]
> Sep  4 16:08:14 cache kernel: [<ffffffffa0884d2c>]
> ptlrpc_check_set+0x2ac/0x1b20 [ptlrpc]
> Sep  4 16:08:14 cache kernel: [<ffffffffa08b1c7b>]
> ptlrpcd_check+0x53b/0x560 [ptlrpc]
> Sep  4 16:08:14 cache kernel: [<ffffffffa08b21a3>] ptlrpcd+0x233/0x390
> [ptlrpc]
> Sep  4 16:08:14 cache kernel: [<ffffffff81063310>] ?
> default_wake_function+0x0/0x20
> Sep  4 16:08:14 cache kernel: [<ffffffffa08b1f70>] ? ptlrpcd+0x0/0x390
> [ptlrpc]
> Sep  4 16:08:14 cache kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
> Sep  4 16:08:14 cache kernel: [<ffffffffa08b1f70>] ? ptlrpcd+0x0/0x390
> [ptlrpc]
> Sep  4 16:08:14 cache kernel: [<ffffffffa08b1f70>] ? ptlrpcd+0x0/0x390
> [ptlrpc]
> Sep  4 16:08:14 cache kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
>
> Sometimes it's the smbd daemon that spits out a similar call trace. I'm
> using Lustre 2.4.0-2.6.32_358.6.2.el6.x86_64_gd3f91c4.x86_64. This is the
> only client I use for exporting Lustre via Samba. No other client is having
> errors or issues like that. Sometimes I can see that this particular client
> disconnects from some of the OSTs too. I'll test the NIC to see if there
> are any hardware problems, but besides that, does anyone have any clues or
> hints they want to share with me?
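>
> For the NIC and LNet checks I have in mind something along these lines
> (the NID is the MDS from the log above; the interface name is just a
> placeholder for my setup):
>
>     lctl ping 192.168.11.23@tcp        # basic LNet connectivity to the MDS
>     lctl get_param mdc.*.import        # current MDC import / connection state
>     ethtool -S eth2 | grep -iE 'err|drop'    # NIC error and drop counters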
>
>
> Cheers,
>
>
>
> On Tue, Jun 25, 2013 at 8:55 AM, Nikolay Kvetsinski <nkvecinski at gmail.com> wrote:
>
>> Thank you all for your quick responses. Unfortunately my own stupidity
>> got the better of me this time ... the MDS server was not a domain member.
>> After joining it to the domain, everything works OK.
>>
>> Cheers,
>> :(
>>
>>
>> On Mon, Jun 24, 2013 at 8:48 PM, Michael Watters <wattersmt at gmail.com> wrote:
>>
>>> Does your uid in Windows match the uid on the samba server?  Does the
>>> samba account exist?  I ran into similar issues with NFS.
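>>>
>>> A quick way to check is to compare what the server resolves for the
>>> account, e.g. (the user and domain names below are placeholders):
>>>
>>>     getent passwd someuser          # UID as seen on the Samba/Lustre client
>>>     id someuser                     # effective UID and group membership
>>>     wbinfo -i 'SOMEDOMAIN\someuser' # identity mapping via winbind, if used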
>>>
>>>
>>> On Mon, Jun 24, 2013 at 5:59 AM, Nikolay Kvetsinski <
>>> nkvecinski at gmail.com> wrote:
>>>
>>>> Hello guys,
>>>>
>>>> I'm using the latest feature release
>>>> (lustre-2.4.0-2.6.32_358.6.2.el6_lustre.g230b174.x86_64_gd3f91c4.x86_64.rpm)
>>>> on CentOS 6.4. Lustre itself is working fine, but when I export it with
>>>> Samba and try to connect with a Windows 7 client I get:
>>>>
>>>> Jun 24 12:53:14 R-82L kernel: LustreError:
>>>> 2326:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -13
>>>> Jun 24 12:53:16 R-82L kernel: LustreError:
>>>> 2326:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -13
>>>> Jun 24 12:53:16 R-82L kernel: LustreError:
>>>> 2326:0:(mdc_locks.c:840:mdc_enqueue()) Skipped 7 previous similar messages
>>>> Jun 24 12:53:16 R-82L kernel: LustreError:
>>>> 2326:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -13
>>>> Jun 24 12:53:16 R-82L kernel: LustreError:
>>>> 2326:0:(mdc_locks.c:840:mdc_enqueue()) Skipped 7 previous similar messages
>>>>
>>>> And on the Windows client I get a "You don't have permissions to access
>>>> ....." error. Permissions are 777. I created a local share just to test
>>>> the Samba server and it's working. The error pops up only when trying to
>>>> access the Samba share with Lustre backend storage.
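>>>>
>>>> For reference, this is roughly how I'm testing it (the share names, user
>>>> and the Lustre mount point below are placeholders for my setup):
>>>>
>>>>     ls -ldn /mnt/lustre/share                    # numeric owner and mode of the exported dir
>>>>     smbclient //server/localshare -U someuser    # plain local share: works
>>>>     smbclient //server/lustreshare -U someuser   # Lustre-backed share: permission error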
>>>>
>>>> Any help will be greatly appreciated.
>>>>
>>>> Cheers,
>>>>
>>>>
>>>>
>>>
>>
>
>
>


-- 
Jongwoo Han
Principal consultant
jw.han at apexcns.com
Tel:    +82-2-3413-1704
Mobile: +82-505-227-6108
Fax:    +82-2-544-7962

