[Lustre-discuss] lustre + samba question

Nikolay Kvetsinski nkvecinski at gmail.com
Sun Sep 8 23:17:05 PDT 2013


Hello, I'm using:

Samba version 3.6.9-151.el6.

I've added

        locking = No
        posix locking = No

to smb.conf, and we'll see if there is any improvement. Any suggestion on
which Samba version to use? Or, if you say the CentOS package is not working
correctly, should I build Samba from source?
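
For reference, here is a minimal sketch of how those two options might sit in
a Lustre-backed share definition. The share name, path and the additional
oplock settings are illustrative assumptions, not taken from this thread:

        [lustre]
            # hypothetical share name and Lustre client mount point
            path = /mnt/lustre
            read only = no
            # the two options above: skip smbd's byte-range lock checks and
            # do not translate SMB locks into fcntl() locks on Lustre
            locking = no
            posix locking = no
            # oplocks are often disabled as well when the backing filesystem
            # is shared by other clients; whether that helps here is untested
            oplocks = no
            kernel oplocks = no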

On Fri, Sep 6, 2013 at 11:07 AM, 한종우 <jw.han at apexcns.com> wrote:

> What Samba distribution did you use?
>
> The Red Hat / CentOS Samba package does not work correctly.
>
>
> 2013/9/4 Nikolay Kvetsinski <nkvecinski at gmail.com>
>
>> I don't want to start a new thread, but something fishy is going on. The
>> client I wrote you about exports Lustre via Samba for some Windows users.
>> The client has a 10 Gigabit Ethernet NIC. Usually it works fine, but
>> sometimes this happens:
>>
>> Sep  4 16:07:22 cache kernel: LustreError: Skipped 1 previous similar
>> message
>> Sep  4 16:07:22 cache kernel: LustreError: 167-0:
>> lustre0-MDT0000-mdc-ffff88062edb0c00: This client was evicted by
>> lustre0-MDT0000; in progress operations using this service will fail.
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 7411:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -5
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 7411:0:(file.c:159:ll_close_inode_openhandle()) inode 144115380745406215
>> mdc close failed: rc = -108
>> Sep  4 16:07:22 cache smbd[7580]: [2013/09/04 16:07:22.451276,  0]
>> smbd/process.c:2440(keepalive_fn)
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 7411:0:(file.c:159:ll_close_inode_openhandle()) inode 144115307009623849
>> mdc close failed: rc = -108
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 7411:0:(file.c:159:ll_close_inode_openhandle()) Skipped 1 previous similar
>> message
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6553:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -108
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6553:0:(mdc_locks.c:840:mdc_enqueue()) Skipped 21 previous similar messages
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6517:0:(vvp_io.c:1228:vvp_io_init()) lustre0: refresh file layout
>> [0x200002b8a:0x119dd:0x0] error -108.
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6517:0:(vvp_io.c:1228:vvp_io_init()) lustre0: refresh file layout
>> [0x200002b8a:0x119dd:0x0] error -108.
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6519:0:(dir.c:389:ll_get_dir_page()) lock enqueue: [0x200000007:0x1:0x0] at
>> 0: rc -108
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6519:0:(dir.c:595:ll_dir_read()) error reading dir [0x200000007:0x1:0x0] at
>> 0: rc -108
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6553:0:(dir.c:389:ll_get_dir_page()) lock enqueue: [0x200000007:0x1:0x0] at
>> 0: rc -108
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6553:0:(dir.c:595:ll_dir_read()) error reading dir [0x200000007:0x1:0x0] at
>> 0: rc -108
>> Sep  4 16:07:22 cache smbd[6517]: [2013/09/04 16:07:22.469729,  0]
>> smbd/dfree.c:137(sys_disk_free)
>> Sep  4 16:07:22 cache smbd[6517]:   disk_free: sys_fsusage() failed.
>> Error was : Cannot send after transport endpoint shutdown
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6517:0:(lmv_obd.c:1289:lmv_statfs()) can't stat MDS #0
>> (lustre0-MDT0000-mdc-ffff88062edb0c00), error -108
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6517:0:(llite_lib.c:1610:ll_statfs_internal()) md_statfs fails: rc = -108
>> Sep  4 16:07:22 cache smbd[6517]: [2013/09/04 16:07:22.470531,  0]
>> smbd/dfree.c:137(sys_disk_free)
>> Sep  4 16:07:22 cache smbd[6517]:   disk_free: sys_fsusage() failed.
>> Error was : Cannot send after transport endpoint shutdown
>> Sep  4 16:07:22 cache smbd[6517]: [2013/09/04 16:07:22.470922,  0]
>> smbd/dfree.c:137(sys_disk_free)
>> Sep  4 16:07:22 cache smbd[6517]:   disk_free: sys_fsusage() failed.
>> Error was : Cannot send after transport endpoint shutdown
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6517:0:(lmv_obd.c:1289:lmv_statfs()) can't stat MDS #0
>> (lustre0-MDT0000-mdc-ffff88062edb0c00), error -108
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6553:0:(statahead.c:1397:is_first_dirent()) error reading dir
>> [0x200000007:0x1:0x0] at 0: [rc -108] [parent 6553]
>> Sep  4 16:07:22 cache kernel: LustreError:
>> 6553:0:(statahead.c:1397:is_first_dirent()) error reading dir
>> [0x200000007:0x1:0x0] at 0: [rc -108] [parent 6553]
>> Sep  4 16:07:22 cache smbd[8958]: [2013/09/04 16:07:22.517112,  0]
>> smbd/dfree.c:137(sys_disk_free)
>> Sep  4 16:07:22 cache smbd[8958]:   disk_free: sys_fsusage() failed.
>> Error was : Cannot send after transport endpoint shutdown
>> Sep  4 16:07:22 cache smbd[8958]: [2013/09/04 16:07:22.517496,  0]
>> smbd/dfree.c:137(sys_disk_free)
>>
>> Eventually it reconnects to the MDS:
>>
>> Sep  4 16:07:31 cache kernel: LustreError:
>> 6582:0:(ldlm_resource.c:811:ldlm_resource_complain()) Resource:
>> ffff880a125f7e40 (8589945618/117308/0/0) (rc: 1)
>> Sep  4 16:07:31 cache kernel: LustreError:
>> 6582:0:(ldlm_resource.c:1423:ldlm_resource_dump()) --- Resource:
>> ffff880a125f7e40 (8589945618/117308/0/0) (rc: 2)
>> Sep  4 16:07:31 cache kernel: Lustre:
>> lustre0-MDT0000-mdc-ffff88062edb0c00: Connection restored to
>> lustre0-MDT0000 (at 192.168.11.23 at tcp)
>>
>> But then the load average goes through the roof ... 25 or even 50 on a 16
>> CPU machine. And shortly after that:
>>
>> Sep  4 16:08:14 cache kernel: INFO: task ptlrpcd_6:2425 blocked for more
>> than 120 seconds.
>> Sep  4 16:08:14 cache kernel: "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Sep  4 16:08:14 cache kernel: ptlrpcd_6     D 000000000000000c     0
>>  2425      2 0x00000080
>> Sep  4 16:08:14 cache kernel: ffff880c34ab7a10 0000000000000046
>> 0000000000000000 ffffffffa0a01736
>> Sep  4 16:08:14 cache kernel: ffff880c34ab79d0 ffffffffa09fc199
>> ffff88062e274000 ffff880c34cf8000
>> Sep  4 16:08:14 cache kernel: ffff880c34aae638 ffff880c34ab7fd8
>> 000000000000fb88 ffff880c34aae638
>> Sep  4 16:08:14 cache kernel: Call Trace:
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0a01736>] ?
>> ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa09fc199>] ?
>> ksocknal_find_conn_locked+0x159/0x290 [ksocklnd]
>> Sep  4 16:08:14 cache kernel: [<ffffffff8150f1ee>]
>> __mutex_lock_slowpath+0x13e/0x180
>> Sep  4 16:08:14 cache kernel: [<ffffffff8150f08b>] mutex_lock+0x2b/0x50
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0751ecf>]
>> cl_lock_mutex_get+0x6f/0xd0 [obdclass]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0b5b469>]
>> lovsub_parent_lock+0x49/0x120 [lov]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0b5c60f>]
>> lovsub_lock_modify+0x7f/0x1e0 [lov]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa07514d8>]
>> cl_lock_modify+0x98/0x310 [obdclass]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0748dae>] ?
>> cl_object_attr_unlock+0xe/0x20 [obdclass]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0ac1e52>] ?
>> osc_lock_lvb_update+0x1a2/0x470 [osc]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0ac2302>]
>> osc_lock_granted+0x1e2/0x2b0 [osc]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0ac30b0>]
>> osc_lock_upcall+0x3f0/0x5e0 [osc]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0ac2cc0>] ?
>> osc_lock_upcall+0x0/0x5e0 [osc]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0aa3876>]
>> osc_enqueue_fini+0x106/0x240 [osc]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0aa82c2>]
>> osc_enqueue_interpret+0xe2/0x1e0 [osc]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa0884d2c>]
>> ptlrpc_check_set+0x2ac/0x1b20 [ptlrpc]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa08b1c7b>]
>> ptlrpcd_check+0x53b/0x560 [ptlrpc]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa08b21a3>] ptlrpcd+0x233/0x390
>> [ptlrpc]
>> Sep  4 16:08:14 cache kernel: [<ffffffff81063310>] ?
>> default_wake_function+0x0/0x20
>> Sep  4 16:08:14 cache kernel: [<ffffffffa08b1f70>] ? ptlrpcd+0x0/0x390
>> [ptlrpc]
>> Sep  4 16:08:14 cache kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
>> Sep  4 16:08:14 cache kernel: [<ffffffffa08b1f70>] ? ptlrpcd+0x0/0x390
>> [ptlrpc]
>> Sep  4 16:08:14 cache kernel: [<ffffffffa08b1f70>] ? ptlrpcd+0x0/0x390
>> [ptlrpc]
>> Sep  4 16:08:14 cache kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
>>
>> Sometimes it's the smbd daemon that spits out a similar call trace. I'm
>> using Lustre 2.4.0-2.6.32_358.6.2.el6.x86_64_gd3f91c4.x86_64. This is the
>> only client I use for exporting Lustre via Samba; no other client is having
>> errors or issues like that. Sometimes I can also see this particular client
>> disconnect from some of the OSTs. I'll test the NIC to see if there are any
>> hardware problems, but besides that, does anyone have any clues or hints
>> they want to share with me?
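
Not an answer, but a minimal diagnostic sketch of what could be checked on
that gateway client when this happens. The commands assume a standard
Lustre 2.x client with lctl available; eth0 is a placeholder interface name,
and the MDS NID is taken from the log above:

        # connection state of the MDC/OSC imports (evictions, reconnects)
        lctl get_param mdc.*.import osc.*.import
        lctl get_param mdc.*.state osc.*.state

        # LNET-level reachability of the MDS NID seen in the log
        lctl ping 192.168.11.23@tcp

        # recent evictions and NIC error/drop counters
        dmesg | grep -i evict
        ethtool -S eth0 | grep -iE 'err|drop'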
>>
>>
>> Cheers,
>>
>>
>>
>> On Tue, Jun 25, 2013 at 8:55 AM, Nikolay Kvetsinski <nkvecinski at gmail.com
>> > wrote:
>>
>>> Thank you all for your quick responses. Unfortunately my own stupidity got
>>> the better of me this time ... the MDS server was not a domain member.
>>> After joining it to the domain, everything works OK.
>>>
>>> Cheers,
>>> :(
>>>
>>>
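
As a side note for anyone hitting the same -13 (EACCES) symptom, here is a
minimal sketch of how this kind of setup can be verified. It assumes the
Windows users are domain accounts and that the MDS was joined with Samba's
net/winbind tooling; DOMAIN and the user name are placeholders:

        # on the MDS: confirm the machine is actually joined to the domain
        net ads testjoin

        # on both the Samba gateway and the MDS: the domain account must
        # resolve to the same uid/gid, since the MDS enforces the permission
        # checks that produce the -13 errors
        getent passwd 'DOMAIN\someuser'
        id 'DOMAIN\someuser'
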
>>> On Mon, Jun 24, 2013 at 8:48 PM, Michael Watters <wattersmt at gmail.com>wrote:
>>>
>>>> Does your uid in Windows match the uid on the samba server?  Does the
>>>> samba account exist?  I ran into similar issues with NFS.
>>>>
>>>>
>>>> On Mon, Jun 24, 2013 at 5:59 AM, Nikolay Kvetsinski <
>>>> nkvecinski at gmail.com> wrote:
>>>>
>>>>> Hello guys,
>>>>>
>>>>> I'm using the latest feature release
>>>>> (lustre-2.4.0-2.6.32_358.6.2.el6_lustre.g230b174.x86_64_gd3f91c4.x86_64.rpm)
>>>>> on CentOS 6.4. Lustre itself is working fine, but when I export it with
>>>>> Samba and try to connect from a Windows 7 client I get:
>>>>>
>>>>> Jun 24 12:53:14 R-82L kernel: LustreError:
>>>>> 2326:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -13
>>>>> Jun 24 12:53:16 R-82L kernel: LustreError:
>>>>> 2326:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -13
>>>>> Jun 24 12:53:16 R-82L kernel: LustreError:
>>>>> 2326:0:(mdc_locks.c:840:mdc_enqueue()) Skipped 7 previous similar messages
>>>>> Jun 24 12:53:16 R-82L kernel: LustreError:
>>>>> 2326:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -13
>>>>> Jun 24 12:53:16 R-82L kernel: LustreError:
>>>>> 2326:0:(mdc_locks.c:840:mdc_enqueue()) Skipped 7 previous similar messages
>>>>>
>>>>> And on the Windows client I get a "You don't have permissions to access
>>>>> ....." error (-13 is -EACCES, i.e. permission denied). Permissions are
>>>>> 777. I created a local share just to test the Samba server, and it works.
>>>>> The error only pops up when accessing the Samba share backed by Lustre.
>>>>>
>>>>> Any help will be greatly appreciated.
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>
>
> --
> Jongwoo Han
> Principal consultant
> jw.han at apexcns.com
> Tel:    +82-2-3413-1704
> Mobile: +82-505-227-6108
> Fax:    +82-2-544-7962
>

