[Lustre-discuss] locking/perm. denied problem with samba export after update to lustre 2.0

robert spam.robert at risefx.com
Wed Nov 3 13:59:27 PDT 2010


hi all,

we recently updated our servers (from centos 5.2, lustre 1.8.1.1) and
clients (centos 5.4, from lustre 1.8.3) to centos 5.4 and lustre
2.0.0.1. since then we are experiencing massive problems with users
accessing the lustre filesystem through a samba share. we did not have
these problems before and as we kept the exporting client os and samba
installation and just updated the lustre client on that machine, my
first guess is that it might be the new lustre version that does not
work together with samba too well.

a few soft symptoms:
- users browsing directories have to wait very long to see their files
- random files are not accessible, under heavy lod it is about every
10th file
- the problem increases with heavy load and is almost gone under minimal
load
- the problem is worse right after a reboot of the osts like on nov.
1st, pls see below
- accessig files takes a lot of time (sometimes seconds), even on the
linux clinet level
- there seems to be a bandwith limit per smbd process, many processes
still bring the system to the known transfer rate limits from before the
update

some log snippets...

from the client that exports the lustre fs to samba:
---------------------------------------------------
Nov  3 14:14:39 lh1 kernel: LustreError:
2982:0:(cl_lock.c:1371:cl_unuse_locked()) lock at ffff8100243fded0[2 5 -4 1
0 00000006] R(1):[0, 18446744073709551615]@[0x190a601:0xa3d6525c:0x0] {
Nov  3 14:14:39 lh1 kernel: LustreError:
2982:0:(cl_lock.c:1371:cl_unuse_locked())     vvp at ffff8101a18e2330:
Nov  3 14:14:39 lh1 kernel: LustreError:
2982:0:(cl_lock.c:1371:cl_unuse_locked())     lov at ffff8101727857b0: 1
Nov  3 14:14:39 lh1 kernel: LustreError:
2982:0:(cl_lock.c:1371:cl_unuse_locked())     0 0:
lock at ffff8101c5048228[0 5 0 0 0 00000000] R(1):[0,
18446744073709551615]@[0x100050000:0x659329:0x0] {
Nov  3 14:14:39 lh1 kernel: LustreError:
2982:0:(cl_lock.c:1371:cl_unuse_locked())     lovsub at ffff810057ddf7a0:
[0 ffff8101727857b0 R(1):[0,
18446744073709551615]@[0x190a601:0xa3d6525c:0x0]] [0 ffff81019dc8a610
R(1):[0, 18446744073709551615]@[0x190a601:0xa3d6525c:0x0]]
Nov  3 14:14:39 lh1 kernel: LustreError:
2982:0:(cl_lock.c:1371:cl_unuse_locked())     osc at ffff81010f5e9ce8:
ffff81019a661000 00101001 0xc3ee5f2d34f879ed 4 0000000000000000 size:
52428800 mtime: 1287786414 atime: 1288789757 ctime: 1287786414 blocks:
102400
Nov  3 14:14:39 lh1 kernel: LustreError:
2982:0:(cl_lock.c:1371:cl_unuse_locked()) } lock at ffff8101c5048228
Nov  3 14:14:39 lh1 kernel: LustreError:
2982:0:(cl_lock.c:1371:cl_unuse_locked())
Nov  3 14:14:39 lh1 kernel: LustreError:
2982:0:(cl_lock.c:1371:cl_unuse_locked()) } lock at ffff8100243fded0
Nov  3 14:14:39 lh1 kernel: LustreError:
2982:0:(cl_lock.c:1371:cl_unuse_locked()) unuse return -4
Nov  3 14:14:58 lh1 kernel: LustreError:
31267:0:(cl_lock.c:1371:cl_unuse_locked()) lock at ffff8101c5048ed0[2 5 -4
1 0 00000006] R(1):[0, 18446744073709551615]@[0x1909f52:0xa3d61076:0x0] {
Nov  3 14:14:58 lh1 kernel: LustreError:
31267:0:(cl_lock.c:1371:cl_unuse_locked())     vvp at ffff81014921b218:
Nov  3 14:14:58 lh1 kernel: LustreError:
31267:0:(cl_lock.c:1371:cl_unuse_locked())     lov at ffff81004a5a6c90: 1
Nov  3 14:14:58 lh1 kernel: LustreError:
31267:0:(cl_lock.c:1371:cl_unuse_locked())     0 0:
lock at ffff8101c50483d8[0 5 0 0 0 00000000] R(1):[0,
18446744073709551615]@[0x100080000:0x657d44:0x0] {
Nov  3 14:14:58 lh1 kernel: LustreError:
31267:0:(cl_lock.c:1371:cl_unuse_locked())     lovsub at ffff81013ad466a0:
[0 ffff81004a5a6c90 R(1):[0,
18446744073709551615]@[0x1909f52:0xa3d61076:0x0]] [0 ffff8100441754d8
R(1):[0, 18446744073709551615]@[0x1909f52:0xa3d61076:0x0]]
Nov  3 14:14:58 lh1 kernel: LustreError:
31267:0:(cl_lock.c:1371:cl_unuse_locked())     osc at ffff81013f822b88:
ffff810138e33240 00101001 0xc3ee5f2d34f87d35 4 0000000000000000 size:
17419974 mtime: 1287761855 atime: 1288789756 ctime: 1287761855 blocks: 34024
Nov  3 14:14:58 lh1 kernel: LustreError:
31267:0:(cl_lock.c:1371:cl_unuse_locked()) } lock at ffff8101c50483d8
Nov  3 14:14:58 lh1 kernel: LustreError:
31267:0:(cl_lock.c:1371:cl_unuse_locked())
Nov  3 14:14:58 lh1 kernel: LustreError:
31267:0:(cl_lock.c:1371:cl_unuse_locked()) } lock at ffff8101c5048ed0
Nov  3 14:14:58 lh1 kernel: LustreError:
31267:0:(cl_lock.c:1371:cl_unuse_locked()) unuse return -4
Nov  3 14:16:05 lh1 smbd[24032]: [2010/11/03 14:16:05.305407,  0]
smbd/open.c:579(open_file)
Nov  3 14:16:05 lh1 smbd[24032]:   Error doing fstat on open file
projects/laconia/scenes/072-040/setup-maya/data/072-040_ocean-fire_v014_simohl/fluidShape1Frame1104.mc
(Interrupted system call)
--------------------------------------------

there are some errors on all three oss, but not from a corresponding
time. however, the locking problem started before nov 1st:
--------------------------------------------
Nov  1 13:41:30 oss3 kernel: Lustre: lustre-OST0003: received MDS
connection from 192.168.3.205 at o2ib
Nov  1 13:41:30 oss3 kernel: Lustre:
3641:0:(lustre_log.h:472:llog_group_set_export()) lustre-OST0003: export
for group 0 is changed: 0xffff81006d499800 -> 0xffff81004c5b2800
Nov  1 13:41:30 oss3 kernel: Lustre:
3641:0:(llog_net.c:168:llog_receptor_accept()) changing the import
ffff8100550c0000 - ffff810076130000
Nov  1 13:41:30 oss3 kernel: Lustre:
3652:0:(filter.c:2508:filter_llog_connect()) lustre-OST0003: Recovery
from log 0x207c003/0x1:1fa2b553
Nov  1 16:34:11 oss3 kernel: LustreError:
3706:0:(filter_io_26.c:662:filter_commitrw_write()) lustre-OST0006: slow
i_mutex 54s
Nov  1 16:34:11 oss3 kernel: Lustre:
3689:0:(lustre_fsfilt.h:226:fsfilt_start_log()) lustre-OST0006: slow
journal start 40s
Nov  1 16:34:11 oss3 kernel: LustreError:
3677:0:(filter_io_26.c:725:filter_commitrw_write()) lustre-OST0006: slow
direct_io 50s
Nov  1 16:34:11 oss3 kernel: LustreError:
3707:0:(filter_io_26.c:662:filter_commitrw_write()) lustre-OST0006: slow
i_mutex 52s
Nov  1 16:34:11 oss3 kernel: LustreError:
3707:0:(filter_io_26.c:662:filter_commitrw_write()) Skipped 1 previous
similar message
Nov  1 16:34:12 oss3 kernel: LustreError:
3716:0:(filter_io.c:826:filter_preprw_write()) lustre-OST0006: slow
start_page_write 51s
Nov  1 17:05:44 oss3 kernel: Lustre:
3698:0:(lustre_fsfilt.h:226:fsfilt_start_log()) lustre-OST0009: slow
journal start 31s
Nov  1 17:05:44 oss3 kernel: Lustre:
3698:0:(lustre_fsfilt.h:226:fsfilt_start_log()) Skipped 1 previous
similar message
Nov  1 17:05:44 oss3 kernel: Lustre:
3721:0:(lustre_fsfilt.h:258:fsfilt_brw_start_log()) lustre-OST0009: slow
journal start 34s
Nov  1 17:05:44 oss3 kernel: Lustre:
3721:0:(filter_io_26.c:675:filter_commitrw_write()) lustre-OST0009: slow
brw_start 34s
Nov  1 17:05:44 oss3 kernel: Lustre:
3684:0:(filter_io_26.c:725:filter_commitrw_write()) lustre-OST0009: slow
direct_io 41s
Nov  1 17:15:06 oss3 kernel: LustreError:
3675:0:(lustre_fsfilt.h:258:fsfilt_brw_start_log()) lustre-OST0006: slow
journal start 81s
Nov  1 17:15:06 oss3 kernel: LustreError:
3675:0:(filter_io_26.c:675:filter_commitrw_write()) lustre-OST0006: slow
brw_start 81s
Nov  1 17:15:06 oss3 kernel: LustreError:
3705:0:(lustre_fsfilt.h:226:fsfilt_start_log()) lustre-OST0006: slow
journal start 79s
Nov  1 17:15:06 oss3 kernel: Lustre:
3846:0:(lustre_fsfilt.h:226:fsfilt_start_log()) lustre-OST0006: slow
journal start 47s
Nov  1 17:15:06 oss3 kernel: Lustre:
3846:0:(lustre_fsfilt.h:226:fsfilt_start_log()) Skipped 1 previous
similar message
Nov  1 17:15:06 oss3 kernel: Lustre:
3617:0:(filter.c:1434:filter_parent_lock()) lustre-OST0006: slow parent
lock 36s
Nov  1 17:15:06 oss3 kernel: LustreError:
3670:0:(filter_io_26.c:725:filter_commitrw_write()) lustre-OST0006: slow
direct_io 106s
Nov  1 17:15:06 oss3 kernel: Lustre:
3620:0:(filter.c:1434:filter_parent_lock()) lustre-OST0006: slow parent
lock 32s
Nov  1 17:53:53 oss3 kernel: Lustre:
3719:0:(filter_io_26.c:662:filter_commitrw_write()) lustre-OST0003: slow
i_mutex 38s
Nov  1 17:53:53 oss3 kernel: Lustre:
3696:0:(filter_io_26.c:725:filter_commitrw_write()) lustre-OST0003: slow
direct_io 38s
Nov  1 17:54:02 oss3 kernel: Lustre:
3685:0:(filter_io_26.c:662:filter_commitrw_write()) lustre-OST0003: slow
i_mutex 46s
Nov  1 17:54:02 oss3 kernel: Lustre:
3685:0:(filter_io_26.c:662:filter_commitrw_write()) Skipped 1 previous
similar message
--------------------------------------

any help is very much appreciated!

thanks!

robert



More information about the lustre-discuss mailing list