[lustre-discuss] Error destroying object

Sidiney Crescencio sidiney.crescencio at clustervision.com
Thu May 3 03:34:17 PDT 2018


Hello Andreas,

Thanks for your answer.

[root at storage06 ~]# debugfs -c -R "stat O/0/d$((0x1bfc24c %
32))/$((0x1bfc24c))" /dev/mapper/ost001c | grep -i fid
debugfs 1.42.13.wc6 (05-Feb-2017)
/dev/mapper/ost001c: catastrophic mode - not reading inode or group bitmaps
  lma: fid=[0x100000000:0x1bfc2c7:0x0] compat=8 incompat=0
  fid = "18 93 02 00 0b 00 00 00 3c c2 01 00 00 00 00 00 " (16)
  fid: parent=[0xb00029318:0x1c23c:0x0] stripe=0


[root at node024 ~]# lfs fid2path /lustre/ 0x100000000:0x1bfc2c7:0x0
ioctl err -22: Invalid argument (22)
fid2path: error on FID 0x100000000:0x1bfc2c7:0x0: Invalid argument

[root at node024 ~]# lfs fid2path /lustre/ 0xb00029318:0x1c23c:0x0
fid2path: error on FID 0xb00029318:0x1c23c:0x0: No such file or directory
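
For the record, here is a rough sketch of how the whole lookup can be chained
together on the OSS (the device, mount point and object ID are the ones from
this thread; this assumes an ldiskfs OST, as Andreas noted):

    # object ID from the ofd_destroy_hdl error message
    OBJ=0x1bfc24c
    # ldiskfs OSTs hash objects into 32 subdirectories under O/0,
    # keyed on the object ID modulo 32, with decimal object names
    debugfs -c -R "stat O/0/d$((OBJ % 32))/$((OBJ))" /dev/mapper/ost001c 2>/dev/null \
        | grep -i 'parent'
    # then, on a client, resolve the parent (MDT) FID printed above:
    #   lfs fid2path /lustre <parent FID>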

Am I doing this right? I think so; it actually looks like the file is already
gone, as I thought at first.

About the hung thread: I've filtered the logs as shown below and couldn't find
anything that might indicate the issue. What else can we check to resolve this
error? (A rough sketch of what I'm thinking of trying next follows the log.)



[root at storage06 ~]# cat /var/log/messages* | grep -i OST001c | grep -v
destroying | grep -v scrub
Apr 30 11:01:13 storage06 kernel: Lustre: wurfs-OST001c: Connection
restored to e9153718-f82d-d90b-268a-e8c9a5e3af1c (at 192.168.2.19 at o2ib)
May  2 15:54:17 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from
client 9c4b82f6-a2a7-3488-c2b3-cabb9cf333e5 (at 192.168.2.25 at o2ib) in 1352
seconds. I think it's dead, and I am evicting it. exp ffff8804c451e000, cur
1525269257 expire 1525268357 last 1525267905
Apr  5 10:11:43 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from
client c1966b99-1299-9da0-3280-bd6ad84f8f27 (at 192.168.2.51 at o2ib) in 1352
seconds. I think it's dead, and I am evicting it. exp ffff8804c4519800, cur
1522915903 expire 1522915003 last 1522914551
Apr  5 10:44:20 storage06 kernel: Lustre: wurfs-OST001c: Connection
restored to 7fbdaa81-10cb-2464-f981-883bee1f6fdf (at 192.168.2.21 at o2ib)
Apr  5 10:59:52 storage06 kernel: Lustre: wurfs-OST001c: Connection
restored to aef29b00-0042-9f5e-da17-3bd3b655e13d (at 192.168.2.2 at o2ib)
Apr  5 11:09:59 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from
client c4cec4f1-b994-2ad2-be36-196b9f5c1b76 (at 192.168.2.161 at o2ib) in 1352
seconds. I think it's dead, and I am evicting it. exp ffff88059a0a2400, cur
1522919399 expire 1522918499 last 1522918047
Apr 14 14:58:02 storage06 kernel: LustreError:
0:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer
expired after 377s: evicting client at 192.168.2.33 at o2ib  ns:
filter-wurfs-OST001c_UUID lock: ffff880bbf72dc00/0xb64a498f40bc086 lrc:
4/0,0 mode: PR/PR res: [0x38ee37e:0x0:0x0].0x0 rrc: 2 type: EXT
[0->18446744073709551615] (req 0->18446744073709551615) flags:
0x60000400010020 nid: 192.168.2.33 at o2ib remote: 0x73aa9e5b8c684dc5 expref:
328 pid: 39172 timeout: 16574376013 lvb_type: 1
Apr 14 15:05:56 storage06 kernel: Lustre: wurfs-OST001c: Client
wurfs-MDT0000-mdtlov_UUID (at 192.168.2.182 at o2ib) reconnecting
Apr 14 15:05:56 storage06 kernel: Lustre: wurfs-OST001c: Connection
restored to 192.168.2.182 at o2ib (at 192.168.2.182 at o2ib)
Apr 14 15:05:56 storage06 kernel: Lustre: wurfs-OST001c: deleting orphan
objects from 0x0:59696086 to 0x0:59705564
Apr 15 15:38:28 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from
client a21e3dcc-af43-1dc2-b552-ca341a6b5e77 (at 192.168.2.5 at o2ib) in 1352
seconds. I think it's dead, and I am evicting it. exp ffff880629717000, cur
1523799508 expire 1523798608 last 1523798156
Apr 15 16:01:07 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from
client c931b18c-e0cf-4a0c-d95f-9a8cf60f3b3f (at 192.168.2.36 at o2ib) in 1352
seconds. I think it's dead, and I am evicting it. exp ffff880d3fcfdc00, cur
1523800867 expire 1523799967 last 1523799515
Apr 15 18:45:35 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from
client af5f8ac5-fb5d-cd1c-cf97-b755700778bc (at 192.168.2.9 at o2ib) in 1352
seconds. I think it's dead, and I am evicting it. exp ffff8807fed8d000, cur
1523810735 expire 1523809835 last 1523809383
Apr 16 09:04:27 storage06 kernel: Lustre:
39169:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has
failed due to network error: [sent 1523862169/real 1523862267]
req at ffff8809e5746300 x1584854319120496/t0(0)
o104->wurfs-OST001c at 192.168.2.38@o2ib:15/16 lens 296/224 e 0 to 1 dl
1523862736 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
Apr 16 09:44:18 storage06 kernel: Lustre: wurfs-OST001c: Connection
restored to 2e95bceb-837d-5518-9198-48dd0b2b9a83 (at 192.168.2.40 at o2ib)
Apr 16 09:53:26 storage06 kernel: Lustre: wurfs-OST001c: Connection
restored to 07eac249-8012-fe49-1037-3920d06e1403 (at 192.168.2.38 at o2ib)
Apr 16 09:55:48 storage06 kernel: Lustre: wurfs-OST001c: Connection
restored to 3df9306f-8024-c85f-8d42-3ad863a3f4c0 (at 192.168.2.171 at o2ib)
Apr 16 10:11:25 storage06 kernel: Lustre: wurfs-OST001c: Connection
restored to d9a56a18-c51e-2b0c-561d-3b0fa31ca8f7 (at 192.168.2.12 at o2ib)
Apr 16 10:12:06 storage06 kernel: Lustre: wurfs-OST001c: Connection
restored to c00bd597-31b4-ded9-fd06-d02500010dad (at 192.168.2.172 at o2ib)
Apr 16 13:50:44 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from
client 4d69154c-ca88-ce45-23f7-ff76f1a6423f (at 192.168.2.14 at o2ib) in 1352
seconds. I think it's dead, and I am evicting it. exp ffff8804c4678800, cur
1523879444 expire 1523878544 last 1523878092
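
In case it is useful, this is roughly what I'm planning to run next on the OSS
to look for a stuck OST service thread (just standard tools; please correct me
if there is a better approach):

    # hung-task / lockup / watchdog traces the OST001c grep above would miss
    dmesg | grep -iE 'hung|lockup|watchdog'
    # kernel stacks of the OST service threads (ll_ost*)
    for pid in $(pgrep ll_ost); do
        echo "=== $pid $(cat /proc/$pid/comm) ==="
        cat /proc/$pid/stack
    done
    # optionally dump all task states into the kernel log:
    #   echo t > /proc/sysrq-trigger
    # and the Lustre debug log, filtered for the object from the error
    lctl dk /tmp/lustre-debug.txt && grep -i 1bfc24c /tmp/lustre-debug.txt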

Many thanks.


On 2 May 2018 at 20:16, Dilger, Andreas <andreas.dilger at intel.com> wrote:

> This is an OST FID, so you would need to get the parent MDT FID to be able
> to resolve the pathname.
>
> Assuming an ldiskfs OST you can use:
>
>     'debugfs -c -R "stat O/0/d$((0x1bfc24c % 32))/$((0x1bfc24c))"
> LABEL=wurfs-OST001c'
>
> To get the parent FID, then "lfs fid2path /mnt/wurfs <FID>" on a client to
> find the path.
>
> That said, the -115 error is "-EINPROGRESS", which means the OST thinks it
> is already trying to do this. Maybe a hung OST thread?
>
> Cheers, Andreas
>
> On May 2, 2018, at 06:53, Sidiney Crescencio
> <sidiney.crescencio at clustervision.com> wrote:
>
> Hi All,
>
> I need help to discover what file is about this error or how to solve it.
>
> Apr 30 13:48:02 storage06 kernel: LustreError: 44779:0:(ofd_dev.c:1884:ofd_destroy_hdl())
> wurfs-OST001c: error destroying object [0x1001c0000:0x1bfc24c:0x0]: -115
>
> I've been trying to map this to a file but I can't since I don't have the
> FID
>
> Anyone knows how to sort it out?
>
> Thanks in advance
>
> --
> Best Regards,
>
>
>
> Sidiney
>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>


-- 
Best Regards,

Sidiney Crescencio
Technical Support Engineer


Direct: +31 20 407 7550
Skype: sidiney.crescencio_1
sidiney.crescencio at clustervision.com

ClusterVision BV
Gyroscoopweg 56
1042 AC Amsterdam
The Netherlands
Tel: +31 20 407 7550
Fax: +31 84 759 8389
www.clustervision.com