[lustre-discuss] Lustre Precreation error.

Andrea del Monaco andrea.delmonaco at clustervision.com
Thu Mar 16 08:59:04 PDT 2017


An update regarding this:
I have noticed that on this OST (wurfs-OST001b) the OI scrub gets launched
every ~7 seconds:
[root@storage06 wurfs-OST001b]# cat oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 8 seconds
time_since_latest_start: 8 seconds
time_since_last_checkpoint: 8 seconds
latest_start_position: 12
last_checkpoint_position: 30515713
first_failure_position: N/A
checked: 3417
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 1
success_count: 2526979
run_time: 0 seconds
average_speed: 3417 objects/sec
real-time_speed: N/A
current_position: N/A
lf_scanned: 0
lf_repaired: 0
lf_failed: 0
[root@storage06 wurfs-OST001b]# cat oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 2 seconds
time_since_latest_start: 2 seconds
time_since_last_checkpoint: 2 seconds
latest_start_position: 12
last_checkpoint_position: 30515713
first_failure_position: N/A
checked: 3417
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 1
success_count: 2526980
run_time: 0 seconds
average_speed: 3417 objects/sec
real-time_speed: N/A
current_position: N/A
lf_scanned: 0
lf_repaired: 0
lf_failed: 0
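
To make the difference between two such dumps easier to spot, here is a minimal Python sketch (a hypothetical helper, not part of Lustre) that parses the "key: value" lines and reports only the fields that changed. The sample strings are a subset of the two dumps above.

```python
def parse_oi_scrub(text):
    """Parse 'key: value' lines from an oi_scrub dump into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, e.g. the shell prompt
            fields[key.strip()] = value.strip()
    return fields


def diff_dumps(before, after):
    """Return {key: (old, new)} for every field whose value changed."""
    a, b = parse_oi_scrub(before), parse_oi_scrub(after)
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Subset of the first and second dump shown above.
first = """status: completed
time_since_latest_start: 8 seconds
checked: 3417
success_count: 2526979
run_time: 0 seconds"""

second = """status: completed
time_since_latest_start: 2 seconds
checked: 3417
success_count: 2526980
run_time: 0 seconds"""

changes = diff_dumps(first, second)
for key, (old, new) in changes.items():
    print(f"{key}: {old} -> {new}")
```

In this subset only success_count and time_since_latest_start differ, which is consistent with one more scrub run having completed between the two reads.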

And, dumping the logs from the ring buffer, I see:
00080000:02000400:24.0:1489665812.888068:0:35949:0:(osd_handler.c:860:osd_fid_lookup())
wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0],
rc = 0 [1]
00002000:00020000:24.0:1489665812.888083:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl())
wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665812.923388:0:40057:0:(osd_scrub.c:758:osd_scrub_post())
wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665812.923400:0:40057:0:(osd_scrub.c:1520:osd_scrub_main())
wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:24.0:1489665822.903706:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl())
wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665822.903984:0:40212:0:(osd_scrub.c:660:osd_scrub_prep())
wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665822.903992:0:40212:0:(osd_scrub.c:278:osd_scrub_file_reset())
wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665822.904016:0:40212:0:(osd_scrub.c:1510:osd_scrub_main())
wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:24.0:1489665822.904062:0:35949:0:(osd_handler.c:860:osd_fid_lookup())
wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0],
rc = 0 [1]
00002000:00020000:24.0:1489665822.904079:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl())
wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665822.940373:0:40212:0:(osd_scrub.c:758:osd_scrub_post())
wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665822.940385:0:40212:0:(osd_scrub.c:1520:osd_scrub_main())
wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:8.0:1489665832.919771:0:10464:0:(ofd_dev.c:1747:ofd_create_hdl())
wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:20.0:1489665832.920031:0:40406:0:(osd_scrub.c:660:osd_scrub_prep())
wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:20.0:1489665832.920037:0:40406:0:(osd_scrub.c:278:osd_scrub_file_reset())
wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:20.0:1489665832.920057:0:40406:0:(osd_scrub.c:1510:osd_scrub_main())
wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:8.0:1489665832.920094:0:10464:0:(osd_handler.c:860:osd_fid_lookup())
wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0],
rc = 0 [1]
00002000:00020000:8.0:1489665832.920113:0:10464:0:(ofd_dev.c:1781:ofd_create_hdl())
wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:20.0:1489665832.955088:0:40406:0:(osd_scrub.c:758:osd_scrub_post())
wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:20.0:1489665832.955101:0:40406:0:(osd_scrub.c:1520:osd_scrub_main())
wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:30.0:1489665842.935720:0:35960:0:(ofd_dev.c:1747:ofd_create_hdl())
wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665842.936008:0:40553:0:(osd_scrub.c:660:osd_scrub_prep())
wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665842.936015:0:40553:0:(osd_scrub.c:278:osd_scrub_file_reset())
wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665842.936038:0:40553:0:(osd_scrub.c:1510:osd_scrub_main())
wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:30.0:1489665842.936081:0:35960:0:(osd_handler.c:860:osd_fid_lookup())
wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0],
rc = 0 [1]
00002000:00020000:30.0:1489665842.936096:0:35960:0:(ofd_dev.c:1781:ofd_create_hdl())
wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665842.972129:0:40553:0:(osd_scrub.c:758:osd_scrub_post())
wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665842.972141:0:40553:0:(osd_scrub.c:1520:osd_scrub_main())
wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:10.0:1489665852.951770:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl())
wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:18.0:1489665852.951986:0:40838:0:(osd_scrub.c:660:osd_scrub_prep())
wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:18.0:1489665852.951992:0:40838:0:(osd_scrub.c:278:osd_scrub_file_reset())
wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:18.0:1489665852.952017:0:40838:0:(osd_scrub.c:1510:osd_scrub_main())
wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:10.0:1489665852.952060:0:35949:0:(osd_handler.c:860:osd_fid_lookup())
wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0],
rc = 0 [1]
00002000:00020000:10.0:1489665852.952089:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl())
wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:18.0:1489665852.987792:0:40838:0:(osd_scrub.c:758:osd_scrub_post())
wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:18.0:1489665852.987804:0:40838:0:(osd_scrub.c:1520:osd_scrub_main())
wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:8.0:1489665862.967664:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl())
wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665862.967948:0:41207:0:(osd_scrub.c:660:osd_scrub_prep())
wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665862.967955:0:41207:0:(osd_scrub.c:278:osd_scrub_file_reset())
wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665862.967982:0:41207:0:(osd_scrub.c:1510:osd_scrub_main())
wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:8.0:1489665862.968024:0:35949:0:(osd_handler.c:860:osd_fid_lookup())
wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0],
rc = 0 [1]
00002000:00020000:8.0:1489665862.968040:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl())
wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665863.004087:0:41207:0:(osd_scrub.c:758:osd_scrub_post())
wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665863.004098:0:41207:0:(osd_scrub.c:1520:osd_scrub_main())
wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
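
The spacing of the scrub restarts can be read off the timestamps in this dump. Below is a rough Python sketch for doing that. The regular expression encodes my assumption about the debug log header layout (subsystem, mask, CPU, timestamp, stack, pid, extra pid, then file:line:function), and it expects each record on a single line, i.e. with the mail line-wrapping undone. The sample records are copied from the excerpt above.

```python
import re

# Assumed header layout of a Lustre debug log record:
# subsystem:mask:cpu:timestamp:stack:pid:extpid:(file:line:function()) message
HEADER = re.compile(
    r"^(?P<subsys>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):(?P<cpu>[\d.]+):"
    r"(?P<ts>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)$"
)


def parse(lines):
    """Yield (timestamp, function, message) for every record that matches."""
    for line in lines:
        m = HEADER.match(line.strip())
        if m:
            yield float(m["ts"]), m["func"], m["msg"]


def scrub_start_intervals(lines):
    """Seconds between consecutive 'OI scrub start' records."""
    starts = [ts for ts, _, msg in parse(lines) if "OI scrub start" in msg]
    return [round(b - a, 2) for a, b in zip(starts, starts[1:])]


# Records from the dump above, re-joined onto one line each.
sample = [
    "00100000:10000000:27.0:1489665822.904016:0:40212:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12",
    "00002000:00020000:24.0:1489665822.904079:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115",
    "00100000:10000000:20.0:1489665832.920057:0:40406:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12",
    "00002000:00020000:8.0:1489665832.920113:0:10464:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115",
    "00100000:10000000:27.0:1489665842.936038:0:40553:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12",
]

intervals = scrub_start_intervals(sample)
failures = sum(1 for _, _, msg in parse(sample) if "unable to precreate" in msg)
print("seconds between scrub starts:", intervals)
print("precreate failures:", failures)
```

For the records shown here the starts are roughly 10 seconds apart, and each start is followed almost immediately by a precreate failure.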

I tried to see where that FID leads, but it seems that the file doesn't
actually exist
(the customer has moved everything away from this OST):
[root@nfs01 ~]# lfs fid2path wurfs "[0x1001b0000:0x19a5c22:0x0]"
ioctl err -22: Invalid argument (22)
fid2path: error on FID [0x1001b0000:0x19a5c22:0x0]: Invalid argument
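
One thing that may help in interpreting this FID: its sequence number appears to fall in the IDIF range that Lustre uses for OST objects. The Python sketch below decodes it under that assumption. The constants and bit layout (OST index in bits 16-31 of the sequence, high object-id bits in the low 16 bits) are my understanding of the IDIF encoding and should be checked against lustre_fid.h for the version in use.

```python
FID_SEQ_IDIF = 0x100000000      # assumed start of the IDIF sequence range
FID_SEQ_IDIF_MAX = 0x1FFFFFFFF  # assumed end of the IDIF sequence range


def parse_fid(text):
    """Split '[seq:oid:ver]' into three integers."""
    seq, oid, ver = (int(part, 16) for part in text.strip("[]").split(":"))
    return seq, oid, ver


def decode_idif(text):
    """Return OST index and object id if the FID is in the IDIF range, else None."""
    seq, oid, _ver = parse_fid(text)
    if not FID_SEQ_IDIF <= seq <= FID_SEQ_IDIF_MAX:
        return None
    return {
        "ost_index": (seq >> 16) & 0xFFFF,           # assumed: bits 16-31 of seq
        "object_id": ((seq & 0xFFFF) << 32) | oid,   # assumed: id[47:32] | id[31:0]
    }


info = decode_idif("[0x1001b0000:0x19a5c22:0x0]")
print(info)
```

Under these assumptions the FID decodes to OST index 0x1b and object id 26893346, which matches both the OST name (wurfs-OST001b) and the "reserve 64 objects in group 0x0 at 26893346" line in the log. That would suggest the FID names the OST object being precreated rather than a file in the namespace, which might be why fid2path rejects it.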

Not sure how to proceed from here.

On 16 March 2017 at 11:03, Andrea del Monaco <
andrea.delmonaco at clustervision.com> wrote:

> Dear all,
>
> We are facing an issue with one OST.
> We have stopped Pacemaker on storage06 (which is the node that has that
> resource running):
> [root@storage06 log]# pcs status | grep 1b
> storage-ost001b (ocf::heartbeat:Filesystem): Started
> storage06.failover.cluster
> storage-ost001b_monitor_120000 on storage06.failover.cluster 'not running'
> (7): call=295, status=complete, exitreason='none'
> *
> And then we tried to run e2fsck -n /dev/mapper/ost001b.
> e2fsck reported nothing to be repaired.
> Today I noticed that there are still errors and we can't create files on
> this OST:
> [Mon Mar 13 18:36:44 2017] LustreError: 42126:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Mon Mar 13 18:46:44 2017] LustreError: 35949:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Mon Mar 13 18:56:44 2017] LustreError: 26996:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Mon Mar 13 19:06:45 2017] LustreError: 26989:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 03:37:13 2017] LustreError: 26995:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 03:47:13 2017] LustreError: 44782:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 03:57:14 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 04:07:14 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 04:17:14 2017] LustreError: 26994:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 04:27:15 2017] LustreError: 27006:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 04:37:15 2017] LustreError: 27006:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 04:47:15 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 07:07:30 2017] LustreError: 35960:0:(ofd_dev.c:1781:ofd_create_hdl())
> wurfs-OST001b: unable to precreate: rc = -115
> Looking at /usr/include/asm-generic/errno.h, it seems that the error refers
> to:
> #define EINPROGRESS 115 /* Operation now in progress */
> #define ESTALE 116 /* Stale file handle */
> (on some other OSTs we do see error 116 as well)
>
> Any idea about what to do next?
>
> I will increase the verbosity and dump the logs from the ring buffer.
>
> Kind regards,
> --
>
> Andrea Del Monaco
> Internal Engineer
>
>
>
> Skype: delmonaco.andrea
> andrea.delmonaco at clustervision.com
>
> ClusterVision BV
> Gyroscoopweg 56
> 1042 AC Amsterdam
> The Netherlands
> Tel: +31 20 407 7550
> Fax: +31 84 759 8389
> www.clustervision.com
>
>
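
For the return codes mentioned in the quoted message, a small Python sketch can map a negative rc from the Lustre logs to a name and description. The two table entries are copied from the errno.h lines quoted above (Linux numbering). Anything else falls back to Python's errno module, whose values depend on the platform the script runs on.

```python
import errno
import os

# Values copied from the errno.h lines quoted above (Linux asm-generic numbering).
QUOTED_ERRNOS = {115: "EINPROGRESS", 116: "ESTALE"}


def describe_rc(rc):
    """Map a kernel-style negative return code to (name, description)."""
    code = abs(rc)
    name = QUOTED_ERRNOS.get(code) or errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


for rc in (-115, -116):
    name, text = describe_rc(rc)
    print(f"rc = {rc}: {name} ({text})")
```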


