[lustre-discuss] error destroying object rc 115

Thu Mar 2 20:52:03 PST 2017

Hi,
We are running into an issue with one OST logging errors when destroying objects, and I am not familiar with the process for addressing it. I have already migrated and redistributed the objects on this OST to other OSTs so we don't lose any user data. Below is the messages file output from the OSS server presenting the OST002d target, showing the errors.

/var/log/messages from the OSS server:
Mar  2 22:07:29 oss4 kernel: LustreError: 12887:0:(ofd_dev.c:1872:ofd_destroy_hdl()) sgilfs-OST002d: error destroying object [0x1002d0000:0x41799d:0x0]: -115 
Mar  2 22:07:29 oss4 kernel: LustreError: 12887:0:(ofd_dev.c:1872:ofd_destroy_hdl()) Skipped 117 previous similar messages 
Mar  2 22:08:11 oss4 kernel: Lustre: sgilfs-OST002d-osd: FID [0x1002d0000:0x3fa284:0x0] != self_fid [0x1002d0000:0x2d1e07:0x0] 
Mar  2 22:08:11 oss4 kernel: Lustre: sgilfs-OST002d-osd: FID [0x1002d0000:0x41e662:0x0] != self_fid [0x1002d0000:0x2d1915:0x0] 
Mar  2 22:08:11 oss4 kernel: Lustre: sgilfs-OST002d-o: trigger OI scrub by RPC for [0x1002d0000:0x41e662:0x0], rc = 0 [1] 
Mar  2 22:08:53 oss4 kernel: Lustre: sgilfs-OST002d-osd: FID [0x1002d0000:0x3fa284:0x0] != self_fid [0x1002d0000:0x2d1e07:0x0] 
Mar  2 22:08:53 oss4 kernel: Lustre: sgilfs-OST002d-osd: FID [0x1002d0000:0x41e662:0x0] != self_fid [0x1002d0000:0x2d1915:0x0] 
Mar  2 22:08:53 oss4 kernel: Lustre: sgilfs-OST002d-o: trigger OI scrub by RPC for [0x1002d0000:0x41e662:0x0], rc = 0 [1] ...

{these errors keep cycling in the messages file}
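
To see which FIDs are involved without reading the cycling messages by eye, the log can be tallied with a short script. This is only a rough Python sketch: it assumes the two message formats shown above, and the function name and patterns are made up for illustration rather than taken from any Lustre tool. (On Linux, errno 115 is EINPROGRESS.)

```python
import re
from collections import Counter

# Patterns assume the exact message formats shown in the log excerpt above.
FID = r"\[(0x[0-9a-f]+):(0x[0-9a-f]+):(0x[0-9a-f]+)\]"
DESTROY_RE = re.compile(r"error destroying object " + FID + r": (-?\d+)")
MISMATCH_RE = re.compile(r"FID " + FID + r" != self_fid " + FID)

def summarize(lines):
    """Count destroy errors per (FID, rc) and FID/self_fid mismatch pairs."""
    destroy = Counter()
    mismatch = Counter()
    for line in lines:
        m = DESTROY_RE.search(line)
        if m:
            seq, oid, ver, rc = m.groups()
            destroy[((seq, oid, ver), int(rc))] += 1
            continue
        m = MISMATCH_RE.search(line)
        if m:
            g = m.groups()
            # Key is (FID requested, self_fid stored on the object).
            mismatch[(g[0:3], g[3:6])] += 1
    return destroy, mismatch

# Sample lines copied from the excerpt above; in practice read /var/log/messages.
sample = [
    "Mar  2 22:07:29 oss4 kernel: LustreError: 12887:0:(ofd_dev.c:1872:ofd_destroy_hdl()) sgilfs-OST002d: error destroying object [0x1002d0000:0x41799d:0x0]: -115",
    "Mar  2 22:08:11 oss4 kernel: Lustre: sgilfs-OST002d-osd: FID [0x1002d0000:0x3fa284:0x0] != self_fid [0x1002d0000:0x2d1e07:0x0]",
    "Mar  2 22:08:53 oss4 kernel: Lustre: sgilfs-OST002d-osd: FID [0x1002d0000:0x3fa284:0x0] != self_fid [0x1002d0000:0x2d1e07:0x0]",
]
destroy, mismatch = summarize(sample)
for key, count in mismatch.items():
    print(key, count)
```

Note that "Skipped N previous similar messages" lines are not counted, so the destroy totals are a lower bound.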

From the MDS server I initiated an LFSCK with the "-o" option so the MDT database would sync with the OST objects, but oi_scrub shows no errors and the OST oi_scrub status cycles between completed and scanning.

I read the available documentation, but it is not clear to me how to determine which object oi_scrub is having a problem with. Can someone help and provide steps to fix this issue?

UUID                   1K-blocks        Used   Available Use% Mounted on
sgilfs-OST002d_UUID  15283534752    20228068 14470350844   0% /gbc-lustre[OST:45]

If I create a large 50GB sequential file I can see the used capacity increase, but if I remove the file the used capacity does not decrease by 50GB.
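
The check described here amounts to comparing the "Used" column before and after the write and the remove. A minimal Python sketch of that arithmetic, assuming the column is in 1K-blocks as the header says and that "50GB" means 50 GiB; the helper name and the second reading are hypothetical:

```python
KIB_PER_GIB = 1024 * 1024  # the df header reports 1K-blocks

def used_delta_gib(used_before_kib, used_after_kib):
    """Change in used space, in GiB, between two 'Used' readings."""
    return (used_after_kib - used_before_kib) / KIB_PER_GIB

# 'before' is the Used figure from the df line above. 'after_write' is a
# hypothetical reading: what Used would be if a 50 GiB file landed entirely
# on this OST and was fully accounted for.
before = 20228068
after_write = before + 50 * KIB_PER_GIB

print(used_delta_gib(before, after_write))
```

If the objects were really destroyed, the same calculation between the post-write and post-remove readings should come out close to -50.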

On the MDS server oi_scrub shows no errors. 
sgilfs-MDT0000]# cat oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 21180 seconds
time_since_latest_start: 22628 seconds
time_since_last_checkpoint: 21180 seconds
latest_start_position: 137533015
last_checkpoint_position: 288358401
first_failure_position: N/A
checked: 77862246
updated: 0
failed: 0
prior_updated: 0
noscrub: 40241
igif: 1
success_count: 4
run_time: 2066 seconds
average_speed: 37687 objects/sec
real-time_speed: N/A
current_position: N/A
lf_scanned: 0
lf_reparied: 0
lf_failed: 0

oss4 sgilfs-OST002d]# cat oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 344 seconds
time_since_latest_start: 346 seconds
time_since_last_checkpoint: 344 seconds
latest_start_position: 12
last_checkpoint_position: 29868033
first_failure_position: N/A
checked: 809524
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 1
success_count: 2846929
run_time: 2 seconds
average_speed: 404762 objects/sec
real-time_speed: N/A
current_position: N/A
lf_scanned: 0
lf_reparied: 0
lf_failed: 0
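
The two outputs above are easier to compare once parsed into fields. A small Python sketch, assuming the plain "key: value" layout shown; the helper names are invented for illustration and only a few of the fields are repeated here. The contrast it brings out is the OST's success_count of 2846929 against 4 on the MDT, which would be consistent with the scrub being retriggered over and over, though that reading is an assumption on my part.

```python
def parse_oi_scrub(text):
    """Turn 'key: value' lines into a dict; values stay as strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def leading_int(value):
    """Return the leading integer of a value such as '2066 seconds'."""
    return int(value.split()[0])

# Subsets of the MDT and OST outputs pasted above.
mdt = parse_oi_scrub("""\
status: completed
checked: 77862246
updated: 0
failed: 0
success_count: 4
run_time: 2066 seconds
""")
ost = parse_oi_scrub("""\
status: completed
checked: 809524
updated: 0
failed: 0
success_count: 2846929
run_time: 2 seconds
""")

for name, f in (("MDT0000", mdt), ("OST002d", ost)):
    print(name, f["status"],
          "success_count =", leading_int(f["success_count"]),
          "run_time =", leading_int(f["run_time"]), "s")
```

leading_int is only meant for the numeric fields; values such as "N/A" would need separate handling.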

e2fsck returns no errors on the target device.

We're currently using the following releases. 
Lustre release 2.7.16.11
IEEL release 3.0.1.4
OS: RHEL 7.2
e2fsprogs release 1.42.13.wc5

Any help would be greatly appreciated!

Thanks,
Scott Shaw