[lustre-discuss] ZFS file error of MDT

Ian Yi-Feng Chang ian.yfchang at gmail.com
Sun Sep 25 19:34:30 PDT 2022


Hi Laura,
Thank you for the feedback.
I'm wondering if I could remove the corrupted file from the MDT and clear the
file error. Without that error, the Lustre storage might start again. I
understand some files would definitely be lost, but at least we would have a
chance to recover the rest.
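
Roughly what I have in mind, in case it clarifies the question (untested, and
assuming the damaged object really is just an OI file that Lustre's OI scrub
can rebuild):

zfs mount LustreMDT/mdt0-work
rm -rf "/LustreMDT/mdt0-work/oi.3/0x200000003:0x2:0x0"   # drop the corrupted OI object
zpool clear LustreMDT    # reset the pool's error counters
zpool scrub LustreMDT    # re-check the pool
lctl lfsck_start -M work-MDT0000 -t scrub   # once the MDT mounts again, let OI scrub rebuild the index

Please correct me if removing anything under oi.3/ by hand is a bad idea.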

Best,
Ian


On Sat, Sep 24, 2022 at 5:35 AM Laura Hild <lsh at jlab.org> wrote:

> Hi Ian-
>
> It looks to me like that hardware RAID array is giving ZFS data back that
> is not what ZFS thinks it wrote.  Since from ZFS’ perspective there is no
> redundancy in the pool, only what the RAID array returns, ZFS cannot
> reconstruct the file to its satisfaction, and rather than return data that
> ZFS thinks is corrupt, it is refusing to allow that file to be accessed at
> all.  Lustre, which relies on the lower layers for redundancy, expects the
> file to be accessible, and it’s not.
>
> -Laura
>
>
> ________________________________________
> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of
> Ian Yi-Feng Chang via lustre-discuss <lustre-discuss at lists.lustre.org>
> Sent: Wednesday, 21 September 2022 10:53
> To: Robert Anderson; lustre-discuss at lists.lustre.org
> Subject: [EXTERNAL] Re: [lustre-discuss] ZFS file error of MDT
>
> Thanks, Robert, for the feedback. Actually, I do not know Lustre at all.
> I am also trying to contact the engineer who built the Lustre system for
> more information about the drives.
> To my knowledge, the LustreMDT pool is a group of 4 SSDs (named
> /dev/mapper/SSD) with hardware RAID5.
>
> I can manually mount LustreMDT/mdt0-work with the following steps:
>
> pcs cluster standby --all (Stop MDS and OSS)
> zpool import LustreMDT
> zfs set canmount=on LustreMDT/mdt0-work
> zfs mount LustreMDT/mdt0-work
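>
> When I am done, I hand the dataset back to the cluster by reversing those
> steps (assuming this is the right sequence for returning control to
> Pacemaker):
>
> zfs unmount LustreMDT/mdt0-work
> zfs set canmount=off LustreMDT/mdt0-work
> zpool export LustreMDT
> pcs cluster unstandby --all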
>
> With the dataset mounted, ls on /LustreMDT/mdt0-work/oi.3/0x200000003:0x2:0x0
> returned an I/O error, but other files look fine.
> [root@mds1 mdt0-work]# ls -ahlt
> "/LustreMDT/mdt0-work/oi.3/0x200000003:0x2:0x0"
> ls: reading directory /LustreMDT/mdt0-work/oi.3/0x200000003:0x2:0x0:
> Input/output error
> total 23M
> drwxr-xr-x 2 root root 2 Jan  1  1970 .
> drwxr-xr-x 0 root root 0 Jan  1  1970 ..
>
> Is this the drive-failure situation you are referring to?
>
> Best,
> Ian
>
>
> On Wed, Sep 21, 2022 at 9:32 PM Robert Anderson <roberta at usnh.edu> wrote:
> I could be reading your zpool status output wrong, but it looks like you
> had 2 drives in that pool. Not mirrored, so no fault tolerance. Any drive
> failure would lose half of the pool data.
>
> Unless you can get that drive working, you are missing half of your data and
> have no resilience to errors; there is nothing to recover from.
>
> However you proceed, you should make sure you have a mirrored ZFS pool, or
> more drives and raidz (I like raidz2).
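>
> For example (the pool and device names below are placeholders, not your
> actual disks):
>
> # two mirrored pairs
> zpool create mdtpool mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd
> # or a six-disk raidz2, which tolerates any two drive failures
> zpool create mdtpool raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf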
>
>
> On September 20, 2022 11:57:09 PM Ian Yi-Feng Chang via lustre-discuss <
> lustre-discuss at lists.lustre.org> wrote:
>
>
> Dear All,
> I think this problem is more related to ZFS, but I would like to ask for
> help from experts in all fields.
> Our MDT stopped working properly after the IB switch was accidentally
> rebooted (power issue).
> Everything looks good except that the MDT cannot be started.
> Our MDT's ZFS pool has no backup or snapshot.
> Could this problem be fixed, and if so, how?
>
> Thanks for your help in advance.
>
> Best,
> Ian
>
> Lustre: Build Version: 2.10.4
> OS: CentOS Linux release 7.5.1804 (Core)
> uname -r: 3.10.0-862.el7.x86_64
>
>
> [root@mds1 etc]# pcs status
> Cluster name: mdsgroup01
> Stack: corosync
> Current DC: mds1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with
> quorum
> Last updated: Wed Sep 21 11:46:25 2022
> Last change: Wed Sep 21 11:46:13 2022 by root via cibadmin on mds1
>
> 2 nodes configured
> 9 resources configured
>
> Online: [ mds1 mds2 ]
>
> Full list of resources:
>
>  Resource Group: group-MDS
>      zfs-LustreMDT      (ocf::heartbeat:ZFS):   Started mds1
>      MGT        (ocf::lustre:Lustre):   Started mds1
>      MDT        (ocf::lustre:Lustre):   Stopped
>  ipmi-fencingMDS1       (stonith:fence_ipmilan):        Started mds2
>  ipmi-fencingMDS2       (stonith:fence_ipmilan):        Started mds2
>  Clone Set: healthLUSTRE-clone [healthLUSTRE]
>      Started: [ mds1 mds2 ]
>  Clone Set: healthLNET-clone [healthLNET]
>      Started: [ mds1 mds2 ]
>
> Failed Actions:
> * MDT_start_0 on mds1 'unknown error' (1): call=44, status=complete,
> exitreason='',
>     last-rc-change='Tue Sep 20 15:01:51 2022', queued=0ms, exec=317ms
> * MDT_start_0 on mds2 'unknown error' (1): call=48, status=complete,
> exitreason='',
>     last-rc-change='Tue Sep 20 14:38:18 2022', queued=0ms, exec=25168ms
>
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
>
>
> After running zpool scrub on the LustreMDT pool, zpool status -v reported:
>
>   pool: LustreMDT
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://zfsonlinux.org/msg/ZFS-8000-8A
>   scan: scrub repaired 0B in 0h35m with 1 errors on Wed Sep 21 09:38:24
> 2022
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         LustreMDT   ONLINE       0     0     2
>           SSD       ONLINE       0     0     8
>
> errors: Permanent errors have been detected in the following files:
>
>         LustreMDT/mdt0-work:/oi.3/0x200000003:0x2:0x0
>
>
>
> # dmesg -T
> [Tue Sep 20 15:01:43 2022] Lustre: Lustre: Build Version: 2.10.4
> [Tue Sep 20 15:01:43 2022] LNet: Using FMR for registration
> [Tue Sep 20 15:01:43 2022] LNet: Added LNI 172.29.32.21@o2ib [8/256/0/180]
> [Tue Sep 20 15:01:50 2022] Lustre: MGS: Connection restored to
> b5823059-e620-64ac-79f6-e5282f2fa442 (at 0@lo)
> [Tue Sep 20 15:01:50 2022] LustreError: 3839:0:(llog.c:1296:llog_backup())
> MGC172.29.32.21@o2ib: failed to open log work-MDT0000: rc = -5
> [Tue Sep 20 15:01:50 2022] LustreError:
> 3839:0:(mgc_request.c:1897:mgc_llog_local_copy()) MGC172.29.32.21@o2ib:
> failed to copy remote log work-MDT0000: rc = -5
> [Tue Sep 20 15:01:50 2022] LustreError: 13a-8: Failed to get MGS log
> work-MDT0000 and no local copy.
> [Tue Sep 20 15:01:50 2022] LustreError: 15c-8: MGC172.29.32.21@o2ib: The
> configuration from log 'work-MDT0000' failed (-2). This may be the result
> of communication errors between this node and the MGS, a bad configuration,
> or other errors. See the syslog for more information.
> [Tue Sep 20 15:01:50 2022] LustreError:
> 3839:0:(obd_mount_server.c:1386:server_start_targets()) failed to start
> server work-MDT0000: -2
> [Tue Sep 20 15:01:50 2022] LustreError:
> 3839:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start
> targets: -2
> [Tue Sep 20 15:01:50 2022] LustreError:
> 3839:0:(obd_mount_server.c:1589:server_put_super()) no obd work-MDT0000
> [Tue Sep 20 15:01:50 2022] Lustre: server umount work-MDT0000 complete
> [Tue Sep 20 15:01:50 2022] LustreError:
> 3839:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount  (-2)
> [Tue Sep 20 15:01:56 2022] Lustre:
> 4112:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has
> timed out for slow reply: [sent 1663657311/real 1663657311]
> req@ffff8d6f0e728000 x1744471122247856/t0(0) o251->MGC172.29.32.21@o2ib
> @0@lo:26/25 lens 224/224 e 0 to 1 dl 1663657317 ref 2 fl
> Rpc:XN/0/ffffffff rc 0/-1
> [Tue Sep 20 15:01:56 2022] Lustre: server umount MGS complete
> [Tue Sep 20 15:02:29 2022] Lustre: MGS: Connection restored to
> b5823059-e620-64ac-79f6-e5282f2fa442 (at 0@lo)
> [Tue Sep 20 15:02:54 2022] Lustre: MGS: Connection restored to
> 28ec81ea-0d51-d721-7be2-4f557da2546d (at 172.29.32.1@o2ib)
>
>
>