[lustre-discuss] ZFS tuning for MDT/MGS

Degremont, Aurelien degremoa at amazon.com
Tue Apr 2 05:28:07 PDT 2019


This is very unlikely.
The only way that could happen is if the hardware acknowledges I/O to
Lustre that it has not actually committed to disk (e.g. a volatile
writeback cache), or if there is a Lustre bug.
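
For example, assuming SATA drives on Linux, hdparm can show whether a
drive's volatile write cache is enabled and can disable it (the device
name here is only illustrative):

    hdparm -W /dev/sdX     # show the current write-cache setting
    hdparm -W 0 /dev/sdX   # turn the volatile write cache off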

On 02/04/2019 14:11, "lustre-discuss on behalf of Hans Henrik Happe" <lustre-discuss-bounces at lists.lustre.org on behalf of happe at nbi.dk> wrote:

    Isn't there a possibility that the MDS falsely tells the client that a
    transaction has been committed to disk? If the MDS then dies, the
    client might not be able to replay it.
    
    Cheers,
    Hans Henrik
    
    On 19/03/2019 21.32, Andreas Dilger wrote:
    > You would need to lose the MDS within a few seconds after the client
    > to lose filesystem operations, since the clients will replay their
    > operations if the MDS crashes, and ZFS commits the current transaction
    > every 1s. So this setting only really affects "sync" from the client.
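    > 
    > The upper bound on that commit interval is the zfs_txg_timeout module
    > parameter; assuming ZFS on Linux, it can be inspected and changed at
    > runtime:
    > 
    >     cat /sys/module/zfs/parameters/zfs_txg_timeout
    >     echo 1 > /sys/module/zfs/parameters/zfs_txg_timeout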
    > 
    > Cheers, Andreas
    > 
    > On Mar 19, 2019, at 12:43, George Melikov <mail at gmelikov.ru
    > <mailto:mail at gmelikov.ru>> wrote:
    > 
    >> Can you explain the reason for 'zfs set sync=disabled mdt0'? Are you
    >> ready to lose the last transaction on that MDT during a power
    >> failure? What did I miss?
    >>
    >> 14.03.2019, 01:00, "Riccardo Veraldi" <Riccardo.Veraldi at cnaf.infn.it
    >> <mailto:Riccardo.Veraldi at cnaf.infn.it>>:
    >>> these are the zfs settings I use on my MDSes
    >>>
    >>>  zfs set mountpoint=none mdt0
    >>>  zfs set sync=disabled mdt0
    >>>
    >>>  zfs set atime=off mdt0
    >>>  zfs set redundant_metadata=most mdt0
    >>>  zfs set xattr=sa mdt0
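    >>>
    >>> you can verify the resulting properties afterwards with:
    >>>
    >>>  zfs get sync,atime,xattr,redundant_metadata mdt0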
    >>>
    >>> if your MDT partition is on a disk with 4KB sectors you can set
    >>> ashift=12 when you create the pool, but zfs is pretty smart: in my
    >>> case it detected the sector size and used ashift=12 automatically.
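    >>>
    >>> for example, to force 4KB alignment at pool creation and to confirm
    >>> it afterwards (pool and device names are only illustrative):
    >>>
    >>>  zpool create -o ashift=12 mdt0 mirror /dev/sda /dev/sdb
    >>>  zdb -C mdt0 | grep ashift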
    >>>
    >>> also, here are the zfs kernel module parameters I use to get better
    >>> performance; I use them on both the MDSes and OSSes (how to apply
    >>> them is shown after the list):
    >>>
    >>> # disable speculative prefetch; keep 120 txgs of history for stats
    >>> options zfs zfs_prefetch_disable=1
    >>> options zfs zfs_txg_history=120
    >>> # keep metaslab space maps loaded in memory
    >>> options zfs metaslab_debug_unload=1
    >>> #
    >>> # use the deadline block scheduler; start ramping up async writes
    >>> # once dirty data reaches 20%
    >>> options zfs zfs_vdev_scheduler=deadline
    >>> options zfs zfs_vdev_async_write_active_min_dirty_percent=20
    >>> #
    >>> # per-vdev queue depths for scrub, sync and async I/O
    >>> options zfs zfs_vdev_scrub_min_active=48
    >>> options zfs zfs_vdev_scrub_max_active=128
    >>> #options zfs zfs_vdev_sync_write_min_active=64
    >>> #options zfs zfs_vdev_sync_write_max_active=128
    >>> #
    >>> options zfs zfs_vdev_sync_write_min_active=8
    >>> options zfs zfs_vdev_sync_write_max_active=32
    >>> options zfs zfs_vdev_sync_read_min_active=8
    >>> options zfs zfs_vdev_sync_read_max_active=32
    >>> options zfs zfs_vdev_async_read_min_active=8
    >>> options zfs zfs_vdev_async_read_max_active=32
    >>> # cap scrub I/Os in flight per top-level vdev; commit txgs at least
    >>> # every 30s; allow dirty data up to 40% of RAM
    >>> options zfs zfs_top_maxinflight=320
    >>> options zfs zfs_txg_timeout=30
    >>> options zfs zfs_dirty_data_max_percent=40
    >>> options zfs zfs_vdev_async_write_min_active=8
    >>> options zfs zfs_vdev_async_write_max_active=32
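    >>>
    >>> these lines go in a modprobe configuration file (by convention
    >>> something like /etc/modprobe.d/zfs.conf) and take effect when the
    >>> zfs module loads; most of them can also be changed at runtime via
    >>> sysfs, e.g.:
    >>>
    >>>  echo 30 > /sys/module/zfs/parameters/zfs_txg_timeout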
    >>>
    >>> some people may disagree with me, but after years of trying
    >>> different options this is the stable configuration I settled on.
    >>>
    >>> then there are a bunch of other important Lustre-level optimizations
    >>> you can do if you are looking for a performance increase.
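    >>>
    >>> for example, on the client side there are RPC tunables like these
    >>> (values purely illustrative, tune for your own workload):
    >>>
    >>>  lctl set_param osc.*.max_rpcs_in_flight=64
    >>>  lctl set_param osc.*.max_dirty_mb=1024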
    >>>
    >>> Cheers
    >>>
    >>> Rick
    >>>
    >>> On 3/13/19 11:44 AM, Kurt Strosahl wrote:
    >>>>
    >>>> Good Afternoon,
    >>>>
    >>>>
    >>>>     I'm reviewing the ZFS parameters for a new metadata system and
    >>>> I was looking to see if anyone had examples (good or bad) of ZFS
    >>>> parameters. I'm assuming that the MDT won't benefit from a
    >>>> recordsize of 1MB, and I've already set the ashift to 12. I'm using
    >>>> an MDT/MGS made up of a stripe across mirrored SSDs.
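    >>>>
    >>>> For reference, the current values can be checked with something
    >>>> like the following (the pool name is illustrative):
    >>>>
    >>>>  zpool get ashift mdt0
    >>>>  zfs get recordsize mdt0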
    >>>>
    >>>>
    >>>> w/r,
    >>>>
    >>>> Kurt
    >>>>
    >>>>
    >>>
    >>
    >>
    >> ____________________________________
    >> Sincerely,
    >> George Melikov
    >>
    > 
    > 
    _______________________________________________
    lustre-discuss mailing list
    lustre-discuss at lists.lustre.org
    http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
    


