[lustre-discuss] ZFS tuning for MDT/MGS

Hans Henrik Happe happe at nbi.dk
Tue Apr 2 06:41:44 PDT 2019


AFAIK, that is what sync=disabled does. It pretends syncs are committed.
It will flush after 5 seconds but there might be other output that will
stall it longer.
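
To see which behaviour a given MDT dataset is running with, something
like the following could be used. It is only a sketch: describe_sync is
a made-up helper name, the dataset name mdt0 is taken from the quoted
example further down, and the one-line descriptions paraphrase the zfs
man page rather than anything Lustre-specific.

```shell
#!/bin/sh
# describe_sync VALUE: hypothetical helper that explains a ZFS "sync" property value.
describe_sync() {
    case "$1" in
        standard) echo "standard: sync requests are made stable before they are acknowledged" ;;
        always)   echo "always: every write is treated as a sync request" ;;
        disabled) echo "disabled: sync requests are acknowledged at once; data is stable only after the next txg commit" ;;
        *)        echo "unknown sync value: $1" >&2; return 1 ;;
    esac
}

# On a host with ZFS installed, query the live property.
# The dataset name "mdt0" is an assumption; substitute your own.
if command -v zfs >/dev/null 2>&1; then
    if value=$(zfs get -H -o value sync mdt0 2>/dev/null); then
        describe_sync "$value" || echo "could not interpret: $value"
    fi
fi
```

On a host without ZFS, or without a dataset of that name, the last
block prints nothing; the function can still be called by hand, e.g.
describe_sync disabled.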

On 02/04/2019 14.28, Degremont, Aurelien wrote:
> This is very unlikely.
> The only way that could happen is if the hardware acknowledges I/O to Lustre that it did not really commit to disk (a write-back cache, for example), or if there is a Lustre bug.
> 
> On 02/04/2019 14:11, "lustre-discuss on behalf of Hans Henrik Happe" <lustre-discuss-bounces at lists.lustre.org on behalf of happe at nbi.dk> wrote:
> 
>     Isn't there a possibility that the MDS falsely tells the client that a
>     transaction has been committed to disk? After that, the client might
>     not be able to replay if the MDS dies.
>     
>     Cheers,
>     Hans Henrik
>     
>     On 19/03/2019 21.32, Andreas Dilger wrote:
>     > You would need to lose the MDS within a few seconds after the client to
>     > lose filesystem operations, since the clients will replay their
>     > operations if the MDS crashes, and ZFS commits the current transaction
>     > every 1s, so this setting only really affects "sync" from the client. 
>     > 
>     > Cheers, Andreas
>     > 
>     > On Mar 19, 2019, at 12:43, George Melikov <mail at gmelikov.ru> wrote:
>     > 
>     >> Can you explain the reason for 'zfs set sync=disabled mdt0'? Are you
>     >> ready to lose the last transaction on that MDT during a power
>     >> failure? What did I miss?
>     >>
>     >> 14.03.2019, 01:00, "Riccardo Veraldi" <Riccardo.Veraldi at cnaf.infn.it>:
>     >>> these are the zfs settings I use on my MDSes
>     >>>
>     >>>  zfs set mountpoint=none mdt0
>     >>>  zfs set sync=disabled mdt0
>     >>>
>     >>>  zfs set atime=off mdt0
>     >>>  zfs set redundant_metadata=most mdt0
>     >>>  zfs set xattr=sa mdt0
>     >>>
>     >>> If your MDT partition is on a 4KB sector disk then you can use
>     >>> ashift=12 when you create the filesystem, but zfs is pretty smart
>     >>> and in my case it recognized the sector size and used ashift=12
>     >>> automatically.
>     >>>
>     >>> Also, here are the zfs kernel module parameters I use to get better
>     >>> performance. I use them on both MDSes and OSSes.
>     >>>
>     >>> options zfs zfs_prefetch_disable=1
>     >>> options zfs zfs_txg_history=120
>     >>> options zfs metaslab_debug_unload=1
>     >>> #
>     >>> options zfs zfs_vdev_scheduler=deadline
>     >>> options zfs zfs_vdev_async_write_active_min_dirty_percent=20
>     >>> #
>     >>> options zfs zfs_vdev_scrub_min_active=48
>     >>> options zfs zfs_vdev_scrub_max_active=128
>     >>> #options zfs zfs_vdev_sync_write_min_active=64
>     >>> #options zfs zfs_vdev_sync_write_max_active=128
>     >>> #
>     >>> options zfs zfs_vdev_sync_write_min_active=8
>     >>> options zfs zfs_vdev_sync_write_max_active=32
>     >>> options zfs zfs_vdev_sync_read_min_active=8
>     >>> options zfs zfs_vdev_sync_read_max_active=32
>     >>> options zfs zfs_vdev_async_read_min_active=8
>     >>> options zfs zfs_vdev_async_read_max_active=32
>     >>> options zfs zfs_top_maxinflight=320
>     >>> options zfs zfs_txg_timeout=30
>     >>> options zfs zfs_dirty_data_max_percent=40
>     >>> options zfs zfs_vdev_async_write_min_active=8
>     >>> options zfs zfs_vdev_async_write_max_active=32
>     >>>
>     >>> Some people may disagree with me, but after years of trying
>     >>> different options I reached this stable configuration.
>     >>>
>     >>> Then there are a bunch of other important Lustre-level optimizations
>     >>> that you can do if you are looking for a performance increase.
>     >>>
>     >>> Cheers
>     >>>
>     >>> Rick
>     >>>
>     >>> On 3/13/19 11:44 AM, Kurt Strosahl wrote:
>     >>>>
>     >>>> Good Afternoon,
>     >>>>
>     >>>>
>     >>>>     I'm reviewing the zfs parameters for a new metadata system and I
>     >>>> was looking to see if anyone had examples (good or bad) of zfs
>     >>>> parameters?  I'm assuming that the MDT won't benefit from a
>     >>>> recordsize of 1MB, and I've already set the ashift to 12.  I'm using
>     >>>> an MDT/MGS made up of a stripe across mirrored ssds.
>     >>>>
>     >>>>
>     >>>> w/r,
>     >>>>
>     >>>> Kurt
>     >>>>
>     >>>>
>     >>>> _______________________________________________
>     >>>> lustre-discuss mailing list
>     >>>> lustre-discuss at lists.lustre.org
>     >>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>     >>>
>     >>>
>     >>
>     >>
>     >> ____________________________________
>     >> Sincerely,
>     >> George Melikov
>     > 
> 
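
For anyone adopting module options like the ones quoted above, it may
be worth checking that they actually took effect on the running module.
The following is a minimal sketch, not a tested tool: check_zfs_params
is a hypothetical helper name, it assumes the options live in a
modprobe.d-style file (often /etc/modprobe.d/zfs.conf, but that path is
an assumption) and that ZFS on Linux exposes live values under
/sys/module/zfs/parameters.

```shell
#!/bin/sh
# check_zfs_params CONF [PARAM_DIR]
# Compare "options zfs name=value" lines in CONF with the live values
# found under PARAM_DIR (default: /sys/module/zfs/parameters).
check_zfs_params() {
    conf=$1
    param_dir=${2:-/sys/module/zfs/parameters}
    # Keep only uncommented "options zfs ..." lines; emit "name value".
    sed -n 's/^options[[:space:]]\{1,\}zfs[[:space:]]\{1,\}\([^=[:space:]]\{1,\}\)=\(.*\)$/\1 \2/p' "$conf" |
    while read -r name want; do
        if [ -r "$param_dir/$name" ]; then
            have=$(cat "$param_dir/$name")
            if [ "$have" = "$want" ]; then
                echo "ok       $name=$have"
            else
                echo "differs  $name: running=$have configured=$want"
            fi
        else
            # The parameter file does not exist for the loaded module.
            echo "missing  $name (not exposed by this ZFS version)"
        fi
    done
}
```

Usage would be along the lines of
check_zfs_params /etc/modprobe.d/zfs.conf; commented-out lines such as
"#options zfs ..." are skipped, and parameters that the installed ZFS
release does not expose are reported as missing rather than treated as
an error.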

