[lustre-discuss] Lustre Sizing

Jongwoo Han jongwoohan at gmail.com
Thu Jan 3 13:55:50 PST 2019


There is a ZFS resource agent in the ocf:heartbeat provider, which lets you
import/export and fail over a zpool with PCS.

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/ZFS
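
With that agent, registering a pool as a cluster resource looks roughly like
this (a minimal sketch only; "lustre-data" is an example pool name, "oss02"
is a hypothetical peer node, and the monitor timeouts are placeholders):

  # create a pacemaker resource that imports/exports the pool
  pcs resource create zfs-lustre-data ocf:heartbeat:ZFS pool=lustre-data \
      op monitor interval=30s timeout=60s
  # move the pool to the peer node to exercise the failover path
  pcs resource move zfs-lustre-data oss02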

The ZFS-based NAS guide below is a good reference, except that it describes
building a NAS (exported over NFS) rather than Lustre.

https://github.com/ewwhite/zfs-ha/wiki

Once the zpool has been exported from the failed server and imported on the
live server, it is straightforward to mount a ZFS-backed Lustre OST with the
command "mount -t lustre <poolname>/<ostname> <mntpoint>". This mount step
can be integrated with the ZFS heartbeat agent above, as sketched below.
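
One way to wire this up in PCS is to pair the ZFS agent with the stock
ocf:heartbeat:Filesystem agent for the Lustre mount (again only a sketch;
"lustre-data/ost0" and "/mnt/ost0" are example names, not values taken from
this thread):

  # mount the ZFS-backed OST dataset as a Lustre target
  pcs resource create ost0-mount ocf:heartbeat:Filesystem \
      device=lustre-data/ost0 directory=/mnt/ost0 fstype=lustre \
      op monitor interval=30s timeout=120s
  # the mount must run on the node that holds the pool, and only after it
  pcs constraint colocation add ost0-mount with zfs-lustre-data INFINITY
  pcs constraint order zfs-lustre-data then ost0-mount

Doing the same by hand (useful for testing the failover path) is just
"zpool export lustre-data" on the old node, "zpool import lustre-data" on the
surviving node, and then the mount command above.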

On Fri, Jan 4, 2019 at 2:42 PM ANS <ans3456 at gmail.com> wrote:

> Thank you Jongwoo Han for the detailed explanation.
>
> Can anyone let me know how I can configure HA for ZFS pools on CentOS
> 7.4?
>
> It is different from the normal Lustre HA setup.
>
> Thanks,
> ANS
>
> On Wed, Jan 2, 2019 at 9:20 PM Jongwoo Han <jongwoohan at gmail.com> wrote:
>
>> 1. The "zpool list" command reports the zpool size as the sum of all
>> physical drives. When you create a raidz2 vdev (equivalent to RAID6) from
>> 10 * 6TB drives, the zpool size counts up to 60TB (about 54TiB), while the
>> usable space reported by "lfs df" is 8 * 6TB = 48TB (about 42TiB). Since
>> "lfs df -h" rounds the TiB figure down, you will see even less. Try
>> "df -H" on the OSS.
>>
>> 2. A ZFS-based MDT allocates inodes dynamically, unlike an ldiskfs (ext4)
>> based MDT. The total number of inodes will grow as you create more files;
>> this becomes clearly visible after several million files have been
>> created. Try recording the current inode count and comparing it after
>> creating a lot of metadata, as shown below.
>>
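
For reference, one way to watch that inode count grow is "lfs df -i" on a
client (a quick sketch; /home is the mount point from the output further
down this thread):

  lfs df -i /home     # note the Inodes/IUsed totals for home-MDT0000
  # ... create a few million files ...
  lfs df -i /home     # the MDT inode total will have grown
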
>> On Wed, Jan 2, 2019 at 4:06 PM ANS <ans3456 at gmail.com> wrote:
>>
>>> Thank you, Jeff. I found the cause: the variation comes from ZFS (the
>>> parity overhead) rather than from Lustre.
>>>
>>> But the metadata target should set aside about 60% of its space for inode
>>> creation, and that is not happening with ZFS as it does with ext4/ldiskfs.
>>>
>>> Thanks,
>>> ANS
>>>
>>> On Tue, Jan 1, 2019 at 1:05 PM ANS <ans3456 at gmail.com> wrote:
>>>
>>>> Thank you, Jeff. I have created the Lustre filesystem on ZFS freshly and
>>>> nothing else has access to it. When it is mounted on the client, it
>>>> shows a variation of around 40TB from the raw capacity.
>>>>
>>>> What could be the reason for this difference in size?
>>>>
>>>> Thanks,
>>>> ANS
>>>>
>>>> On Tue, Jan 1, 2019 at 12:21 PM Jeff Johnson <
>>>> jeff.johnson at aeoncomputing.com> wrote:
>>>>
>>>>> Those are very forward versions...especially on the ZFS side.
>>>>>
>>>>> You build OST volumes in a pool. If no other volumes are defined in a
>>>>> pool, then 100% of that pool is available to the OST volume, but the
>>>>> way ZFS works, the capacity doesn't really belong to the OST volume
>>>>> until blocks are allocated for writes. So you have a pool of a known
>>>>> size and you're the admin. As long as nobody else can create a ZFS
>>>>> volume in that pool, all of the capacity in that pool will eventually
>>>>> go to the OST as new writes occur. Keep in mind that the same pool can
>>>>> also contain snapshots (if created), so the pool size is a "potential
>>>>> capacity" that could be concurrently allocated to OST volume writes,
>>>>> snapshots and other ZFS volumes (if created).
>>>>>
>>>>> —Jeff
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Dec 31, 2018 at 22:20 ANS <ans3456 at gmail.com> wrote:
>>>>>
>>>>>> Thanks, Jeff. Currently I am using:
>>>>>>
>>>>>> modinfo zfs | grep version
>>>>>> version:        0.8.0-rc2
>>>>>> rhelversion:    7.4
>>>>>>
>>>>>> lfs --version
>>>>>> lfs 2.12.0
>>>>>>
>>>>>> And this is a fresh install. Is there any way to show that the
>>>>>> complete zpool LUN has been allocated to Lustre alone?
>>>>>>
>>>>>> Thanks,
>>>>>> ANS
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 1, 2019 at 11:44 AM Jeff Johnson <
>>>>>> jeff.johnson at aeoncomputing.com> wrote:
>>>>>>
>>>>>>> ANS,
>>>>>>>
>>>>>>> Lustre on top of ZFS has to estimate capacities, and the estimate is
>>>>>>> fairly far off when the OSTs are new and empty. As objects are
>>>>>>> written to the OSTs and capacity is consumed, the capacity reporting
>>>>>>> becomes more accurate. At the beginning it is so far off that it can
>>>>>>> look like an error.
>>>>>>>
>>>>>>> What version are you running? Some patches have been added to make
>>>>>>> this calculation more accurate.
>>>>>>>
>>>>>>> —Jeff
>>>>>>>
>>>>>>> On Mon, Dec 31, 2018 at 22:08 ANS <ans3456 at gmail.com> wrote:
>>>>>>>
>>>>>>>> Dear Team,
>>>>>>>>
>>>>>>>> I am trying to configure Lustre with ZFS as the backend file
>>>>>>>> system, with two servers in HA. After compiling everything and
>>>>>>>> creating the ZFS pools:
>>>>>>>>
>>>>>>>> zpool list
>>>>>>>> NAME          SIZE   ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
>>>>>>>> lustre-data   54.5T  25.8M  54.5T        -     16.0E    0%   0%  1.00x  ONLINE  -
>>>>>>>> lustre-data1  54.5T  25.1M  54.5T        -     16.0E    0%   0%  1.00x  ONLINE  -
>>>>>>>> lustre-data2  54.5T  25.8M  54.5T        -     16.0E    0%   0%  1.00x  ONLINE  -
>>>>>>>> lustre-data3  54.5T  25.8M  54.5T        -     16.0E    0%   0%  1.00x  ONLINE  -
>>>>>>>> lustre-meta    832G  3.50M   832G        -     16.0E    0%   0%  1.00x  ONLINE  -
>>>>>>>>
>>>>>>>> and when it is mounted on the client:
>>>>>>>>
>>>>>>>> lfs df -h
>>>>>>>> UUID                     bytes    Used  Available  Use%  Mounted on
>>>>>>>> home-MDT0000_UUID       799.7G    3.2M     799.7G    0%  /home[MDT:0]
>>>>>>>> home-OST0000_UUID        39.9T   18.0M      39.9T    0%  /home[OST:0]
>>>>>>>> home-OST0001_UUID        39.9T   18.0M      39.9T    0%  /home[OST:1]
>>>>>>>> home-OST0002_UUID        39.9T   18.0M      39.9T    0%  /home[OST:2]
>>>>>>>> home-OST0003_UUID        39.9T   18.0M      39.9T    0%  /home[OST:3]
>>>>>>>>
>>>>>>>> filesystem_summary:     159.6T   72.0M     159.6T    0%  /home
>>>>>>>>
>>>>>>>> So out of a total of 54.5T x 4 = 218TB I am getting only 159TB
>>>>>>>> usable. Can anyone explain where the difference goes?
>>>>>>>>
>>>>>>>> Also, from a performance perspective, which ZFS and Lustre
>>>>>>>> parameters should be tuned?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Thanks,
>>>>>>>> ANS.
>>>>>>>>
>>>>>>> --
>>>>>>> ------------------------------
>>>>>>> Jeff Johnson
>>>>>>> Co-Founder
>>>>>>> Aeon Computing
>>>>>>>
>>>>>>> jeff.johnson at aeoncomputing.com
>>>>>>> www.aeoncomputing.com
>>>>>>> t: 858-412-3810 x1001   f: 858-412-3845
>>>>>>> m: 619-204-9061
>>>>>>>
>>>>>>> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
>>>>>>>
>>>>>>> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> ANS.
>>>>>>
>>>>> --
>>>>> ------------------------------
>>>>> Jeff Johnson
>>>>> Co-Founder
>>>>> Aeon Computing
>>>>>
>>>>> jeff.johnson at aeoncomputing.com
>>>>> www.aeoncomputing.com
>>>>> t: 858-412-3810 x1001   f: 858-412-3845
>>>>> m: 619-204-9061
>>>>>
>>>>> 4170 Morena Boulevard, Suite C - San Diego, CA 92117
>>>>>
>>>>> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> ANS.
>>>>
>>>
>>>
>>> --
>>> Thanks,
>>> ANS.
>>>
>>
>>
>> --
>> Jongwoo Han
>> +82-505-227-6108
>>
>
>
> --
> Thanks,
> ANS.
>


-- 
Jongwoo Han
+82-505-227-6108

