[lustre-discuss] free space on ldiskfs vs. zfs
Alexander I Kulyavtsev
aik at fnal.gov
Mon Aug 24 20:54:26 PDT 2015
Hmm,
I was assuming the question was about total space, as I struggled for some time to understand why I had 99 TB of total space per OSS after installing zfs lustre, while the ldiskfs OSTs have 120 TB on the same hardware. The 20% difference was partially (10%) accounted for by the different raid6 / raidz2 configurations, but I was not able to explain the other 10%.
For the question in the original post, I cannot make 24 TB out of the "available" field of the df output:
the filesystem summaries show 207693094400 KiB available on his zfs lustre and 198082192080 KiB on the ldiskfs one.
At the same time, the difference in total space is
233548424256 - 207693153280 = 25855270976 KiB ≈ 24.1 TiB.
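(A quick sanity check of that arithmetic with plain bc, using the summary values from the lfs df output quoted below; the commands are illustrative, not pasted from the actual session:)

$ echo "scale=2; (207693094400 - 198082192080) / 1024^3" | bc
8.95
$ echo "scale=2; (233548424256 - 207693153280) / 1024^3" | bc
24.07

So the "Available" columns differ by only about 9 TiB; the roughly 24 TiB gap is in the totals.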
Götz, could you please tell us what you meant by "available"?
Also,
in my case the output of Linux df on the OSS for the zfs pool looks strange:
the zpool size is reported as 25T (why?), while the formatted OST, which takes all the space in this pool, shows 33T:
[root at lfs1 ~]# df -h /zpla-0000 /mnt/OST0000
Filesystem Size Used Avail Use% Mounted on
zpla-0000 25T 256K 25T 1% /zpla-0000
zpla-0000/OST0000 33T 8.3T 25T 26% /mnt/OST0000
[root at lfs1 ~]#
in bytes:
[root at lfs1 ~]# df --block-size=1 /zpla-0000 /mnt/OST0000
Filesystem 1B-blocks Used Available Use% Mounted on
zpla-0000 26769344561152 262144 26769344299008 1% /zpla-0000
zpla-0000/OST0000 35582552834048 9093386076160 26489164660736 26% /mnt/OST0000
same ost reported by lustre:
[root at lfsa scripts]# lfs df
UUID 1K-blocks Used Available Use% Mounted on
lfs-MDT0000_UUID 974961920 275328 974684544 0% /mnt/lfsa[MDT:0]
lfs-OST0000_UUID 34748586752 8880259840 25868324736 26% /mnt/lfsa[OST:0]
...
Compare:
[root at lfs1 ~]# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
zpla-0000 43.5T 10.9T 32.6T - 16% 24% 1.00x ONLINE -
zpla-0001 43.5T 11.0T 32.5T - 17% 25% 1.00x ONLINE -
zpla-0002 43.5T 10.8T 32.7T - 17% 24% 1.00x ONLINE -
I realize that zpool reports raw disk space including parity blocks (48 TB ≈ 43.7 TiB) and everything else, like metadata and space for xattr inodes.
What I cannot explain is the difference between 40 TB (decimal) of data space (10 * 4 TB drives) and the 35,582,552,834,048 bytes shown by df for the OST.
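(Two more bc checks, again just arithmetic and not from the actual session, using the per-disk byte count from the sas2ircu output quoted below:)

$ echo "scale=2; 12 * 4000787029504 / 1024^4" | bc
43.66
$ echo "scale=4; 35582552834048 / (10 * 4000787029504)" | bc
.8893

The first line matches the 43.5T reported by zpool list reasonably well; the second shows the OST dataset at only about 89% of the raw capacity of the 10 data drives, which is the ~11% I cannot account for.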
Best regards, Alex.
On Aug 24, 2015, at 7:52 PM, Christopher J. Morrone <morrone2 at llnl.gov> wrote:
> I could be wrong, but I don't think that the original poster was asking
> why the SIZE field of zpool list was wrong, but rather why the AVAIL
> space in zfs list was lower than he expected.
>
> I would find it easier to answer the question if I knew his drive count
> and drive size.
>
> Chris
>
> On 08/24/2015 02:12 PM, Alexander I Kulyavtsev wrote:
>> Same question here.
>>
>> 6 TB out of 65 TB is about 9%. In our case roughly the same fraction was "missing."
>>
>> My speculation was that this could happen if, somewhere between zpool and Linux, a value reported in TB is interpreted as TiB and then converted to TB again, or if an unneeded MB-to-MiB conversion is done twice, etc.
>>
>> Here are my numbers:
>> We have 12 * 4 TB drives per pool, which is 48 TB (decimal).
>> zpool created as raidz2 10+2.
>> zpool reports 43.5T.
>> The pool size should be either 48 T = 4 T * 12 or 40 T = 4 T * 10, depending on whether zpool reports the space before or after parity.
>> From the Oracle ZFS documentation, "zpool list" returns the total space without overheads, so zpool should report 48 TB instead of 43.5 TB.
>>
>> In my case, it looked like a conversion / interpretation error between TB and TiB:
>>
>> 48*1000*1000*1000*1000/1024/1024/1024/1024 = 43.65574568510055541992
>>
>>
>> At disk level:
>>
>> ~/sas2ircu 0 display
>>
>> Device is a Hard disk
>> Enclosure # : 2
>> Slot # : 12
>> SAS Address : 5003048-0-015a-a918
>> State : Ready (RDY)
>> Size (in MB)/(in sectors) : 3815447/7814037167
>> Manufacturer : ATA
>> Model Number : HGST HUS724040AL
>> Firmware Revision : AA70
>> Serial No : PN2334PBJPW14T
>> GUID : 5000cca23de6204b
>> Protocol : SATA
>> Drive Type : SATA_HDD
>>
>> One disk is about 4 TB (decimal):
>>
>> 3815447*1024*1024 = 4000786153472
>> 7814037167*512 = 4000787029504
>>
>> The vdev presents the whole disk to the zpool. There is some overhead, and some space is left in the sdq9 partition.
>>
>> [root at lfs1 scripts]# head -4 /etc/zfs/vdev_id.conf
>> alias s0 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90c-lun-0
>> alias s1 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90d-lun-0
>> alias s2 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90e-lun-0
>> alias s3 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa90f-lun-0
>> ...
>> alias s12 /dev/disk/by-path/pci-0000:03:00.0-sas-0x50030480015aa918-lun-0
>> ...
>>
>> [root at lfs1 scripts]# ls -l /dev/disk/by-path/
>> ...
>> lrwxrwxrwx 1 root root 9 Jul 23 16:27 pci-0000:03:00.0-sas-0x50030480015aa918-lun-0 -> ../../sdq
>> lrwxrwxrwx 1 root root 10 Jul 23 16:27 pci-0000:03:00.0-sas-0x50030480015aa918-lun-0-part1 -> ../../sdq1
>> lrwxrwxrwx 1 root root 10 Jul 23 16:27 pci-0000:03:00.0-sas-0x50030480015aa918-lun-0-part9 -> ../../sdq9
>>
>> Pool report:
>>
>> [root at lfs1 scripts]# zpool list
>> NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
>> zpla-0000 43.5T 10.9T 32.6T - 16% 24% 1.00x ONLINE -
>> zpla-0001 43.5T 11.0T 32.5T - 17% 25% 1.00x ONLINE -
>> zpla-0002 43.5T 10.8T 32.7T - 17% 24% 1.00x ONLINE -
>> [root at lfs1 scripts]#
>>
>> [root at lfs1 ~]# zpool list -v zpla-0001
>> NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
>> zpla-0001 43.5T 11.0T 32.5T - 17% 25% 1.00x ONLINE -
>> raidz2 43.5T 11.0T 32.5T - 17% 25%
>> s12 - - - - - -
>> s13 - - - - - -
>> s14 - - - - - -
>> s15 - - - - - -
>> s16 - - - - - -
>> s17 - - - - - -
>> s18 - - - - - -
>> s19 - - - - - -
>> s20 - - - - - -
>> s21 - - - - - -
>> s22 - - - - - -
>> s23 - - - - - -
>> [root at lfs1 ~]#
>>
>> [root at lfs1 ~]# zpool get all zpla-0001
>> NAME PROPERTY VALUE SOURCE
>> zpla-0001 size 43.5T -
>> zpla-0001 capacity 25% -
>> zpla-0001 altroot - default
>> zpla-0001 health ONLINE -
>> zpla-0001 guid 5472902975201420000 default
>> zpla-0001 version - default
>> zpla-0001 bootfs - default
>> zpla-0001 delegation on default
>> zpla-0001 autoreplace off default
>> zpla-0001 cachefile - default
>> zpla-0001 failmode wait default
>> zpla-0001 listsnapshots off default
>> zpla-0001 autoexpand off default
>> zpla-0001 dedupditto 0 default
>> zpla-0001 dedupratio 1.00x -
>> zpla-0001 free 32.5T -
>> zpla-0001 allocated 11.0T -
>> zpla-0001 readonly off -
>> zpla-0001 ashift 12 local
>> zpla-0001 comment - default
>> zpla-0001 expandsize - -
>> zpla-0001 freeing 0 default
>> zpla-0001 fragmentation 17% -
>> zpla-0001 leaked 0 default
>> zpla-0001 feature@async_destroy enabled local
>> zpla-0001 feature@empty_bpobj active local
>> zpla-0001 feature@lz4_compress active local
>> zpla-0001 feature@spacemap_histogram active local
>> zpla-0001 feature@enabled_txg active local
>> zpla-0001 feature@hole_birth active local
>> zpla-0001 feature@extensible_dataset enabled local
>> zpla-0001 feature@embedded_data active local
>> zpla-0001 feature@bookmarks enabled local
>>
>> Alex.
>>
>> On Aug 19, 2015, at 8:18 AM, Götz Waschk <goetz.waschk at gmail.com> wrote:
>>
>>> Dear Lustre experts,
>>>
>>> I have configured two different Lustre instances, both using Lustre
>>> 2.5.3, one with ldiskfs on RAID-6 hardware RAID and one using ZFS and
>>> RAID-Z2, using the same type of hardware. I was wondering why I have
>>> 24 TB less space available, when I should have the same amount of
>>> parity used:
>>>
>>> # lfs df
>>> UUID 1K-blocks Used Available Use% Mounted on
>>> fs19-MDT0000_UUID 50322916 472696 46494784 1%
>>> /testlustre/fs19[MDT:0]
>>> fs19-OST0000_UUID 51923288320 12672 51923273600 0%
>>> /testlustre/fs19[OST:0]
>>> fs19-OST0001_UUID 51923288320 12672 51923273600 0%
>>> /testlustre/fs19[OST:1]
>>> fs19-OST0002_UUID 51923288320 12672 51923273600 0%
>>> /testlustre/fs19[OST:2]
>>> fs19-OST0003_UUID 51923288320 12672 51923273600 0%
>>> /testlustre/fs19[OST:3]
>>> filesystem summary: 207693153280 50688 207693094400 0% /testlustre/fs19
>>> UUID 1K-blocks Used Available Use% Mounted on
>>> fs18-MDT0000_UUID 47177700 482152 43550028 1%
>>> /lustre/fs18[MDT:0]
>>> fs18-OST0000_UUID 58387106064 6014088200 49452733560 11%
>>> /lustre/fs18[OST:0]
>>> fs18-OST0001_UUID 58387106064 5919753028 49547068928 11%
>>> /lustre/fs18[OST:1]
>>> fs18-OST0002_UUID 58387106064 5944542316 49522279640 11%
>>> /lustre/fs18[OST:2]
>>> fs18-OST0003_UUID 58387106064 5906712004 49560109952 11%
>>> /lustre/fs18[OST:3]
>>> filesystem summary: 233548424256 23785095548 198082192080 11% /lustre/fs18
>>>
>>> fs18 is using ldiskfs, while fs19 is ZFS:
>>> # zpool list
>>> NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
>>> lustre-ost1 65T 18,1M 65,0T 0% 1.00x ONLINE -
>>> # zfs list
>>> NAME USED AVAIL REFER MOUNTPOINT
>>> lustre-ost1 13,6M 48,7T 311K /lustre-ost1
>>> lustre-ost1/ost1 12,4M 48,7T 12,4M /lustre-ost1/ost1
>>>
>>>
>>> Any idea where my 6 TB per OST went?
>>>
>>> Regards, Götz Waschk
>>
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org