[Lustre-discuss] high IOPS
Andreas Dilger
adilger at sun.com
Wed Dec 2 14:41:16 PST 2009
On 2009-12-02, at 12:15, Craig Tierney wrote:
> Andreas Dilger wrote:
>> On 2009-12-02, at 09:20, Francois Chassaing wrote:
>>> I have a big fundamental question:
>>> if the load that I'll put on the FS is more IOPS-intensive than
>>> throughput-intensive (because I'll access lots of medium-sized
>>> files, ~5 MB, from a small number of clients), should I rather go
>>> with Lustre or PVFS2?
>>
>> I don't think PVFS2 is necessarily better at IOPS than Lustre. This
>> is mostly dependent upon the storage configuration.
>>
>>> Also, if the main load is IOPS, shouldn't I oversize the MDS/MDT
>>> in terms of CPU/RAM and storage performance (i.e. the maximum of
>>> 15K SAS RAID10 spindles possible)?
>>
>> The Lustre MDS/MDT is used only at file lookup/open/close, but is not
>> involved during actual IO operations. Still, this means in your case
>> that the MDS is getting 2 RPCs (open + close, which can be done
>> asynchronously in memory) for every 5 OST RPCs (5MB read/write, which
>> happen synchronously), so the MDS will definitely need to scale but
>> not necessarily at 2/5 of the total OST size.
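The 2:5 ratio above can be checked with a quick back-of-envelope
calculation. This is a hypothetical helper, not anything from Lustre
itself, and it assumes 1 MB per OST bulk RPC:

```python
# Sketch of the MDS-vs-OST RPC ratio described above.
# Assumption: each OST bulk read/write RPC moves ~1 MB.

RPC_SIZE_MB = 1  # assumed bulk RPC size

def rpc_ratio(file_size_mb):
    """Return (mds_rpcs, ost_rpcs) for one full read or write of a file."""
    mds_rpcs = 2                                     # open + close
    ost_rpcs = max(1, file_size_mb // RPC_SIZE_MB)   # bulk IO RPCs
    return mds_rpcs, ost_rpcs

print(rpc_ratio(5))   # the ~5 MB files in question: (2, 5)
```

For the ~5 MB files in question this gives 2 MDS RPCs for every 5 OST
RPCs, matching the ratio above; larger files tilt the load further
toward the OSTs.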
>>
>> Typical numbers for a high-end MDT node (16-core, 64GB of RAM,
>> DDR IB) are about 8-10k creates/sec and up to 20k lookups/sec from
>> many clients.
>>
>> Depending on the number of files you are planning to have in the
>> filesystem, I would suggest SSDs for the MDT filesystem, especially
>> if
>> you have a large working set and are doing read-mostly access.
>
> Has anyone reported results of an SSD based MDT?
We have done internal testing, and the performance for many workloads
is somewhat faster, but not dramatically so. This is because Lustre,
unlike NFS, already does async IO on the MDS, so decent streaming IO
performance and lots of RAM meet many of the create/lookup performance
targets.
If you have a huge filesystem that is doing a lot of random lookup,
create, and unlink operations (i.e. the working set is larger than the
MDS RAM; at about 4kB per file for random operations, that is roughly
16M files on a 64GB MDS), then the high IOPS rate of SSDs will
definitely make a huge difference (i.e. sustaining 20k lookups/sec
over DDR instead of falling to mdt_disks * 100).
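The working-set arithmetic above can be sketched as follows. The
function names are hypothetical; the constants (4kB per file, ~100
IOPS per rotating disk) come from the figures in the paragraph:

```python
# Sketch of the MDS working-set math above: ~4 kB of cache per file
# for random operations, and ~100 IOPS per rotating MDT disk once the
# working set no longer fits in RAM.

BYTES_PER_FILE = 4 * 1024          # ~4 kB per file (from the post)

def cached_files(mds_ram_gb):
    """Roughly how many files fit in the MDS cache."""
    return mds_ram_gb * 2**30 // BYTES_PER_FILE

def disk_bound_lookups(mdt_disks, iops_per_disk=100):
    """Lookup rate once lookups fall through to rotating disks."""
    return mdt_disks * iops_per_disk

print(cached_files(64))            # 16777216, i.e. ~16M files
print(disk_bound_lookups(4))       # e.g. 4 disks -> 400 lookups/sec
```

So a 64GB MDS caches about 16M files; beyond that, a 4-disk rotating
MDT drops to ~400 lookups/sec, which is where SSD IOPS pay off.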
Since that isn't a common workload for our customers, we haven't done
a lot of testing in that area, but it is definitely something I'm
curious about.
>>> on the budget side, may I use asynchronous DRBD to mirror the MDT
>>> (internal storage), or should I just get good shared storage
>>> (direct-attached or iSCSI)?
>>
>> Some people on this list have used DRBD, but we haven't tested it
>> ourselves. I _suspect_ (though have not necessarily tested this)
>> that
>> if you are using DRBD it would be possible to have lower-performance
>> storage on the backup server without significantly impacting the
>> primary server performance, if you are willing to run slower in the
>> rare case when you are failed-over to the backup.
>>
>>> Today I'm leaning towards Lustre: I've tested it against
>>> glusterfs, and gluster performed slightly worse than lustre but
>>> failed the bonnie++ create/delete tests badly. Also, I haven't
>>> given PVFS2 a shot yet...
>>
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Staff Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> Lustre-discuss at lists.lustre.org
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>
>
> --
> Craig Tierney (craig.tierney at noaa.gov)
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.