[lustre-devel] Lustre upstream client TODO list

Patrick Farrell paf at cray.com
Sun Feb 11 17:15:09 PST 2018


Neil,

Apologies if you've answered this elsewhere, but what's the genesis of your current (extremely welcome) interest in Lustre?  Some commitment by SUSE?

Regards,
- Patrick
________________________________
From: lustre-devel <lustre-devel-bounces at lists.lustre.org> on behalf of NeilBrown <neilb at suse.com>
Sent: Sunday, February 11, 2018 5:54:51 PM
To: James Simmons; Lustre Development List
Cc: Oleg Drokin
Subject: Re: [lustre-devel] Lustre upstream client TODO list

On Sun, Feb 11 2018, James Simmons wrote:

> So I sent a patch upstream that laid out what most needs to be done for
> the Linux Lustre client to leave staging. I placed the new text here for
> ease of reading so you don't have to go searching for it. Feedback is
> welcome. Hopefully posting it will make it clear what needs to be done.


Thanks so much for putting this together and pushing it out.  I really
appreciated it and hope to show that appreciation with patches :-)

NeilBrown

>
> Currently all the work directed toward the lustre upstream client is tracked
> at the following link:
>
> https://jira.hpdd.intel.com/browse/LU-9679
>
> Under this ticket you will see the following work items that need to be
> addressed:
>
> ******************************************************************************
> * libcfs cleanup
> *
> * https://jira.hpdd.intel.com/browse/LU-9859
> *
> * Track all the cleanups and simplifications of the libcfs module. Remove
> * functions the kernel provides. Possibly integrate some of the functionality
> * into the kernel proper.
> *
> ******************************************************************************
>
> https://jira.hpdd.intel.com/browse/LU-10086
>
> LNET_MINOR conflicts with USERIO_MINOR
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-8130
>
> Fix and simplify libcfs hash handling
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-8703
>
> The current way we handle SMP is wrong. Platforms like ARM and KNL can have
> core and NUMA setups with things like NUMA nodes that have no cores. We need
> to handle such cases. This work also greatly simplifies the Lustre SMP code.
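One possible shape for the NUMA fix is sketched below in plain user-space C. It assumes a simple array of per-node core counts, and the function name build_cpu_partitions() is hypothetical, not the libcfs API. The point is only that partitions are created for nodes that actually have CPUs, so memory-only nodes (for example, KNL high-bandwidth memory exposed as a CPU-less NUMA node) never produce an empty partition.

```c
#include <stddef.h>

/*
 * Hypothetical helper: build one CPU partition per NUMA node that has
 * cores. cores_per_node[i] is the core count of node i; memory-only
 * nodes report 0. part_to_node[] receives the node id backing each
 * partition. Returns the number of partitions created.
 */
static size_t build_cpu_partitions(const unsigned int *cores_per_node,
				   size_t nr_nodes, size_t *part_to_node)
{
	size_t nparts = 0;

	for (size_t node = 0; node < nr_nodes; node++) {
		/* Skip CPU-less nodes rather than creating an empty
		 * partition that no thread could ever run on. */
		if (cores_per_node[node] == 0)
			continue;
		part_to_node[nparts++] = node;
	}
	return nparts;
}
```

With cores_per_node = {4, 0, 4, 0}, the function returns 2 and maps partition 0 to node 0 and partition 1 to node 2.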
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9019
>
> Replace the libcfs time API with standard kernel APIs. Also migrate away
> from jiffies. We found that jiffies can vary between nodes (for example, the
> tick rate HZ is a kernel configuration option and may differ from node to
> node), which can lead to corner cases that break the file system because
> nodes behave inconsistently. So move to time64_t and ktime_t as much as
> possible.
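The arithmetic behind the jiffies problem is simple: one tick lasts 1/HZ seconds, and HZ is chosen when the kernel is built (common values are 100, 250, 300 and 1000). The user-space sketch below uses hypothetical helper names and is not Lustre code; inside the kernel, jiffies_to_msecs() performs the first conversion for the running kernel's HZ.

```c
#include <stdint.h>

/*
 * Convert a tick count to milliseconds for a given tick rate. In the
 * kernel HZ is a build-time constant; it is a parameter here so two
 * differently configured nodes can be compared side by side.
 */
static uint64_t ticks_to_ms(uint64_t ticks, unsigned int hz)
{
	return ticks * 1000 / hz;
}

/*
 * A timeout kept in seconds (the unit time64_t values usually carry)
 * converts to the same number of milliseconds on every node.
 */
static int64_t seconds_to_ms(int64_t seconds)
{
	return seconds * 1000;
}
```

For example, a timeout hard-coded as 300 ticks lasts 3000 ms on an HZ=100 node but only 300 ms on an HZ=1000 node, while a 3-second timeout stored in seconds is 3000 ms everywhere.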
>
> ******************************************************************************
> * Proper IB support for ko2iblnd
> ******************************************************************************
> https://jira.hpdd.intel.com/browse/LU-9179
>
> Poor performance for the ko2iblnd driver. This is related to many of the
> patches below that are missing from the linux client.
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9886
>
> Crash in upstream kiblnd_handle_early_rxs()
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-10394 / LU-10526 / LU-10089
>
> Default to using MEM_REG
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-10459
>
> throttle tx based on queue depth
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9943
>
> correct WR fast reg accounting
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-10291
>
> remove concurrent_sends tunable
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-10213
>
> calculate qp max_send_wrs properly
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9810
>
> use fewer CQ entries for each connection
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-10129 / LU-9180
>
> rework map_on_demand behavior
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-10129
>
> query device capabilities
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-10015
>
> fix race at kiblnd_connect_peer
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9983
>
> allow for discontiguous fragments
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9500
>
> Don't Page Align remote_addr with FastReg
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9448
>
> handle empty CPTs
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9507
>
> Don't Assert On Reconnect with MultiQP
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9472
>
> Fix FastReg map/unmap for MLX5
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9425
>
> Turn on 2 sges by default
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-8943
>
> Enable Multiple OPA Endpoints between Nodes
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-5718
>
> multiple sges for work request
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9094
>
> kill timedout txs from ibp_tx_queue
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9094
>
> reconnect peer for REJ_INVALID_SERVICE_ID
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-8752
>
> Stop MLX5 triggering a dump_cqe
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-8874
>
> Move ko2iblnd to latest RDMA changes
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-8875 / LU-8874
>
> Change to new RDMA done callback mechanism
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9164 / LU-8874
>
> Incorporate RDMA map/unmap APIs into ko2iblnd
>
> ******************************************************************************
> * sysfs/debugfs fixes
> *
> * https://jira.hpdd.intel.com/browse/LU-8066
> *
> * The original migration to sysfs was done in haste without properly working
> * utilities to test the changes. This covers the work to restore the proper
> * behavior. This is a huge project to get right.
> *
> ******************************************************************************
>
> https://jira.hpdd.intel.com/browse/LU-9431
>
> The function class_process_proc_param was used for our mass updates of proc
> tunables. It didn't work with sysfs and was just ugly, so it was removed.
> In the process, the ability to mass update thousands of clients was lost.
> This work restores it in a sane way.
>
> ------------------------------------------------------------------------------
> https://jira.hpdd.intel.com/browse/LU-9091
>
> One of the major requests from users is the ability to pass parameters into
> a sysfs file in various units. For example, we can set max_pages_per_rpc,
> but the page size varies between platforms, so a page count means different
> things on different systems. So you can set this as
> max_pages_per_rpc=16MiB. The original code to handle this was written before
> the string helpers were created, so it doesn't follow their format, but it
> would be easy to move to. Currently the string helpers do the reverse of what
> we need, converting bytes to a string. We need to convert a string to bytes.
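The string-to-bytes direction can be sketched as follows. This is user-space C with a hypothetical helper name, parse_size_bytes(), and it assumes binary multipliers (K = 1024) with an optional "iB" or "B" after the unit letter. In the kernel, memparse() already handles a number followed by a K/M/G-style suffix letter; the sketch additionally tolerates the trailing "iB"/"B" spelling.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse strings such as "16MiB", "4K" or "1048576"
 * into a byte count. K/M/G/T are treated as powers of 1024.
 * Returns 0 on success or -EINVAL on malformed input or overflow.
 */
static int parse_size_bytes(const char *str, uint64_t *bytes)
{
	char *end;
	unsigned long long val;
	unsigned int shift = 0;

	/* Require a leading digit: strtoull() would accept "-5" or " 5". */
	if (!str || !isdigit((unsigned char)*str))
		return -EINVAL;

	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno == ERANGE)
		return -EINVAL;

	switch (toupper((unsigned char)*end)) {
	case 'K': shift = 10; end++; break;
	case 'M': shift = 20; end++; break;
	case 'G': shift = 30; end++; break;
	case 'T': shift = 40; end++; break;
	case '\0': break;
	default: return -EINVAL;
	}

	/* Accept an optional "iB" or "B" after a unit letter. */
	if (shift && (strcmp(end, "iB") == 0 || strcmp(end, "B") == 0))
		end += strlen(end);
	if (*end != '\0')
		return -EINVAL;

	/* Reject values whose scaled result would not fit in 64 bits. */
	if (shift && val > (UINT64_MAX >> shift))
		return -EINVAL;

	*bytes = (uint64_t)val << shift;
	return 0;
}
```

Parsing "16MiB" yields 16777216 bytes. With 4 KiB pages that is 4096 pages, and with 64 KiB pages it is 256 pages, which is why a byte value is more portable than a raw page count.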
>
> ******************************************************************************
> * Proper user land to kernel space interface for Lustre
> *
> * https://jira.hpdd.intel.com/browse/LU-9680
> *
> ******************************************************************************
>
> https://jira.hpdd.intel.com/browse/LU-8915
>
> Don't use the Linux list structure (struct list_head) in userland arguments
> for LNet selftest. This code is of pretty poor quality and really needs to
> be reworked.
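A struct list_head holds two raw kernel pointers, so its size depends on whether the caller is 32- or 64-bit and its contents mean nothing outside the kernel. A common alternative, sketched here with hypothetical names (struct sel_nid_batch is not an existing LNet structure), is a fixed-layout argument that carries a count and a user buffer address in a 64-bit field.

```c
#include <stdint.h>

/*
 * Hypothetical fixed-layout argument for passing a batch of node IDs
 * from user space. Instead of embedding a struct list_head (two raw
 * pointers whose size depends on the ABI), it carries a count plus a
 * user buffer address stored in a fixed-width 64-bit field.
 */
struct sel_nid_batch {
	uint32_t snb_version;	/* layout version for future extension */
	uint32_t snb_count;	/* number of entries in the buffer */
	uint64_t snb_nids_ptr;	/* user address of a uint64_t[snb_count] */
};

_Static_assert(sizeof(struct sel_nid_batch) == 16,
	       "layout must be identical for 32- and 64-bit callers");

/* Fill the argument from user space, widening the pointer to 64 bits. */
static void sel_nid_batch_init(struct sel_nid_batch *b,
			       const uint64_t *nids, uint32_t count)
{
	b->snb_version = 1;
	b->snb_count = count;
	b->snb_nids_ptr = (uint64_t)(uintptr_t)nids;
}
```

In a real uapi header the fields would use __u32/__u64, and the kernel would turn the address back into a user pointer with u64_to_user_ptr() before copying the buffer in.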
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-8834
>
> The Lustre ioctl LL_IOC_FUTIMES_3 is very generic. We need to either work
> with other file systems that have similar functionality to make a common
> syscall interface, or rework our server code to do it automatically for us.
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-6202
>
> Clean up ioctl handling. We have many obsolete ioctls. Also, the way we do
> ioctls can be changed over to netlink. This has the added benefit of working
> better with HPC systems that do IO forwarding, since such systems don't
> handle ioctls well.
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9667
>
> More cleanups by making our utilities use sysfs instead of ioctls for LNet.
> Also it has been requested to move the remaining ioctls to the netlink API.
>
> ******************************************************************************
> * Misc
> ******************************************************************************
>
> ------------------------------------------------------------------------------
> https://jira.hpdd.intel.com/browse/LU-9855
>
> Clean up obdclass preprocessor code. One of the major eyesores is the various
> pointer indirections and macros used by obdclass. These make the code very
> difficult to understand. Al Viro requested that this be cleaned up before we
> leave staging.
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9633
>
> Migrate to Sphinx kernel-doc style comments. Add documents under
> Documentation/.
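For reference, a kernel-doc comment has the shape below: a /** opener, a "name() - summary" line, one "@parameter:" line per argument, and a "Return:" section. The function itself is a hypothetical example chosen only to show the layout; it is not part of Lustre.

```c
/**
 * ll_example_pages_to_bytes() - convert a page count to a byte count
 * @pages: number of pages
 * @page_shift: log2 of the page size, e.g. 12 for 4 KiB pages
 *
 * Hypothetical function, used only to illustrate kernel-doc layout.
 *
 * Return: the size in bytes covered by @pages pages.
 */
static unsigned long long ll_example_pages_to_bytes(unsigned long long pages,
						   unsigned int page_shift)
{
	return pages << page_shift;
}
```

The kernel's scripts/kernel-doc tool extracts comments in this form when building the Sphinx documentation.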
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-6142
>
> Fix any remaining coding style issues. Remove dead code. Enforce kernel code
> style. Other minor miscellaneous cleanups...
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-8837
>
> Separate client/server functionality. Functions only used by the server can
> be removed from the client. Most of this has been done, but we need to
> inspect the code to make sure.
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-8964
>
> Lustre client readahead/writeback control needs to better suit what the
> kernel provides. This is currently being explored. We could end up replacing
> the CLIO readahead abstraction with the kernel's own version.
>
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9862
>
> Patch that landed for LU-7890 leads to static checker errors
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-9868
>
> dcache/namei fixes for lustre
> ------------------------------------------------------------------------------
>
> https://jira.hpdd.intel.com/browse/LU-10467
>
> Use the standard Linux wait_event macros (work by Neil Brown).
>
> ------------------------------------------------------------------------------