[lustre-discuss] Announce: Lustre Systems Administration Guide

Rick Wagner rick at globus.org
Sat Nov 18 08:44:00 PST 2017


Marcin,

Thanks for sketching out the mechanisms we could use to help ensure the quality and accuracy of the documentation. If someone in the community is willing to work on any or all of these items, I will ask the OpenSFS board to cover the costs of any CI/CD cloud services that are needed. I would rather see the expertise of those doing the work go toward improving Lustre than toward hosting services that are available at modest cost.

—Rick

> On Nov 18, 2017, at 2:47 AM, Marcin Dulak <marcin.dulak at gmail.com> wrote:
> 
> 
> 
> On Sat, Nov 18, 2017 at 4:20 AM, Stu Midgley <sdm900 at gmail.com> wrote:
> Thank you both for the documentation.  I know how hard it is to maintain. 
> 
> I've asked all my admin staff to read it - even if some of it doesn't directly apply to our environment.
> 
> What we would like is well organised, comprehensive, accurate and up-to-date documentation.  Most of the time when I dive into the manual, or other online material, I find it isn't quite right (paths slightly wrong or outdated etc).  I also have difficulty finding all the information I want in a single location and in a logical fashion.  These aren't new issues and they blight all documentation, but having the definitive source in a wiki might open it up to more transparency, greater use and thus, ultimately, being kept up to date, even if it's by others outside Intel.
> 
> Documentation should be treated in the same way as code, i.e. automatically tested. This is not a new idea (https://en.wikipedia.org/wiki/Software_documentation#Literate_programming),
> and with access to various kinds of virtualization it is feasible now.
> There are Python projects (https://gitlab.com/ase/ase/tree/master/doc/tutorials) that make use of this idea thanks to http://www.sphinx-doc.org, which allows one to execute embedded Python commands
> while building the documentation in HTML or PDF format from rst (reStructuredText) files.
> There is a system that stores LFS (Linux From Scratch) in an XML format from which the commands can be extracted and executed (http://www.linuxfromscratch.org/alfs/ and https://github.com/ojab/jhalfs), but it does not seem to be under continuous automated testing.
> However, projects like https://docs.openstack.org/install-guide/ surprisingly do not use this idea, and it takes months to correct a small inconsistency in the documentation (https://bugs.launchpad.net/keystone/+bug/1698455).
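
The idea of executing examples embedded in documentation can be sketched with Python's standard doctest module alone. This is only a minimal illustration, not how Sphinx or any of the projects mentioned above actually do it: the page text, its arithmetic example, and the helper name check_page are all hypothetical.

```python
import doctest

# Hypothetical documentation page (reStructuredText-style) with embedded
# interactive examples. The content is made up for illustration only.
DOC_PAGE = """
Stripe arithmetic
=================

With 4 stripes of 1 MiB each, one full stripe holds 4 MiB:

>>> stripe_count = 4
>>> stripe_size_mib = 1
>>> stripe_count * stripe_size_mib
4
"""


def check_page(text, name="doc_page"):
    """Run every '>>>' example found in text.

    Returns a (failed, attempted) tuple, so a CI job could fail the
    documentation build whenever failed is non-zero.
    """
    parser = doctest.DocTestParser()
    # Extract the examples from the page into a single DocTest object.
    test = parser.get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted


if __name__ == "__main__":
    failed, attempted = check_page(DOC_PAGE)
    print(f"{attempted} examples run, {failed} failed")
```

A page whose stated output has drifted from reality (for example ">>> 1 + 1" followed by "3") would be reported as a failure, which is the kind of small inconsistency such a check is meant to catch early.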
> 
> It is not very difficult to create a virtual setup consisting of several Lustre servers in an unattended way (https://github.com/marcindulak/vagrant-lustre-tutorial-centos6) and use that
> to test the Lustre documentation.
> An alternative to making the Lustre documentation executable would be to abstract the basics of Lustre using a supported configuration management system (is there any progress on https://www.youtube.com/watch?v=WX00LQLYf2w ?) and test that using the standard CI tools.
> 
> Cheers
> 
> Marcin
>  
> 
> I'd also like a section where people can post their experiences and solutions.  For example, in recent times, we have battled bad interactions between ZFS and Lustre which led to poor performance and ZFS corruption.  While we have now tuned both Lustre and ZFS and the bugs have mostly been fixed, the learnings, troubleshooting methods etc. should be preserved and might help others diagnose tricky problems in the future.
> 
>  
> 
> That's my 5c.
> 
> 
> 
> On Sat, Nov 18, 2017 at 6:03 AM, Dilger, Andreas <andreas.dilger at intel.com> wrote:
> On Nov 16, 2017, at 22:41, Cowe, Malcolm J <malcolm.j.cowe at intel.com> wrote:
> >
> > I am pleased to announce the availability of a new systems administration guide for the Lustre file system, which has been published to wiki.lustre.org. The content can be accessed directly from the front page of the wiki, or from the following URL:
> >
> > http://wiki.lustre.org/Category:Lustre_Systems_Administration
> >
> > The guide is intended to provide comprehensive instructions for the installation and configuration of production-ready Lustre storage clusters. Topics covered:
> >
> >       • Introduction to Lustre
> >       • Lustre File System Components
> >       • Lustre Software Installation
> >       • Lustre Networking (LNet)
> >       • LNet Router Configuration
> >       • Lustre Object Storage Devices (OSDs)
> >       • Creating Lustre File System Services
> >       • Mounting a Lustre File System on Client Nodes
> >       • Starting and Stopping Lustre Services
> >       • Lustre High Availability
> >
> > Refer to the front page of the guide for the complete table of contents.
> 
> Malcolm,
> thanks so much for your work on this.  It is definitely improving the
> state of the documentation available today.
> 
> I was wondering if people have an opinion on whether we should remove
> some/all of the administration content from the Lustre Operations Manual,
> and make that more of a reference manual that contains details of
> commands, architecture, features, etc. as a second-level reference from
> the wiki admin guide?
> 
> For that matter, should we export the XML Manual into the wiki and
> leave it there?  We'd have to make sure that the wiki is being indexed
> by Google for easier searching before we could do that.
> 
> Cheers, Andreas
> 
> > In addition, for people who are new to Lustre, there is a high-level introduction to Lustre concepts, available as a PDF download:
> >
> > http://wiki.lustre.org/images/6/64/LustreArchitecture-v4.pdf
> >
> >
> > Malcolm Cowe
> > High Performance Data Division
> >
> > Intel Corporation | www.intel.com
> >
> > _______________________________________________
> > lustre-discuss mailing list
> > lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation
> 
> -- 
> Dr Stuart Midgley
> sdm900 at gmail.com