[lustre-discuss] lnet router lustre rpm compatibility
Alexander I Kulyavtsev
aik at fnal.gov
Mon Jun 20 15:11:17 PDT 2016
On Jun 20, 2016, at 4:00 PM, Jessica Otey <jotey at nrao.edu> wrote:
> I am in the process of preparing to upgrade a production lustre system running 1.8.9 to 2.4.3.
I would also like to see a router compatibility matrix, published together with the client/server/IB compatibility matrix.
It would be nice to have a separate lnet rpm, or more precisely a defined set of rpms to install on routers.
Having said that, we migrated a petabyte-size 1.8.9 lustre file system with ~150 million files to two 2.5.3 systems (2.5.3 plus two or three patches). Essentially, the migration was "cp ..." between the two systems, both mounted with a 1.8 client. Hardware was moved from the old system to the new one as space was released and rebalanced. This holistic approach took on the order of a year on a live system, without shutdowns.
We did not risk an in-place upgrade, so I cannot comment on that.
There is no "official" recommendation on 1.8 client/2.5 server compatibility, so I cannot tell whether this migration path will work or is safe for you.
You may search the reports to this list: JLAB hit hardware issues during a similar migration and, IIRC, they needed to disable an OST on the new system; this led to a space imbalance on the new system (data not written to OST > broken ost).
We are still running 1.8.9 clients, and we will switch them to 2.5 or a later version after the last dataset is moved and we have a chance to pause production. At some point I set up a 2.8.0 client on one of the nodes and used it to rebalance OSTs on the 2.5.3 system. We do checksumming, and it looks like things went well.
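For illustration only, a copy followed by a checksum comparison could be sketched as below. This is not the tooling we used; the choice of SHA-256 and the helper names `sha256_of` and `copy_and_verify` are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then re-read both files and compare checksums.

    Returns True when the digests match. The paths are expected to be
    on the old and the new file system, both mounted on the same node.
    """
    src, dst = Path(src), Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copies data plus timestamps and mode bits
    return sha256_of(src) == sha256_of(dst)
```

Re-reading the destination right after the copy may be served from the client cache, so a separate verification pass later (or from another node) is a stronger check.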
Servers run Intel GA 2.5.3 with a few patches; LLNL's release 2.5.3 gave edif errors on a 1.8.9 client mounting both 1.8 and 2.5.
OSTs, MDTs - all zfs.
> This current system has 2 lnet routers.
> Our plan is to perform the upgrade in 2 stages:
> 1) Upgrade the MDS and OSSes to 2.4.3, leaving clients in 1.8.9.
> 2) Upgrade all clients to 2.4.3 (thus allowing the entire system to be subsequently updated to 2.5 and beyond)
> The problem is that I can't seem to locate any information about the version requirements (if any) for lnet routers. Does the lnet router have to be the same as mds, or the same as the clients? Or both?
We still run 1.8.9 routers between 2.5.3 servers and 1.8.9 clients, and will upgrade the routers together with the clients.
We use routers to access lustre from a cluster in a remote room through ethernet over fiber.
The first thing you may face: take a look at the privileged port setting; the default is different in 2.5.3:
options ko2iblnd require_privileged_port=0
options ko2iblnd use_privileged_port=0
You may need to change the default number of credits/buffers. Defaults differ between 1.8 and 2.5, but they should be the same on all nodes of the same lnet.
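One way to check that settings match across nodes is to compare the `options` lines from each node's modprobe configuration. The sketch below is a hypothetical helper, not part of lustre: `parse_module_options` and `diff_options` are names made up for this example, and the server-side value shown is invented for illustration.

```python
def parse_module_options(text, module="ko2iblnd"):
    """Collect key=value pairs from 'options <module> ...' lines."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        parts = line.split()
        if len(parts) < 3 or parts[0] != "options" or parts[1] != module:
            continue
        for item in parts[2:]:
            key, sep, value = item.partition("=")
            if sep:
                opts[key] = value
    return opts


def diff_options(a, b):
    """Return {name: (value_in_a, value_in_b)} for settings that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# The two lines from this message, as they would appear on a router.
router_conf = """
options ko2iblnd require_privileged_port=0
options ko2iblnd use_privileged_port=0
"""
# Hypothetical server config that differs in one setting.
server_conf = "options ko2iblnd require_privileged_port=1 use_privileged_port=0"

mismatch = diff_options(parse_module_options(router_conf),
                        parse_module_options(server_conf))
print(mismatch)  # {'require_privileged_port': ('0', '1')}
```

A setting that is absent from a file shows up as `None` in the diff, meaning that node is using the module's built-in default, which is exactly the case where 1.8 and 2.5 nodes can silently disagree.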
> Any light shed on this would be most helpful.
> Jessica Otey
> System Administrator II
> North American ALMA Science Center (NAASC)
> National Radio Astronomy Observatory (NRAO)
> Charlottesville, Virginia (USA)