[lustre-discuss] new mounted client shows lower disk space

Raj rajgautam at gmail.com
Wed Nov 14 07:13:06 PST 2018


I would check whether the LNet address gets set up properly before mounting
the Lustre FS on the client. You can manually load the lustre module, ping
all the OSS nodes (lctl ping oss-nid), and look for any abnormalities in the
ping results and in dmesg before mounting the FS.
It could be as simple as a duplicate IP address on your IB interface or an
unstable IB fabric.
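The ping sweep suggested above can be scripted so that every OSS is checked the same way after each reboot. This is a minimal sketch, assuming `lctl` is on the PATH, LNet is already up, and the script runs as root; the NID list `OSS_NIDS` and the helper names are hypothetical and need to be replaced with the real OSS NIDs.

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical OSS NIDs; replace with the real NIDs of all six OSS nodes.
OSS_NIDS: List[str] = ["10.0.0.5@o2ib", "10.0.0.8@o2ib"]


def run_lctl_ping(nid: str) -> subprocess.CompletedProcess:
    """Run `lctl ping <nid>` and capture its output (Python 3.6 compatible)."""
    return subprocess.run(
        ["lctl", "ping", nid],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=30,
    )


def ping_all(
    nids: List[str],
    runner: Callable[[str], subprocess.CompletedProcess] = run_lctl_ping,
) -> Dict[str, bool]:
    """Return {nid: True if the ping command exited with status 0}."""
    results: Dict[str, bool] = {}
    for nid in nids:
        try:
            results[nid] = runner(nid).returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # lctl missing or not executable, or the ping hung past the timeout.
            results[nid] = False
    return results


if __name__ == "__main__":
    for nid, ok in ping_all(OSS_NIDS).items():
        print(f"{nid}: {'ok' if ok else 'FAILED'}")
```

The `runner` parameter exists only so the sweep can be exercised without a Lustre client; in practice the default `run_lctl_ping` is used, and any NID reported as FAILED is where to start looking in dmesg.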

On Wed, Nov 14, 2018 at 8:08 AM Thomas Roth <t.roth at gsi.de> wrote:

> Hi,
>
> your error messages are all well known. The one on the OSS will show up
> as soon as the Lustre modules are loaded, provided you have some clients
> asking for the OSTs (and your MDT, which should be up by then, is also
> looking for the OSTs).
> I have also seen the kiblnd_check_conns message quite often, never with
> any OST problems.
>
> It rather seems that your OSTs take a long time to mount or to recover.
> Did you check /proc/fs/lustre/obdfilter/lustre-OST00*/recovery_status ?
>
> Regards
> Thomas
>
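The recovery check suggested above can be run on each OSS across all its OSTs at once. This is a minimal sketch, assuming each `recovery_status` file consists of `key: value` lines including a `status:` line whose value is `COMPLETE` once recovery has finished; the glob pattern is the path quoted in the reply, and the helper names are hypothetical. On the OSS, `lctl get_param obdfilter.*.recovery_status` should show the same information.

```python
import glob
import os
from typing import Dict

# Path pattern from the reply above; adjust the filesystem name if it differs.
RECOVERY_GLOB = "/proc/fs/lustre/obdfilter/lustre-OST00*/recovery_status"


def parse_recovery_status(text: str) -> Dict[str, str]:
    """Split each 'key: value' line on its first colon (assumed file format)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def unfinished_targets(pattern: str = RECOVERY_GLOB) -> Dict[str, str]:
    """Return {target: status} for every OST whose status is not COMPLETE."""
    pending: Dict[str, str] = {}
    for path in sorted(glob.glob(pattern)):
        # The target name is the directory that contains recovery_status.
        target = os.path.basename(os.path.dirname(path))
        with open(path) as fh:
            status = parse_recovery_status(fh.read()).get("status", "UNKNOWN")
        if status != "COMPLETE":
            pending[target] = status
    return pending


if __name__ == "__main__":
    for target, status in unfinished_targets().items():
        print(f"{target}: {status}")
```

Running this on each of the six OSS nodes shortly after a client reboot would show whether the OSTs that are missing on the client are the ones still recovering on the server side.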
> On 11/12/18 9:46 AM, fırat yılmaz wrote:
> > Hi All
> > OS=Red Hat 7.4
> > Lustre Version: Intel® Manager for Lustre* software 4.0.3.0
> >
> > I have 72 OSTs on 6 OSSs with HA and 1 MDT, serving 195 clients over
> > InfiniBand EDR.
> >
> > After a reboot, the client mounts the Lustre filesystem at startup. It
> > should be 2.1 PB in size, but it starts out at 350 TB.
> >
> > The lfs osts command shows 90 percent of the even-numbered OSTs as ACTIVE
> > and no information about the other OSTs. As time passes (1 hour or so),
> > all OSTs become active and the full 2.1 PB of Lustre space can be seen.
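To see exactly which OSTs are missing or inactive while the filesystem still shows the reduced size, the `lfs osts` output can be compared against the expected OST count. This is a minimal sketch, assuming each OST line has the shape `<index>: <uuid> <STATUS>` (for example `9: lustre-OST0009_UUID ACTIVE`) and that the text was captured with something like `lfs osts /mnt/lustre`, where the mount point is hypothetical. Lines that do not match, such as a header, are ignored, so an OST that is not listed at all is reported as missing.

```python
import re
from typing import Dict, List

# Assumed line shape: "<index>: <uuid> <STATUS>".
OST_LINE = re.compile(r"^\s*(\d+):\s+(\S+)\s+(\S+)\s*$")


def ost_states(output: str) -> Dict[int, str]:
    """Map OST index -> status string parsed from `lfs osts` output."""
    states: Dict[int, str] = {}
    for line in output.splitlines():
        m = OST_LINE.match(line)
        if m:
            states[int(m.group(1))] = m.group(3)
    return states


def missing_or_inactive(output: str, expected_count: int) -> List[int]:
    """Indices 0..expected_count-1 that are absent or not ACTIVE."""
    states = ost_states(output)
    return [i for i in range(expected_count) if states.get(i) != "ACTIVE"]


sample = (
    "OBDS:\n"
    "0: lustre-OST0000_UUID ACTIVE\n"
    "1: lustre-OST0001_UUID INACTIVE\n"
    "2: lustre-OST0002_UUID ACTIVE\n"
)
print(missing_or_inactive(sample, 4))  # OST 1 is inactive, OST 3 is not listed
```

For the filesystem described here, `missing_or_inactive(output, 72)` would list the OST indices to cross-check against the OSS-side recovery status and the client dmesg.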
> >
> >
> > dmesg on Lustre OSS server:
> > LustreError: 137-5: lustre-OST0009_UUID: not available for connect from 10.0.0.130@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
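The "137-5 ... (no target)" message names the target that a client asked this OSS for while that target was not mounted there. Collecting these messages per OSS shows which OSTs were not yet mounted on which server during startup or failover. This is a minimal sketch whose regular expression is modelled only on the single message quoted above; other Lustre versions may word it differently, and the function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Modelled on the "137-5 ... not available for connect" line quoted above.
NO_TARGET = re.compile(
    r"LustreError: 137-5: (?P<target>\S+): not available for connect "
    r"from (?P<nid>\S+) \(no target\)"
)


def targets_not_here(dmesg_text: str) -> Dict[str, Set[str]]:
    """Map each target refused by this OSS to the client NIDs that asked for it."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    # finditer without anchors also copes with dmesg timestamp prefixes.
    for m in NO_TARGET.finditer(dmesg_text):
        seen[m.group("target")].add(m.group("nid"))
    return dict(seen)


log = (
    "LustreError: 137-5: lustre-OST0009_UUID: not available for connect "
    "from 10.0.0.130@o2ib (no target). If you are running an HA pair check "
    "that the target is mounted on the other server."
)
print(targets_not_here(log))
```

Run against the dmesg of both servers in an HA pair, a target that shows up here on one node should be mounted on its partner; a target reported on both would point to an OST that is not mounted anywhere yet.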
> >
> > dmesg on client:
> > LNet: 5419:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 10.0.0.5@o2ib: 15 seconds
> > Lustre: 5546:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1542009416/real 1542009426] req@ffff885f47610000 x1616909446641136/t0(0) o8->lustre-OST0030-osc-ffff885f75219800@10.0.0.8@o2ib:28/4 lens 520/544 e 0 to 1 dl 1542009696 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
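On the client side, it helps to see whether the failed requests cluster on particular server NIDs, for example only the NIDs of one OSS or of one HA partner. This is a minimal sketch whose two regular expressions are modelled only on the two client messages quoted above (the `kiblnd_check_conns` timeout and the `ptlrpc_expire_one_request` network error); the function names are hypothetical, and other Lustre versions may format these lines differently.

```python
import re
from collections import Counter
from typing import Dict

# Modelled on the two client messages quoted above.
TX_TIMEOUT = re.compile(r"kiblnd_check_conns\(\)\) Timed out tx for (?P<nid>\S+?):")
RPC_FAIL = re.compile(
    r"Request sent has failed due to network error:.*?"
    r"o\d+->(?P<target>[\w-]+?)-osc-[0-9a-f]+@(?P<nid>[^:\s]+):"
)


def tx_timeouts(text: str) -> Counter:
    """Count LNet transmit timeouts per peer NID."""
    return Counter(m.group("nid") for m in TX_TIMEOUT.finditer(text))


def failed_requests(text: str) -> Dict[str, str]:
    """Map OST name -> server NID for failed requests (the last one per OST wins)."""
    return {m.group("target"): m.group("nid") for m in RPC_FAIL.finditer(text)}


client_log = "\n".join([
    "LNet: 5419:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for "
    "10.0.0.5@o2ib: 15 seconds",
    "Lustre: 5546:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request "
    "sent has failed due to network error: [sent 1542009416/real 1542009426] "
    "req@ffff885f47610000 x1616909446641136/t0(0) "
    "o8->lustre-OST0030-osc-ffff885f75219800@10.0.0.8@o2ib:28/4 lens 520/544 "
    "e 0 to 1 dl 1542009696 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1",
])
print(tx_timeouts(client_log))
print(failed_requests(client_log))
```

If every failing OST maps to the NID of a server that does not currently have that OST mounted (for example, the HA partner), that matches the mount or recovery explanation given in the reply above. Timeouts spread across many NIDs would fit the fabric or duplicate-IP explanation better.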
> >
> > I tested InfiniBand with ib_send_lat and ib_read_lat; no errors occurred.
> > I tested LNet with lctl ping 10.0.0.8@o2ib; no errors occurred:
> > 12345-0@lo
> > 12345-10.51.22.8@o2ib
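`lctl ping` prints the NIDs of the node that answered, each prefixed with the LNet process ID `12345-`. Normally the pinged NID appears in that list. In the output quoted here it does not (10.0.0.8@o2ib was pinged, 10.51.22.8@o2ib was returned). This may only be an edit made when posting, but with duplicate IP addresses as a suspect it is cheap to check. This is a minimal sketch that assumes the output format shown above; the helper names are hypothetical.

```python
from typing import List

LNET_PID_PREFIX = "12345-"  # prefix seen on every NID line in the output above


def nids_from_ping(output: str) -> List[str]:
    """Strip the '12345-' prefix from each NID line of `lctl ping` output."""
    nids: List[str] = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(LNET_PID_PREFIX):
            nids.append(line[len(LNET_PID_PREFIX):])
    return nids


def answered_as_pinged(pinged_nid: str, output: str) -> bool:
    """True if the node that answered lists the NID that was pinged."""
    return pinged_nid in nids_from_ping(output)


ping_output = "12345-0@lo\n12345-10.51.22.8@o2ib\n"
print(nids_from_ping(ping_output))
print(answered_as_pinged("10.0.0.8@o2ib", ping_output))
```

A False result for a real, unedited ping would suggest that a different node than expected is answering on that address, which fits Raj's duplicate-IP suggestion.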
> >
> > Why can some OSTs be accessible while others on the same server are not?
> > Best Regards.
> >
> >
> > _______________________________________________
> > lustre-discuss mailing list
> > lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> >
>