[lustre-discuss] refresh file layout error

Wed Sep 2 22:22:44 PDT 2015

On Wed, Sep 2, 2015 at 8:47 PM, Wahl, Edward <ewahl at osc.edu> wrote:

> I've seen this kind of error before when using samba to do something
> stupid (and let's face it, that's most everything with samba). It was a
> locking issue, I think. Things were being changed/deleted (unlinked, in
> actuality) as the client was trying to do something with them.
>
> Is the Apache process or its spawned app(s) still working on the files
> in question while serving them up?
>
Not as far as I know; these are result files that were generated days ago
(possibly more) and should be static by now.
But I'll double-check with the people behind the app.

> That would be my guess here.  Any chance this is across NFS?  Seen that a
> great deal with this error, it used to cause crashes.
>
Strictly speaking it is not, but it may be, because part of the path the
server 'sees'/'knows' is a symlink that lives on NFS and points to the
lustre filesystem...
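
Since the path the server uses goes through a symlink, one way to narrow
this down might be to walk the path one component at a time and note which
components are symlinks and whether the calling user can traverse or read
them. The sketch below is only an illustration: the helper name
walk_components is made up, it assumes a POSIX system, and it would need to
be run as the webserver's user (os.access checks the real uid/gid) to say
anything about what Apache sees.

```python
import os
import sys


def walk_components(path):
    """Return (prefix, is_symlink, accessible) for each prefix of path.

    Hypothetical helper, not part of Lustre or Galaxy. Symlinks are
    reported but not rewritten, so the path is checked as the server
    would see it.
    """
    path = os.path.abspath(path)  # normalises, does not resolve symlinks
    parts = [p for p in path.split(os.sep) if p]
    current = os.sep
    results = []
    for part in parts:
        current = os.path.join(current, part)
        is_link = os.path.islink(current)
        # Directories need search (execute) permission; files need read.
        mode = os.X_OK if os.path.isdir(current) else os.R_OK
        results.append((current, is_link, os.access(current, mode)))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for prefix, is_link, ok in walk_components(sys.argv[1]):
        flag = "symlink" if is_link else "       "
        print(f"{'ok  ' if ok else 'FAIL'} {flag} {prefix}")
```

Running it against the path exactly as the web application builds it, rather
than the direct lustre path, would show whether any component behaves
differently for that user.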

Thanks,
Eli

>
> Ed Wahl
> OSC
>
>
> ------------------------------
> *From:* lustre-discuss [lustre-discuss-bounces at lists.lustre.org] on
> behalf of E.S. Rosenberg [esr+lustre at mail.hebrew.edu]
> *Sent:* Wednesday, September 02, 2015 7:57 AM
> *To:* lustre-discuss at lists.lustre.org
> *Subject:* [lustre-discuss] refresh file layout error
>
> Hi all,
>
> I am seeing an interesting/annoying problem with lustre and am not really
> sure what/where to look.
>
> When a webserver (galaxy using wsgi/apache2) tries to serve (large) files
> stored on lustre it fails to send the full file and I see the following
> errors in syslog:
>
> Sep  2 11:50:17 hm-02 kernel: LustreError:
> 6973:0:(vvp_io.c:1197:vvp_io_init()) fs01: refresh file layout
> [0x200008815:0x217e:0x0] error -13.
> Sep  2 11:50:17 hm-02 kernel: LustreError:
> 6973:0:(file.c:179:ll_close_inode_openhandle()) inode 144115772543738238
> mdc close failed: rc = -13
>
> If I try to access the files through their direct path (copying to
> tmp/md5sum/sha512sum) it seems to work without a problem (full file is
> copied and sums agree, from different nodes).
>
> When we switched the storage backend to NFS the server worked fine, so my
> guess is that there is an issue with the way python tries to read from the
> 'disk'...
>
> Is anyone familiar with the error above?
>
> Thanks,
> Eli
>
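
As a side note on the log lines quoted above: the "error -13" and "rc = -13"
values look like negated errno codes, which is the usual Linux kernel
convention. On Linux, errno 13 is EACCES (permission denied). A minimal
sketch for decoding such codes, assuming that convention applies here (the
helper name describe_rc is made up for illustration):

```python
import errno
import os


def describe_rc(rc):
    """Map a negative kernel-style return code to (errno name, message)."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # -13 is the value reported by vvp_io_init() and
    # ll_close_inode_openhandle() in the syslog excerpt.
    print(describe_rc(-13))
```

If that reading is right, the failure would be a permission problem seen by
the process doing the read rather than a data problem with the files
themselves, which would fit with direct copies and checksums succeeding.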