[lustre-discuss] refresh file layout error

Wed Sep 2 10:47:18 PDT 2015

I've seen this kind of error before when using Samba to do something stupid (and let's face it, that's most everything with Samba). It was a locking issue, I think. Things were being changed/deleted (unlinked, in actuality) while the client was trying to do something with them.

Is the Apache process or its spawned app(s) still working on the files in question while serving them up?
That would be my guess here. Any chance this is across NFS? I've seen this error a great deal over NFS; it used to cause crashes.

Ed Wahl
OSC

________________________________
From: lustre-discuss [lustre-discuss-bounces at lists.lustre.org] on behalf of E.S. Rosenberg [esr+lustre at mail.hebrew.edu]
Sent: Wednesday, September 02, 2015 7:57 AM
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] refresh file layout error

Hi all,

I am seeing an interesting/annoying problem with Lustre and am not really sure what to look for or where to look.

When a webserver (Galaxy using wsgi/apache2) tries to serve (large) files stored on Lustre, it fails to send the full file and I see the following errors in syslog:

Sep  2 11:50:17 hm-02 kernel: LustreError: 6973:0:(vvp_io.c:1197:vvp_io_init()) fs01: refresh file layout [0x200008815:0x217e:0x0] error -13.
Sep  2 11:50:17 hm-02 kernel: LustreError: 6973:0:(file.c:179:ll_close_inode_openhandle()) inode 144115772543738238 mdc close failed: rc = -13
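
For reference, the -13 in both kernel lines is a negated errno value; on Linux, errno 13 is EACCES (permission denied). A minimal sketch for decoding such return codes, where the helper name describe_rc is purely illustrative and not part of any Lustre tool:

```python
import errno
import os


def describe_rc(rc):
    """Translate a negative kernel return code into (errno name, message).

    describe_rc is a hypothetical helper for illustration only.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Decode the rc value reported in the syslog lines above.
print(describe_rc(-13))
```

This only names the error; it does not say which permission check on the client or the MDS produced it.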

If I try to access the files through their direct path (copying to tmp/md5sum/sha512sum) it seems to work without a problem (full file is copied and sums agree, from different nodes).
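
The direct-path check can also be done from Python, which is closer to how the web application reads the file. This is a rough sketch, assuming it is run as the same user as the webserver; the function name checksum_and_size and the 1 MiB chunk size are arbitrary choices for illustration, not anything taken from Galaxy:

```python
import hashlib
import os
import sys


def checksum_and_size(path, chunk_size=1024 * 1024):
    """Read a file in fixed-size chunks; return (md5 hex digest, bytes read).

    Hypothetical helper: comparing the byte count with the size reported
    by os.stat() shows whether the read stopped short.
    """
    digest = hashlib.md5()
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            total += len(chunk)
    return digest.hexdigest(), total


if __name__ == "__main__" and len(sys.argv) > 1:
    md5sum, nbytes = checksum_and_size(sys.argv[1])
    expected = os.stat(sys.argv[1]).st_size
    print(md5sum, nbytes, "complete" if nbytes == expected else "SHORT READ")
```

If this succeeds when run as the webserver's user but the served download is still truncated, that would point at something other than the plain read path.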

When we switched the storage backend to NFS the server worked fine, so my guess is that there is an issue with the way Python tries to read from the 'disk'...

Is anyone familiar with the error above?

Thanks,
Eli