[Lustre-discuss] NFS export patch/error questions

Tue Jan 15 16:46:33 PST 2008

Hello!

On Jan 15, 2008, at 7:29 PM, Dan wrote:

> I'm running 1.6.4.1 and applied the patches 14006, 14007 and 14008.  I
> tried to apply the other patches (14363, 14693, 14442, 14591) recommended
> recently by Oleg but they are too different from the Lustre 1.6.4.1
> sources (tried 1.6.3 as well) to patch even by hand.  Any suggestions on
> how to apply them?

My diffs are against a later b1_6 tree, but if nobody can apply them,
that's obviously bad, and I guess I need to make a 1.6.4 port, if only to
get some wider test coverage from everybody interested.
It would be better if you referenced bug numbers instead of attachment
ids, as that would help me find those bugs faster.

> I receive countless thousands of errors very similar to:
> Jan 10 11:59:37 wiglaf kernel: Lustre Error: 11-0: an error occurred while
> communicating with 0 at lo.  The ost_write operation failed with -28

-28 is ENOSPC, which just tells you that you've run out of space on some
of your OSTs.
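
For anyone else reading such logs: the number after "failed with" is a
negated errno value. A minimal sketch for decoding it, assuming a Linux
host (errno numbering is platform-specific) and a hypothetical helper name
describe_kernel_rc that is not part of Lustre:

```python
import errno
import os


def describe_kernel_rc(rc: int) -> str:
    """Translate a negative kernel-style return code into its errno name.

    Hypothetical helper for illustration only; the mapping comes from the
    errno table of the host running this script.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # -28 is the value from the ost_write error quoted above.
    print(describe_kernel_rc(-28))
```

Running this on one of the Lustre nodes avoids any doubt about which
errno table applies.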

> same error also with -16
> The jobs writing to a NFS exported Lustre mount fail silently w/o errors
> except what I've posted in the logs and sent to Oleg.  Several tests are
> running now, more detailed errors to come.

The logs you provided indicate that your disk backend is overloaded and
takes hundreds of seconds to process I/O requests. You need to lower the
load somehow or improve disk backend performance.

Now, seeing as you do not have the patch from bug 13371, that might
explain your high load: without writev/readv support in Lustre, the NFS
client generates a gazillion small requests.
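
To illustrate the general idea only (this is a userspace analogy, not the
Lustre or knfsd code path, and it assumes nothing about what the bug 13371
patch actually changes): the same data can go out as many small writes or
as one vectored write. The function names below are hypothetical, and
os.writev requires a Unix host.

```python
import os
import tempfile


def write_small_requests(fd, chunks):
    """Issue one write() per chunk; returns the number of calls made."""
    calls = 0
    for chunk in chunks:
        os.write(fd, chunk)
        calls += 1
    return calls


def write_vectored(fd, chunks):
    """Hand all chunks to a single writev() call; returns the call count."""
    os.writev(fd, chunks)
    return 1


def demo():
    # 256 chunks of 512 bytes, kept well below the usual IOV_MAX limit.
    chunks = [b"x" * 512 for _ in range(256)]
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        small_calls = write_small_requests(fd, chunks)
        vectored_calls = write_vectored(fd, chunks)
        size = os.fstat(fd).st_size
    return small_calls, vectored_calls, size


if __name__ == "__main__":
    print(demo())
```

Both paths write the same bytes; the difference is how many separate
requests the layer underneath has to process.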

Bye,
     Oleg