[Lustre-discuss] Lustre, automount and EIO
Andreas Dilger
adilger at sun.com
Thu Mar 25 23:18:13 PDT 2010
On 2010-03-25, at 06:33, Stephen Willey wrote:
> We're using autofs, so the Lustre filesystem may be mounted immediately
> before the command using it is run. We believe the EIO errors may occur
> because the client has not yet established connections to all the OSTs
> when mount returns and the following command is run.
>
> We've tried creating an automounter module based on mount_generic
> that simply puts a 1s delay in the mount, and that's reduced the
> number of errors, but they're very much still there. Putting in a
> larger delay is an option, but fairly obviously a pretty bad one.
I agree. The reason that we return from mount before the OSC devices
have established their connections is to avoid hanging the mount in
case of an unavailable OST. That said, if the OSCs are accessed
before they have had a chance to complete the connection, the kernel
should wait until the connection attempt has completed before
returning an error.
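
Until that is sorted out, a bounded retry in the job wrapper is less blunt than a fixed delay in the mount module. A minimal sketch in Python, assuming the failure reaches userspace as errno EIO; the helper name retry_on_eio and its parameters are hypothetical and not part of Lustre or autofs:

```python
import errno
import time


def retry_on_eio(operation, attempts=5, initial_delay=0.1, sleep=time.sleep):
    """Call operation(); retry with exponential backoff while it fails with EIO.

    Hypothetical helper for illustration; not part of Lustre or autofs.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Only EIO is treated as transient; any other error, or EIO on
            # the final attempt, is passed straight back to the caller.
            if exc.errno != errno.EIO or attempt == attempts:
                raise
            sleep(delay)
            delay *= 2
```

It could wrap the first access after the automount, for example retry_on_eio(lambda: os.stat(path)) with path pointing at a file on the Lustre mount. This only hides the symptom; it does not replace finding out why the access fails before the OSCs are connected.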
> Mar 25 12:26:38 rr445 automount[6457]: open_mount: (mount):cannot open mount module lustre (/usr/lib64/autofs/mount_lustre.so: cannot open shared object file: No such file or directory)
Is this message itself always part of the problem? This seems
autofs related, and makes me wonder if automount is expecting to
access a mount_lustre.so object INSTEAD of /sbin/mount.lustre. If
that is the case, it may not be doing the initial mount quite
correctly. I'm not sure of that, but it seems unusual.
> Mar 25 12:26:38 rr445 kernel: Lustre: Client epsilon-client has started
> Mar 25 12:26:38 rr445 kernel: LustreError: 22600:0:(file.c:993:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO
It would be useful to look into the Lustre kernel debug logs for this
failure. If there was an RPC timeout during connection (e.g. if the
OST is slow to respond) then that should have produced an earlier
console error. If the above operation is failing before trying to
connect to the OST, then that should be fixed.
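
When collecting these failures from the console log across many clients, it may help to pull out the source location and return code mechanically. A rough sketch in Python; the regular expression is derived only from the single LustreError line quoted above, so treating it as the general message layout is an assumption, and parse_lustre_error is a hypothetical name:

```python
import errno
import re

# Pattern derived from the one console line quoted in this thread;
# other LustreError messages may not match it.
LUSTRE_ERROR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):\s*(?P<line>\d+):(?P<func>\w+)\(\)\)"
    r".*?\brc (?P<rc>-?\d+)"
)


def parse_lustre_error(text):
    """Return location and return code from a LustreError line, or None."""
    # Mail clients wrap long lines, so collapse newlines and quote markers.
    flat = re.sub(r"\s*\n>?\s*", " ", text)
    match = LUSTRE_ERROR.search(flat)
    if match is None:
        return None
    rc = int(match.group("rc"))
    return {
        "pid": int(match.group("pid")),
        "file": match.group("file"),
        "line": int(match.group("line")),
        "func": match.group("func"),
        "rc": rc,
        # Kernel return codes are negated errno values.
        "errno_name": errno.errorcode.get(-rc, "unknown"),
    }
```

Applied to the line above, this yields the function ll_glimpse_size in file.c and the return code -5, which is the negated errno value for EIO on Linux. Lines that do not match the pattern return None rather than raising.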
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.