[Lustre-discuss] Lustre, automount and EIO

Stephen Willey Stephen.Willey at framestore.com
Thu Mar 25 05:33:30 PDT 2010


We're seeing I/O errors (EIO) which we believe are down to automount returning too early from a Lustre mount.

We're using autofs, so the Lustre filesystem may be mounted only an instant before the command that uses it is run. We believe the client has not yet established connections to all of the OSTs when mount returns and the next command runs.

We've tried creating an automounter module based on mount_generic that simply adds a 1 s delay to the mount. That has reduced the number of errors, but they're very much still there. Putting in a longer delay is an option, but fairly obviously a pretty bad one.
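A fixed sleep only guesses at how long the client needs. An alternative worth trying is to retry the open a bounded number of times, and only when it fails with EIO. This is a minimal sketch that assumes the EIO is transient (i.e. it goes away once the client has connected to its OSTs), which is our hypothesis rather than something we've confirmed. `open_with_retry()` and its parameters are hypothetical; they are not part of Lustre or autofs.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: open `path`, retrying only when fopen() fails with
 * EIO (the error seen while a freshly automounted client is, we assume,
 * still connecting to its OSTs). Any other error, e.g. ENOENT or EACCES,
 * is returned immediately with errno left as fopen() set it. */
FILE *open_with_retry(const char *path, const char *mode,
                      int max_attempts, long delay_ms)
{
    for (int attempt = 1; ; attempt++) {
        FILE *fp = fopen(path, mode);
        if (fp != NULL)
            return fp;
        if (errno != EIO || attempt >= max_attempts)
            return NULL;            /* errno still holds fopen()'s error */

        /* Back off before the next attempt, doubling the delay each time. */
        struct timespec ts = {
            .tv_sec  = delay_ms / 1000,
            .tv_nsec = (delay_ms % 1000) * 1000000L
        };
        while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
            ;                       /* resume the sleep if interrupted */
        delay_ms *= 2;
    }
}
```

For example, `open_with_retry(path, "r", 5, 100)` sleeps at most 100 + 200 + 400 + 800 ms = 1.5 s in total before giving up, and only when every attempt hits EIO; a genuinely missing file fails on the first try rather than being retried.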

Once the filesystem is actually mounted, things work properly, until, that is, the automounter drops the mount again, of course.

Pasted below are two example log excerpts in which we automounted a filesystem at /net/epsilon and then immediately tried to fopen() a file on it, which failed with an I/O error.

I've attached a tiny C program that reproduces the issue fairly reliably. Run with pdsh across a set of roughly 400 machines, it failed on 16 of them, which is fairly representative.
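The attachment itself was scrubbed from the archive. What follows is a minimal sketch of an equivalent test, assuming only what the log excerpts below show (an fopen() on the freshly automounted path); `try_write()` is a hypothetical name and this is not the original simplewriter.c.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reproducer: open `path` for writing, write one line and
 * close it. Returns 0 on success, otherwise the errno of the failing call. */
int try_write(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        int err = errno;            /* save before fprintf can change it */
        fprintf(stderr, "fopen(%s): %s\n", path, strerror(err));
        return err;
    }
    if (fputs("test\n", fp) == EOF) {
        int err = errno;
        fclose(fp);
        return err;
    }
    if (fclose(fp) != 0)            /* buffered write errors surface here */
        return errno;
    return 0;
}
```

Calling `try_write()` on a path under /net/epsilon immediately after triggering the automount should return 0; a return value of EIO (5 on Linux) corresponds to the `rc -5` / `-EIO` in the log excerpts below.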

Any ideas or recommendations would be much appreciated,

Stephen



Mar 25 12:26:38 rr445 automount[6457]: open_mount: (mount):cannot open mount module lustre (/usr/lib64/autofs/mount_lustre.so: cannot open shared object file: No such file or directory)
Mar 25 12:26:38 rr445 kernel: Lustre: Client epsilon-client has started
Mar 25 12:26:38 rr445 kernel: LustreError: 22600:0:(file.c:993:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO


Mar 25 12:26:37 rr447 automount[6458]: open_mount: (mount):cannot open mount module lustre (/usr/lib64/autofs/mount_lustre.so: cannot open shared object file: No such file or directory)
Mar 25 12:26:37 rr447 kernel: Lustre: Client epsilon-client has started
Mar 25 12:26:37 rr447 kernel: LustreError: 2370:0:(file.c:993:ll_glimpse_size()) obd_enqueue returned rc -5, returning -EIO


-- 
Stephen Willey
Senior Systems Engineer
Framestore
19-23 Wells Street, London W1T 3PQ
+44 207 344 8000
www.framestore.com 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: simplewriter.c
Type: text/x-csrc
Size: 664 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20100325/74355289/attachment.c>

