[Lustre-discuss] Adjusting stripe for 50,000+ files?

Jeremy Mann jeremy at biochem.uthscsa.edu
Thu Feb 25 09:04:25 PST 2010


We are running Lustre 1.6.7 on x86_64 (kernel 2.6.22.14) with 1 MGS/MDT and 14
OSTs. For the past few days I've been fighting a problem with one user who is
storing roughly 50,000 >1k files in many subdirectories in our Lustre
filesystem. On Saturday I had to repair the metadata server with e2fsck, and
this morning everything was fine until he started another batch of jobs
submitted to our PBS queue.

Lustre: laredofs-MDT0000: sending delayed replies to recovered clients
Lustre: laredofs-MDT0000: recovery complete: rc 0
Lustre: MDS laredofs-MDT0000: laredofs-OST0000_UUID now active, resetting
orphans
Lustre: MDS laredofs-MDT0000: laredofs-OST0001_UUID now active, resetting
orphans
Lustre: Client laredofs-client has started
nph-mascot.exe[3816]: segfault at 0000000000000018 rip 000000000041bb45
rsp 00007fffa412e890 error 6

The above messages lasted all week with no errors. This morning I see this:

Lustre: 22640:0:(lustre_fsfilt.h:330:fsfilt_setattr()) laredofs-MDT0000:
slow setattr 48s
Lustre: 22644:0:(lustre_fsfilt.h:229:fsfilt_start_log()) laredofs-MDT0000:
slow journal start 32s
LDISKFS-fs error (device sdb2): ldiskfs_add_entry: bad entry in directory
#12731224: inode out of bounds - offset=1900, inode=1953587812,
rec_len=204, name_len=194
Aborting journal on device sdb2.
Remounting filesystem read-only
LustreError: 22627:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) error
starting handle for op 8 (33 credits): rc -30
LustreError: 22627:0:(mds_reint.c:154:mds_finish_transno()) fsfilt_start: -30
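
For reference, the "rc -30" in the two LustreError lines above follows the
usual kernel convention of returning a negated errno value. The sketch below
is only an illustration, not part of the Lustre tooling: describe_rc is a
hypothetical helper name, and it assumes the script runs on a Linux host
whose errno numbering matches the server's. On Linux, errno 30 is EROFS,
which lines up with the "Remounting filesystem read-only" message.

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Translate a negative kernel-style return code into its errno name.

    Hypothetical helper for reading log lines such as "rc -30"; it relies
    on the local platform's errno table.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"rc {rc}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    # The value reported by fsfilt_ldiskfs_start() and mds_finish_transno().
    print(describe_rc(-30))
```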

I managed to unmount the OSTs and the MDT, but a kernel panic prevented me
from running e2fsck on the metadata server, so I simply rebooted it. I then
ran e2fsck, which found inode problems, all within his directories. Luckily,
e2fsck fixed them.

Now, everything is back to normal and his jobs are processing.

Currently, the stripe size set on his directory is 128k (this is our default
stripe size). I'm curious whether I need to set a smaller stripe size on his
directories with those 50,000+ files.
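
To put rough numbers on the stripe question, here is a small sketch of the
arithmetic, assuming the usual round-robin (RAID-0 style) layout in which a
file's data fills one stripe_size chunk on the first object before moving to
the next. The function name objects_with_data, the file sizes, and the
stripe count of 14 are illustrative assumptions, not values taken from our
configuration. Under that assumption, a file of around 1k lands entirely in
the first chunk, so shrinking the stripe size below 128k would not be
expected to change where its data goes.

```python
import math


def objects_with_data(file_size: int, stripe_size: int, stripe_count: int) -> int:
    """Number of OST objects that actually receive data for one file.

    Assumes a round-robin layout: chunk i of the file goes to object
    (i mod stripe_count). All arguments are in bytes except stripe_count.
    """
    if file_size <= 0:
        return 0
    chunks = math.ceil(file_size / stripe_size)
    return min(stripe_count, chunks)


if __name__ == "__main__":
    STRIPE_SIZE = 128 * 1024   # the 128k default mentioned above
    STRIPE_COUNT = 14          # hypothetical: striping over all 14 OSTs

    # Illustrative file sizes: 1 KiB, 128 KiB, 1 MiB.
    for size in (1024, 128 * 1024, 1024 * 1024):
        used = objects_with_data(size, STRIPE_SIZE, STRIPE_COUNT)
        print(f"{size:>8} bytes -> data on {used} object(s)")
```

This only models where file data lands. It says nothing about the directory
corruption on the MDT, which is a metadata-side problem.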

-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672



