[Lustre-discuss] Slow Directory Listing

Sun Sep 11 11:30:39 PDT 2011

Hi ...,

Thanks for the inputs Adrian, Jeremy, Michael.

From Adrian's explanation, I gather that the OSC generates 1 RPC to each OST
for each file. Since there is only 1 OST on each of the 4 OSSes, we only get
128 simultaneous RPCs. So listing 2000 files can only go as fast as that
allows, right?

Now, each OST is around 4.5 TB in size. Say we reduce the OST size to
1.125 TB but increase the number per OSS to 4; then we would get
4 OSTs x 32 RPCs = 128 RPC connections to each OSS, and 512 simultaneous RPCs
across the Lustre storage. Wouldn't this increase the listing speed four
times over?
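
To make the arithmetic above easy to check, here is a minimal Python sketch.
It only reproduces the numbers quoted in this mail (4 OSSes, 32 RPCs per OST,
4.5 TB per OST); the function name and its parameters are made up for
illustration and are not Lustre settings. It says nothing about whether the
listing is actually limited by the RPC count.

```python
def rpcs_in_flight(oss_count, osts_per_oss, rpcs_per_ost):
    """Return (RPCs per OSS, RPCs across all OSSes) for a given layout.

    Hypothetical helper: it just multiplies the figures from the mail.
    """
    per_oss = osts_per_oss * rpcs_per_ost
    return per_oss, per_oss * oss_count


# Figures taken from the mail; the 32 RPCs per OST is the poster's assumption.
OSS_COUNT = 4
RPCS_PER_OST = 32
CURRENT_OST_SIZE_TB = 4.5

current = rpcs_in_flight(OSS_COUNT, osts_per_oss=1, rpcs_per_ost=RPCS_PER_OST)
proposed = rpcs_in_flight(OSS_COUNT, osts_per_oss=4, rpcs_per_ost=RPCS_PER_OST)

# Splitting one 4.5 TB OST into four keeps the capacity per OSS unchanged.
proposed_ost_size_tb = CURRENT_OST_SIZE_TB / 4

print("current  (per OSS, total):", current)
print("proposed (per OSS, total):", proposed)
print("proposed OST size in TB  :", proposed_ost_size_tb)
print("ratio of total RPCs      :", proposed[1] / current[1])
```

The ratio of 4 is only an upper bound on the hoped-for speedup; it holds only
if the listing time is dominated by the number of RPCs in flight.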

Currently, we have around 12 GB of RAM on each OSS. I believe we will have to
increase this to accommodate the extra 3 OSTs, plus another 4 OSTs in case of
failover. We will also require a proportionate increase in MDS RAM.

Is my theory right? Is there any catch to it?

Also, can 1 RPC consume more than 1 I/O thread? For example, does it read
from the buffer of one I/O and then move to the next I/O buffer? Or is it
strictly 1 RPC = 1 I/O?

Regards,

Indivar Nair

On Wed, Sep 7, 2011 at 6:40 PM, Michael Barnes <Michael.Barnes at jlab.org> wrote:

>
> Another thing to try is setting vfs_cache_pressure=0 on the OSSes and
> periodically setting it to nonzero to reclaim memory.  More details
> here:
>
>
> http://www.olcf.ornl.gov/wp-content/events/lug2011/4-14-2011/830-900_Robin_Humble_rjh.lug2011.pdf
>
> -mb
>
> On Sep 6, 2011, at 2:43 AM, Indivar Nair wrote:
>
> > Hi ...,
> >
> > I have a Lustre storage that stores lots of small files, i.e. hundreds to
> > thousands of 9MB image files.
> > While normal file access works fine, the directory listing is extremely
> > slow.
> > Depending on the number of files in a directory, the listing takes around
> > 5 - 15 secs.
> >
> > I tried 'ls --color=none' and it worked fine; listed the contents
> > immediately.
> >
> > But that doesn't help my cause. I have Samba Gateway Servers, and all users
> > access the storage through the gateway. Double-clicking on a directory
> > takes a long, long time to display.
> >
> > The cluster consists of -
> > - two DRBD Mirrored MDS Servers (Dell R610s) with 10K RPM disks
> > - four OSS Nodes (2 Node Cluster (Dell R710s) with a common storage (Dell
> > MD3200))
> >
> > The storage consists of 12 x 1TB HDDs on both arrays, in RAID 6
> > Configuration.
> >
> > What actually happens when one does a listing like this?
> > What can I do to make the listing faster?
> > Could it be an MDS issue?
> > Some site suggested that this could be caused by the '-o flock' switch.
> > Is it so?
> >
> > Kindly Help.
> > The storage is in Production, and this is causing a lot of issues.
> >
> > Regards,
> >
> >
> > Indivar Nair
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at lists.lustre.org
> > http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
> --
> +-----------------------------------------------
> | Michael Barnes
> |
> | Thomas Jefferson National Accelerator Facility
> | Scientific Computing Group
> | 12000 Jefferson Ave.
> | Newport News, VA 23606
> | (757) 269-7634
> +-----------------------------------------------
>
>
>
>
>