[Lustre-discuss] Slow Directory Listing

Indivar Nair indivar.nair at techterra.in
Mon Sep 12 01:55:58 PDT 2011


Okay, I went through the whole mail thread again and found the answer to my last
set of questions in your (Jeremy's) earlier mail -

And how does the patch work then? If a request is for only 1 file stat, how
do multiple pages in readdir() help?

Ans:
'The only hope for speeding this up is probably a code change to implement
async glimpse thread or bulkstat/readdirplus where Lustre could fetch
attributes before userspace requests them so they would be locally cached.'
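
As a rough userspace illustration of that idea (this is not the Lustre patch
itself): a helper could walk the directory and issue the stat calls from a
thread pool, so the round trips overlap instead of queueing one behind the
other. A minimal sketch follows; `serial_estimate`, `warm_attrs` and the worker
count of 32 are hypothetical names and values picked for illustration, and
whether the warmed attributes are still cached on the client when Samba asks
for them is an assumption I have not verified.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def serial_estimate(n_files, rtt_seconds):
    """Best-case time when each stat costs one round trip, one after another."""
    return n_files * rtt_seconds


def warm_attrs(directory, workers=32):
    """lstat every entry in `directory` from a thread pool.

    Returns {name: os.stat_result}; entries that vanish mid-scan are skipped.
    """
    names = os.listdir(directory)

    def stat_one(name):
        try:
            return name, os.lstat(os.path.join(directory, name))
        except FileNotFoundError:
            return name, None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(stat_one, names)
        return {name: st for name, st in results if st is not None}


if __name__ == "__main__":
    # Jeremy's figures: 2000 files at 150 us per glimpse RPC.
    print(round(serial_estimate(2000, 150e-6), 3))
```

The thread pool only overlaps the waiting; each individual stat still pays the
full round trip.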

Plus, setting 'vfs_cache_pressure=0', or setting it to a very low number, is the solution.
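
For reference, this knob is the Linux sysctl vm.vfs_cache_pressure, exposed at
/proc/sys/vm/vfs_cache_pressure (default 100). Below is a minimal sketch for
checking and lowering it on whichever node is being tuned; the helper names are
made up for illustration, and writing the value needs root. One caveat from the
kernel documentation: with a value of 0 the kernel never reclaims dentries and
inodes under memory pressure, which can lead to out-of-memory conditions, so a
low non-zero value is probably the safer experiment.

```python
SYSCTL_PATH = "/proc/sys/vm/vfs_cache_pressure"


def get_cache_pressure(path=SYSCTL_PATH):
    """Read the current value of the sysctl as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def set_cache_pressure(value, path=SYSCTL_PATH):
    """Write a new value; needs root when `path` is the real sysctl file."""
    if value < 0:
        raise ValueError("vfs_cache_pressure must be non-negative")
    with open(path, "w") as f:
        f.write(f"{value}\n")


# Example (as root): set_cache_pressure(10), then get_cache_pressure()
```

A value written this way does not survive a reboot; it would also have to go
into the sysctl configuration to persist.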

Thanks,

Regards,


Indivar Nair

On Mon, Sep 12, 2011 at 11:07 AM, Indivar Nair <indivar.nair at techterra.in> wrote:

> So this is how the flow is -
>
> Windows Explorer requests statistics of a single file in the dir -> Samba
> initiates a 'stat' call on the file -> MDC initiates RPC request to MDT;
> gets response -> OSC initiates an RPC to OST; gets response -> Response
> given back to stat / Samba -> Samba sends the statistics back to explorer.
>
> Hmmm..., doing this 2000 times is going to take a long time.
> And there is no way we can fix explorer to do a bulk stat request :-(.
>
> So the only option is to get Lustre to respond faster to individual
> requests.
> Is there any way to increase the size and TTL of the file metadata cache in MDTs
> and OSTs?
> And how does the patch work then? If a request is for only 1 file stat, how
> do multiple pages in readdir() help?
>
> Regards,
>
>
> Indivar Nair
>
>
>
> On Mon, Sep 12, 2011 at 10:31 AM, Jeremy Filizetti <
> jeremy.filizetti at gmail.com> wrote:
>
>>
>> On Sep 12, 2011 12:27 AM, "Indivar Nair" <indivar.nair at techterra.in>
>> wrote:
>> >
>> > Sorry, didn't get it the first time -
>> > You say 'the program that issues the lstat/stat/fstat from userspace is
>> > only inquiring about a single file at a time'. By 'the program', do you mean
>> > 'samba' or 'ls' in our case?
>> >
>> Correct, those programs are issuing the syscalls.
>>
>> > Or is it 'Windows Explorer' that is triggering a 'stat' on each
>> > file on the samba server?
>> >
>>
>> Windows Explorer is sending the requests to Samba, and Samba just issues the
>> syscall to retrieve the information and sends it back.
>>
>> > Regards,
>> >
>> >
>> > Indivar Nair
>> >
>> > On Mon, Sep 12, 2011 at 9:09 AM, Indivar Nair <indivar.nair at techterra.in> wrote:
>> >>
>> >> So what you are saying is - the OSC (stat) issues the 1st RPC to the
>> >> 1st OST, waits for its response, then issues the 2nd RPC to the 2nd OST, and
>> >> so on until it has 'stat'ed all 2000 files. That would be slow :-(.
>> >>
>> >> Why does Lustre do it this way, while everywhere else it tries to do
>> >> extreme parallelization?
>> >> Would patching lstat/stat/fstat to parallelize requests only when
>> >> accessing a Lustre store be possible?
>> >>
>> >> Regards,
>> >>
>> >>
>> >> Indivar Nair
>> >>
>> >>
>> >> On Mon, Sep 12, 2011 at 6:38 AM, Jeremy Filizetti <jeremy.filizetti at gmail.com> wrote:
>> >>>
>> >>>
>> >>>> From Adrian's explanation, I gather that the OSC generates 1 RPC to
>> >>>> each OST for each file. Since there is only 1 OST in each of the 4 OSSs,
>> >>>> we only get 128 simultaneous RPCs. So listing 2000 files would only get
>> >>>> us that much speed, right?
>> >>>
>> >>>
>> >>> There is no concurrency in fetching these attributes because the
>> >>> program that issues the lstat/stat/fstat from userspace is only inquiring
>> >>> about a single file at a time. So every RPC costs a minimum of one
>> >>> network round trip between the client and an OSS, assuming the
>> >>> statahead thread has fetched the MDS attributes and the OSS has cached
>> >>> inode structures (ignoring a few other small additions). So if you have
>> >>> 2000 files in a directory and an average network latency of 150 us for a
>> >>> glimpse RPC (which I've seen for cached inodes on the OSS), you have a
>> >>> best case of 2000 * 0.000150 s = 0.3 seconds. Without cached inodes, disk
>> >>> latency on the OSS will make that time far longer and less predictable.
>> >>>
>> >>>>
>> >>>>
>> >>>> Now, each of the OSTs is around 4.5 TB in size. So say we reduce the
>> >>>> disk size to 1.125 TB but increase the number of OSTs per OSS to 4; then
>> >>>> we would get 4 OSTs x 32 RPCs = 128 RPC connections to each OSS, and 512
>> >>>> simultaneous RPCs across the Lustre storage. Wouldn't this increase the
>> >>>> listing speed four times over?
>> >>>
>> >>>
>> >>> The only hope for speeding this up is probably a code change to
>> >>> implement async glimpse thread or bulkstat/readdirplus where Lustre could
>> >>> fetch attributes before userspace requests them so they would be locally
>> >>> cached.
>> >>>
>> >>> Jeremy
>> >>
>> >>
>> >
>>
>
>