[Lustre-discuss] Slow Directory Listing

Sun Sep 11 21:26:59 PDT 2011

Sorry, didn't get it the first time -
You say 'the program that issues the lstat/stat/fstat from userspace is only
inquiring about a single file at a time'. By 'the program', do you mean 'samba'
or 'ls' in our case?

Or is it 'Windows Explorer' that is triggering a 'stat' on each file
on the samba server?
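
For reference, the access pattern in question can be sketched roughly as
below. This is only an illustration of a userspace program asking about one
file at a time, not the actual code of 'ls' or samba; the function name
list_with_stat is made up for this example.

```python
import os


def list_with_stat(directory):
    """Return (name, size) pairs, calling lstat on one entry at a time."""
    results = []
    for name in sorted(os.listdir(directory)):
        # Each lstat blocks until the attributes for this one entry are
        # available; the next call is not issued until this one returns.
        st = os.lstat(os.path.join(directory, name))
        results.append((name, st.st_size))
    return results


if __name__ == "__main__":
    for name, size in list_with_stat("."):
        print(f"{size:>12} {name}")
```

Nothing in this loop overlaps one request with the next, which is the
behaviour the quoted explanation describes.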

Regards,

Indivar Nair

On Mon, Sep 12, 2011 at 9:09 AM, Indivar Nair <indivar.nair at techterra.in> wrote:

> So what you are saying is - The OSC (stat) issues the 1st RPC to the 1st
> OST, waits for its response, then issues the 2nd RPC to the 2nd OST, so on
> and so forth till it 'stat's all the 2000 files. That would be slow :-(.
>
> Why does Lustre do it this way, while everywhere else it tries to do
> extreme parallelization?
> Would patching lstat/stat/fstat to parallelize requests only when accessing
> a Lustre store be possible?
>
> Regards,
>
>
> Indivar Nair
>
>
> On Mon, Sep 12, 2011 at 6:38 AM, Jeremy Filizetti <
> jeremy.filizetti at gmail.com> wrote:
>
>>
>>> From Adrian's explanation, I gather that the OSC generates 1 RPC to each
>>> OST for each file. Since there is only 1 OST in each of the 4 OSS, we only
>>> get 128 simultaneous RPCs. So listing 2000 files would only get us that much
>>> speed, right?
>>>
>>
>> There is no concurrency in fetching these attributes because the program
>> that issues the lstat/stat/fstat from userspace is only inquiring about a
>> single file at a time.  So every RPC costs a minimum of one network
>> round-trip time between the client and an OSS, assuming the statahead
>> thread has fetched the MDS attributes and the OSS has cached inode
>> structures (ignoring a few other small additions).  So if you have 2000
>> files in a directory and an average network latency of 150 us for a
>> glimpse RPC (which I've seen for cached inodes on the OSS), you have a best
>> case of 2000 * 0.000150 s = 0.3 seconds.  Without cached inodes, disk
>> latency on the OSS will make that time far longer and less predictable.
>>
>>
>>>
>>> Now, each of the OSTs is around 4.5 TB in size. So say, we reduce the disk
>>> size to 1.125 TB, but increase the number to 4, then we would get
>>> 4OSTx32RPCs=128 RPC connections to each OSS, and 512 simultaneous RPCs
>>> across the Lustre storage. Wouldn't this increase the listing speed four
>>> times over?
>>>
>>
>> The only hope for speeding this up is probably a code change to implement
>> an async glimpse thread or bulkstat/readdirplus, where Lustre could fetch
>> attributes before userspace requests them so they would be locally cached.
>>
>> Jeremy
>>
>
>
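
The best-case figure quoted above (2000 files at 150 us per glimpse RPC) can
be reproduced with a small sketch. The function name and the in_flight
parameter are assumptions added for illustration; in_flight > 1 models a
hypothetical client that perfectly overlaps RPCs, which, per the thread, is
not what happens when stat calls are issued one at a time.

```python
import math


def best_case_listing_seconds(n_files, rtt_us, in_flight=1):
    """Lower bound on attribute-fetch time, ignoring disk and MDS costs.

    n_files   -- number of files to stat
    rtt_us    -- round-trip latency of one glimpse RPC, in microseconds
    in_flight -- RPCs assumed to overlap perfectly (1 = fully serial)
    """
    # With in_flight RPCs overlapping, the requests complete in "waves",
    # each wave costing one round trip.
    waves = math.ceil(n_files / in_flight)
    return waves * rtt_us / 1_000_000


if __name__ == "__main__":
    # Serial case from the thread: one RPC at a time.
    print(best_case_listing_seconds(2000, 150))
    # Hypothetical idealised case with 128 RPCs in flight.
    print(best_case_listing_seconds(2000, 150, in_flight=128))
```

The serial case is just files multiplied by round-trip time. The overlapped
case is an idealised bound only, since it assumes cached inodes on the OSS
and no other bottleneck.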