So this is how the flow is -<br><br>Windows Explorer requests the statistics of a single file in the dir -> Samba initiates a 'stat' call on the file -> MDC initiates an RPC request to the MDT; gets response -> OSC initiates an RPC to the OST; gets response -> Response given back to stat / Samba -> Samba sends the statistics back to Explorer.<br>


<br>Hmmm..., doing this 2000 times is going to take a long time.<br>And there is no way we can fix explorer to do a bulk stat request :-(.<br><br>So the only option is to get Lustre to respond faster to individual requests.<br>
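A rough back-of-envelope sketch of why the serial flow is slow, using the 150 us glimpse RPC latency that Jeremy quotes further down this thread. The function name and the 5 ms uncached figure are assumptions made up for illustration, not measured values.<br>

```python
# Estimate the wall time for stat'ing files one at a time, where each
# stat must wait for its RPC round trip before the next one starts.

def serial_stat_time(num_files, rtt_seconds, rpcs_per_file=1):
    """Return total seconds for num_files strictly serial stat calls."""
    return num_files * rpcs_per_file * rtt_seconds


if __name__ == "__main__":
    # Figure from the thread: 150 us per glimpse RPC with cached inodes.
    cached = serial_stat_time(2000, 150e-6)
    # HYPOTHETICAL: 5 ms per RPC if the OSS has to go to disk.
    uncached = serial_stat_time(2000, 5e-3)
    print(f"cached inodes:   {cached:.2f} s")
    print(f"uncached (assumed 5 ms): {uncached:.2f} s")
```

The cached case reproduces the 0.3 second best case from the thread; the uncached number only shows how quickly the total grows once per-request latency rises.<br>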


Is there any way to increase the size and TTL of the file metadata cache on the MDTs and OSTs?<br>And how does the patch work then? If a request is for only 1 file stat, how do multiple pages in readdir() help?<br><br>Regards,<br>


<br><br>Indivar Nair<br><br><br><div class="gmail_quote">On Mon, Sep 12, 2011 at 10:31 AM, Jeremy Filizetti <span dir="ltr"><<a href="mailto:jeremy.filizetti@gmail.com">jeremy.filizetti@gmail.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><p></p><div class="im"><br>

On Sep 12, 2011 12:27 AM, "Indivar Nair" <<a href="mailto:indivar.nair@techterra.in" target="_blank">indivar.nair@techterra.in</a>> wrote:<br>

><br>

> Sorry, didn't get it the first time -<br>

> You say 'the program that issues the lstat/stat/fstat from userspace is only inquiring about a single file at a time'. By 'the program' you mean 'samba' or 'ls' in our case.<br>

><br></div>

Correct, those programs are issuing the syscalls.<p></p><div class="im">

<p>> Or is it the 'Windows Explorer' that is triggering a 'stat' on each file on the samba server?<br>

></p>

</div><p>Win explorer is sending the requests to samba and samba just issues the syscall to retrieve the information and send it back.</p><div><div></div><div class="h5">

<p>> Regards,<br>

><br>

><br>

> Indivar Nair <br>

><br>

> On Mon, Sep 12, 2011 at 9:09 AM, Indivar Nair <<a href="mailto:indivar.nair@techterra.in" target="_blank">indivar.nair@techterra.in</a>> wrote:<br>

>><br>

>> So what you are saying is - The OSC (stat) issues the 1st RPC to the 1st OST, waits for its response, then issues the 2nd RPC to the 2nd OST, so on and so forth till it 'stat's all the 2000 files. That would be slow :-(.<br>


>><br>

>> Why does Lustre do it this way, while everywhere else it tries to do extreme parallelization?<br>

>> Would patching lstat/stat/fstat to parallelize requests only when accessing a Lustre store be possible?<br>

>><br>

>> Regards,<br>

>><br>

>><br>

>> Indivar Nair<br>

>><br>

>><br>

>> On Mon, Sep 12, 2011 at 6:38 AM, Jeremy Filizetti <<a href="mailto:jeremy.filizetti@gmail.com" target="_blank">jeremy.filizetti@gmail.com</a>> wrote:<br>

>>><br>

>>><br>

>>>> From Adrian's explanation, I gather that the OSC generates 1 RPC to each OST for each file. Since there is only 1 OST in each of the 4 OSS, we only get 128 simultaneous RPCs. So Listing 2000 files would only get us that much speed, right?<br>


>>><br>

>>><br>

>>> There is no concurrency in fetching these attributes because the program that issues the lstat/stat/fstat from userspace is only inquiring about a single file at a time.  So every RPC becomes a minimum of one round-trip-time network latency between the client and an OSS assuming statahead thread fetched MDS attributes and OSS has cached inode structures (ignoring a few other small additions).  So if you have 2000 files in a directory and you had an avg network latency of 150 us for a glimpse RPC (which I've seen for cached inodes on the OSS) you have a best case of 2000*.000150=.3 seconds.  Without cached inodes disk latency on the OSS will make that time far longer and less predictable.<br>


>>>  <br>

>>>><br>

>>>><br>

>>>> Now, each of the OSTs is around 4.5 TB in size. So say we reduce the disk size to 1.125 TB but increase the number to 4; then we would get 4 OSTs x 32 RPCs = 128 RPC connections to each OSS, and 512 simultaneous RPCs across the Lustre storage. Wouldn't this increase the listing speed four times over?<br>


>>><br>

>>><br>

>>> The only hope for speeding this up is probably a code change to implement an async glimpse thread or bulkstat/readdirplus, where Lustre could fetch attributes before userspace requests them so they would be locally cached.<br>


>>>  <br>

>>> Jeremy<br>

>><br>

>><br>

><br>

</p>

</div></div></blockquote></div><br>
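To see how much of the listing time is per-file stat latency, a small measurement sketch like the one below could be run on the Samba server against the Lustre mount. It times one serial pass of lstat calls (the pattern described in the thread) and one pass issued from a thread pool. This is untested on Lustre: the helper names and the worker count are assumptions, and whether concurrent calls actually overlap the underlying RPCs depends on the client.<br>

```python
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def list_paths(directory):
    """Return full paths of the directory entries without stat'ing them."""
    return [os.path.join(directory, name) for name in os.listdir(directory)]


def time_serial_stat(paths):
    """Time lstat calls issued one after another, as 'ls' or Samba would."""
    start = time.perf_counter()
    for path in paths:
        os.lstat(path)
    return time.perf_counter() - start


def time_threaded_stat(paths, workers=8):
    """Time the same lstat calls issued from a pool of worker threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(os.lstat, paths))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Use the single command-line argument if it is a directory, else ".".
    candidate = sys.argv[1] if len(sys.argv) == 2 else "."
    directory = candidate if os.path.isdir(candidate) else "."
    paths = list_paths(directory)
    serial = time_serial_stat(paths)
    threaded = time_threaded_stat(paths)
    print(f"{len(paths)} entries")
    print(f"serial lstat:   {serial:.4f} s")
    print(f"threaded lstat: {threaded:.4f} s")
```

The second pass runs with whatever attributes the first pass left cached on the client, so the two numbers are not a fair comparison unless the caches are dropped in between or each pass is run against a different directory.<br>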