Help Needed: <br><br>
We're having trouble with our MDS server. There is nothing suspicious in the logs;
at some point log entries simply stop being written.<br>
<br>
The setup is as follows: we have an MDS running on DRBD, 2 OSSes,
and about 10 clients. The traffic pattern is lots of small file reads and
writes. We provision Joomla! sites. Each Joomla! site has about 17000 small files. We write one new Joomla! site every 30 seconds. This goes on all day long without stopping. <br>
<br>
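For scale, here is a rough back-of-the-envelope calculation of the file-creation rate this implies. It is only a sketch in Python: it assumes the 17000 files and the 30-second interval are steady averages, and the names are purely illustrative. Each file create also costs the MDS more than one metadata operation, so the real request rate is higher than this figure.

```python
# Rough estimate of the file-creation rate generated by site provisioning.
# Assumption: 17000 files per site and one site every 30 s are steady averages.
FILES_PER_SITE = 17_000
SECONDS_PER_SITE = 30
SECONDS_PER_DAY = 86_400


def creates_per_second(files_per_site, seconds_per_site):
    """Average number of new files per second."""
    return files_per_site / seconds_per_site


rate = creates_per_second(FILES_PER_SITE, SECONDS_PER_SITE)
sites_per_day = SECONDS_PER_DAY // SECONDS_PER_SITE
files_per_day = FILES_PER_SITE * sites_per_day

print(f"{rate:.0f} creates/s, {files_per_day:,} files/day")
# prints "567 creates/s, 48,960,000 files/day"
```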
During operation, load on the MDS is around 2 (it's an 8-core machine with RAID 10 on a 3ware card, pretty
heavily equipped, and should handle much more). iostat shows a constant
5 MB/s or so of reads and between 100 kB/s and 7 MB/s of writes, at about
5000 r/w ops per second.<br>
<br>
Then, all of a sudden, the MDS stops responding, ssh sessions die, and
only a hard restart helps. After the restart, /var/log/messages contains
only normal information (some timeout chit-chat).<br>
<br>
While this happens randomly, there is an almost sure way to trigger it:
issue sysctl -w lnet.debug=0 on all clients and servers. After that the
file system becomes super responsive, load on the MDS stays low, and our
gig-e link is well utilized (unlike when lnet logging is enabled), but
after a few minutes the MDS dies as described above.<br>
<br>
I know this is too little information to go on, but maybe
you could at least tell me where to look for more?<br><br>Gary<br>