[Lustre-discuss] Problems with MDS Crashing

Andreas Dilger andreas.dilger at oracle.com
Wed May 12 14:55:23 PDT 2010


On 2010-05-12, at 15:42, Gary Brooks wrote:
> We're having trouble with our MDS server. Nothing suspicious in logs - at some point they are just not being created anymore.
> 
> The scenario is as follows: we have an MDS running on DRBD, 2 OSSes and ca. 10 clients. The traffic pattern is lots of small file reads and writes. We provision Joomla! sites. A Joomla! site has about 17000 small files. We are writing 1 new Joomla! site every 30 seconds. This happens all day long and does not stop.

At 17000/30 = ~570 files created per second, you should be OK as far as load goes.  I assume you have enough inodes on both the MDT and OST filesystems.
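
To make the arithmetic and the inode question concrete, here is a rough Python sketch. It is only an illustration: the function names and the path passed to inode_headroom() are assumptions, not anything taken from the setup described here, and os.statvfs() only reports what the mounted filesystem at that path exposes.

```python
import os


def creation_rate(files_per_site, seconds_per_site):
    """Average file creations per second for one site every N seconds."""
    return files_per_site / seconds_per_site


def inode_headroom(path):
    """Return (total_inodes, free_inodes, used_fraction) for the filesystem at path.

    The path is a placeholder: point it at whatever mount you want to inspect.
    Some filesystems report 0 total inodes, in which case used_fraction is 0.0.
    """
    st = os.statvfs(path)
    total = st.f_files
    free = st.f_ffree
    used_fraction = (total - free) / total if total else 0.0
    return total, free, used_fraction


def seconds_until_full(free_inodes, rate):
    """Time until the free inodes are consumed at a constant creation rate."""
    return free_inodes / rate if rate > 0 else float("inf")


if __name__ == "__main__":
    rate = creation_rate(17000, 30)  # figures from the report: about 567 files/s
    total, free, used = inode_headroom("/")  # "/" is only an example path
    print(f"creation rate: {rate:.0f} files/s")
    print(f"inodes: {free} free of {total} ({used:.1%} used)")
    if free:
        hours = seconds_until_full(free, rate) / 3600
        print(f"at this rate the free inodes last about {hours:.1f} hours")
```

On a Lustre client, "lfs df -i" gives the same kind of inode figures broken out per MDT and OST, which is the more direct check.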

> During operation, load on the MDS is around 2 (it's an 8-core machine with RAID 10 on a 3ware card, pretty heavily equipped, and should handle much more). iostat says that there is constantly about 5 MB/s read and 100kB-7MB/s write. There are about 5000 r/w ops per second.
> 
> Then, all of a sudden, the MDS stops responding, ssh sessions die and only a hard restart helps. After the restart, /var/log/messages contains normal information (some timeout chit-chat).
> 
> While this happens randomly, there is an almost sure way to trigger it: issue sysctl -w lnet.debug=0 on all clients and servers, after which the file system becomes super responsive, load on MDS is still low, our gig-e link is well utilized (unlike when lnet logging is enabled) and after a few minutes MDS dies as described above.
> 
> I know that this is too little information to ask for help, but maybe you could at least tell me where to look for any information?

You need to connect a serial console and/or something like netdump to capture the actual error messages on the console when it hangs.  If there are no error messages on the console, do a "sysrq-p" or "sysrq-t" to see if it is stuck in some thread.
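
If the node still accepts commands (for example when it is slow but not yet dead), the same sysrq dumps can be requested by writing the key to /proc/sysrq-trigger as root. Below is a minimal Python sketch of that; the function name and the command labels are made up for illustration, and on a fully hung machine you would send the sysrq from the serial console instead.

```python
# Standard Linux interface for requesting a sysrq action from userspace.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"

# Hypothetical labels mapped to the two sysrq keys mentioned above.
COMMANDS = {
    "show-regs": "p",   # sysrq-p: dump registers of the current CPU
    "show-tasks": "t",  # sysrq-t: dump the state of all tasks
}


def send_sysrq(name, trigger_path=SYSRQ_TRIGGER):
    """Write the sysrq key for `name` to the trigger file and return the key.

    Needs root when pointed at the real /proc/sysrq-trigger. The kernel
    prints the result to the console and kernel log, not to this process.
    """
    key = COMMANDS[name]
    with open(trigger_path, "w") as f:
        f.write(key)
    return key


if __name__ == "__main__":
    # Ask for a task dump; the output should appear on the serial console.
    send_sysrq("show-tasks")
```

The trigger path is a parameter only so the function can be tried against an ordinary file first; the dump itself still has to be captured on the serial console or via netdump.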


Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.



