[lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Tue Apr 12 14:53:53 PDT 2016


> On Apr 12, 2016, at 4:49 PM, Mark Hahn <hahn at mcmaster.ca> wrote:
> 
> Our problem seems to correlate with an unintentional creation of a tree
> of >500M files.  Some of the crashes we've had since then appeared to be
> related to vm.zone_reclaim_mode=1.  We also enabled quotas right after the
> 500M file thing, and were thinking that inconsistent quota records might
> cause this sort of crash.

Have you set vm.zone_reclaim_mode=0 yet?  I had an issue with this on my file system a while back when it was set to 1.
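
For anyone wanting to confirm the current value on the MDS before changing it, here is a minimal sketch. It assumes a Linux node that exposes the setting at /proc/sys/vm/zone_reclaim_mode; the helper names are made up for illustration and are not part of Lustre or any library.

```python
from pathlib import Path

# Assumed location of the tunable on a Linux host (procfs view of the sysctl).
ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")


def read_zone_reclaim_mode(path=ZONE_RECLAIM_PATH):
    """Return the value stored in the given file as an integer."""
    return int(Path(path).read_text().strip())


def zone_reclaim_enabled(path=ZONE_RECLAIM_PATH):
    """True if the value is anything other than 0."""
    return read_zone_reclaim_mode(path) != 0


if __name__ == "__main__":
    if ZONE_RECLAIM_PATH.exists():
        mode = read_zone_reclaim_mode()
        state = "enabled" if mode != 0 else "disabled"
        print(f"vm.zone_reclaim_mode={mode} ({state})")
    else:
        print(f"{ZONE_RECLAIM_PATH} not found on this host")
```

If it reports a nonzero value, running "sysctl -w vm.zone_reclaim_mode=0" as root changes it on the running system; the change does not persist across a reboot unless it is also added to the sysctl configuration.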

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


