[lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)
Mark Hahn
hahn at mcmaster.ca
Tue Apr 12 15:46:32 PDT 2016
>> Our problem seems to correlate with an unintentional creation of a tree of >500M files. Some of the crashes we've had since then appeared
>> to be related to vm.zone_reclaim_mode=1. We also enabled quotas right after the 500M file thing, and were thinking that inconsistent
>> quota records might cause this sort of crash.
>
> Have you set vm.zone_reclaim_mode=0 yet? I had an issue with this on my
> file system a while back when it was set to 1.
all our existing Lustre MDSes run happily with vm.zone_reclaim_mode=0,
and making this one consistent appears to have resolved a problem
(in which one family of Lustre kernel threads would appear to spin,
with "perf top" showing nearly all time spent in spinlock_irq, iirc.)
might your system have had a *lot* of memory? ours tend to be
fairly modest (32-64 GB, dual-socket Intel).
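
for anyone wanting to check the same setting across several MDSes, here is
a minimal sketch that reads the current value from procfs. it assumes a
Linux host where the sysctl is exposed at /proc/sys/vm/zone_reclaim_mode;
the function names are made up for illustration and are not part of any
Lustre tooling.

```python
from pathlib import Path

# Standard Linux procfs location for the vm.zone_reclaim_mode sysctl.
ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")


def read_zone_reclaim_mode(path=ZONE_RECLAIM_PATH):
    """Return the current vm.zone_reclaim_mode value as an int."""
    return int(Path(path).read_text().strip())


def zone_reclaim_enabled(path=ZONE_RECLAIM_PATH):
    """True if the value is non-zero, i.e. some form of zone reclaim is on."""
    return read_zone_reclaim_mode(path) != 0


if __name__ == "__main__":
    mode = read_zone_reclaim_mode()
    if mode == 0:
        print("vm.zone_reclaim_mode=0 (zone reclaim off)")
    else:
        # Only reports; changing the value needs root, e.g.
        #   sysctl -w vm.zone_reclaim_mode=0
        print(f"vm.zone_reclaim_mode={mode} (zone reclaim on)")
```

the script only reports the value. changing it at runtime would be done
as root with "sysctl -w vm.zone_reclaim_mode=0", plus an entry in
/etc/sysctl.conf if it should persist across reboots.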
thanks,
Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http://www.sharcnet.ca
| McMaster RHPCS | hahn at mcmaster.ca | 905 525 9140 x24687
| Compute/Calcul Canada | http://www.computecanada.ca