[lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)
hahn at mcmaster.ca
Tue Apr 12 13:49:10 PDT 2016
One of our MDSs is crashing with the following:
BUG: unable to handle kernel paging request at 00000000deadbeef
IP: [<ffffffffa0ce0328>] iam_container_init+0x18/0x70 [osd_ldiskfs]
Oops: 0002 [#1] SMP
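For reference, the "IP:" and "Oops:" lines above can be read mechanically. The sketch below is only an illustration, assuming the standard x86 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode) and the usual symbol+offset/size notation; the helper names decode_oops_code and parse_ip_line are hypothetical and not part of any Lustre or kernel tool.

```python
import re

# x86 page-fault error-code bits as printed on the "Oops:" line
PF_PROT = 0x1   # 0: page not present, 1: protection violation
PF_WRITE = 0x2  # 0: read access,      1: write access
PF_USER = 0x4   # 0: kernel mode,      1: user mode


def decode_oops_code(code_hex):
    """Decode the hex error code from an 'Oops: NNNN' line (hypothetical helper)."""
    code = int(code_hex, 16)
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
    }


# Matches: IP: [<address>] symbol+0xoffset/0xsize [module]
IP_RE = re.compile(
    r"IP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?"
)


def parse_ip_line(line):
    """Split an 'IP:' line into address, symbol, offset, size, module (hypothetical helper)."""
    m = IP_RE.search(line)
    if m is None:
        return None
    return {
        "address": int(m.group(1), 16),
        "symbol": m.group(2),
        "offset": int(m.group(3), 16),  # bytes into the function
        "size": int(m.group(4), 16),    # total length of the function
        "module": m.group(5),           # None for built-in kernel symbols
    }


if __name__ == "__main__":
    print(decode_oops_code("0002"))
    print(parse_ip_line(
        "IP: [<ffffffffa0ce0328>] iam_container_init+0x18/0x70 [osd_ldiskfs]"
    ))
```

Read this way, the trace describes a kernel-mode write to a non-present page at address 0xdeadbeef, faulting 0x18 bytes into iam_container_init in the osd_ldiskfs module.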
The MDS is running 2.5.3-RC1--PRISTINE-2.6.32-431.23.3.el6_lustre.x86_64
with about 2k clients ranging from 1.8.8 to 2.6.0
I'd appreciate any comments on where to point fingers: Google doesn't
turn up anything suggestive about iam_container_init.
Our problem seems to correlate with the unintentional creation of a tree
of >500M files. Some of the crashes we've had since then appeared
to be related to vm.zone_reclaim_mode=1. We also enabled quotas
right after the 500M-file incident, and were wondering whether inconsistent
quota records might cause this sort of crash.
But 0xdeadbeef is usually used as a canary value to flag allocation issues;
is it used this way in Lustre?
Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http://www.sharcnet.ca
| McMaster RHPCS | hahn at mcmaster.ca | 905 525 9140 x24687
| Compute/Calcul Canada | http://www.computecanada.ca