[lustre-discuss] Problem on Lustre 2.7.0
Bob Ball
ball at umich.edu
Wed Jun 10 17:47:22 PDT 2015
We are running Lustre 2.7.0.
# uname -r
2.6.32-504.8.1.el6_lustre.x86_64
The load on our combined MGS/MDT server jumped up yesterday and has stayed
high since, with a couple of really outrageous peaks. I ended up power
cycling it, as the MDT would not umount. It seems to be performing fine now,
but while watching the logs I am seeing a fair number of these in /var/log/messages:
2015-06-10T20:25:51-04:00 mdtmgs.aglt2.org kernel: [ 936.535168]
LustreError: 3932:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update())
umt3B-MDT0000: trying to overwrite bigger transno:on-disk: 17180113246,
new: 17180113245 replay: 0. see LU-617.
2015-06-10T20:27:24-04:00 mdtmgs.aglt2.org kernel: [ 1029.720722]
LustreError: 4038:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update())
umt3B-MDT0000: trying to overwrite bigger transno:on-disk: 17180141036,
new: 17180141035 replay: 0. see LU-617.
2015-06-10T20:33:54-04:00 mdtmgs.aglt2.org kernel: [ 1419.740272]
LustreError: 3892:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update())
umt3B-MDT0000: trying to overwrite bigger transno:on-disk: 17180255177,
new: 17180255176 replay: 0. see LU-617.
2015-06-10T20:35:38-04:00 mdtmgs.aglt2.org kernel: [ 1524.040242]
LustreError: 3926:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update())
umt3B-MDT0000: trying to overwrite bigger transno:on-disk: 17180285251,
new: 17180285250 replay: 0. see LU-617.
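To tally these messages, here is a minimal Python sketch that pulls the on-disk and new transaction numbers out of a log excerpt. The regular expression is an assumption inferred only from the four entries above, not taken from the Lustre source, and the names `TRANSNO_RE` and `parse_transno_errors` are hypothetical.

```python
import re

# Pattern inferred from the log entries quoted above (an assumption,
# not the authoritative Lustre message format).
TRANSNO_RE = re.compile(
    r"(?P<target>\S+): trying to overwrite bigger transno:"
    r"\s*on-disk:\s*(?P<on_disk>\d+),\s*new:\s*(?P<new>\d+)"
    r"\s+replay:\s*(?P<replay>\d+)"
)


def parse_transno_errors(text):
    """Return a list of (target, on_disk, new, replay) tuples found in text."""
    # The entries are wrapped across several lines, so collapse all
    # whitespace into single spaces before matching.
    flat = " ".join(text.split())
    return [
        (m["target"], int(m["on_disk"]), int(m["new"]), int(m["replay"]))
        for m in TRANSNO_RE.finditer(flat)
    ]


# Two of the entries from above, wrapped the same way.
sample = """
2015-06-10T20:25:51-04:00 mdtmgs.aglt2.org kernel: [ 936.535168]
LustreError: 3932:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update())
umt3B-MDT0000: trying to overwrite bigger transno:on-disk: 17180113246,
new: 17180113245 replay: 0. see LU-617.
2015-06-10T20:27:24-04:00 mdtmgs.aglt2.org kernel: [ 1029.720722]
LustreError: 4038:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update())
umt3B-MDT0000: trying to overwrite bigger transno:on-disk: 17180141036,
new: 17180141035 replay: 0. see LU-617.
"""

for target, on_disk, new, replay in parse_transno_errors(sample):
    # Print the target name and how far the on-disk transno is ahead.
    print(target, on_disk - new)
```

In all four entries quoted above, the on-disk transno is exactly one greater than the new one.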
I found a couple of LU tickets that seem relevant, but this older one best
matches the kind of errors I am seeing:
https://jira.hpdd.intel.com/browse/LU-5283
This one also popped up in a search.
https://jira.hpdd.intel.com/browse/LU-5939
It bothers me in particular because it says Critical Bug in 2.7.0,
solved for 2.8.0.
What, if anything, should I be doing about this? Should I worry that I
will lose my mdt? I might not ever be able to return to my office if
that happens.
Thanks,
bob