[lustre-discuss] Problem on Lustre 2.7.0

Bob Ball ball at umich.edu
Wed Jun 10 17:47:22 PDT 2015


We are running Lustre 2.7.0.
# uname -r
2.6.32-504.8.1.el6_lustre.x86_64
The load on the combined MGS/MDT jumped up yesterday and has stayed high 
since, with a couple of really outrageous peaks.  I ended up power 
cycling it, as the MDT would not umount.  It seems to be performing fine 
now, but while watching the logs I am seeing a fair number of these in 
/var/log/messages:

2015-06-10T20:25:51-04:00 mdtmgs.aglt2.org kernel: [  936.535168] 
LustreError: 3932:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update()) 
umt3B-MDT0000: trying to overwrite bigger transno:on-disk: 17180113246, 
new: 17180113245 replay: 0. see LU-617.
2015-06-10T20:27:24-04:00 mdtmgs.aglt2.org kernel: [ 1029.720722] 
LustreError: 4038:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update()) 
umt3B-MDT0000: trying to overwrite bigger transno:on-disk: 17180141036, 
new: 17180141035 replay: 0. see LU-617.
2015-06-10T20:33:54-04:00 mdtmgs.aglt2.org kernel: [ 1419.740272] 
LustreError: 3892:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update()) 
umt3B-MDT0000: trying to overwrite bigger transno:on-disk: 17180255177, 
new: 17180255176 replay: 0. see LU-617.
2015-06-10T20:35:38-04:00 mdtmgs.aglt2.org kernel: [ 1524.040242] 
LustreError: 3926:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update()) 
umt3B-MDT0000: trying to overwrite bigger transno:on-disk: 17180285251, 
new: 17180285250 replay: 0. see LU-617.
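
To keep track of how often these show up, a small script along the 
following lines could tally them.  This is only a rough sketch: the 
regular expression is based on nothing more than the four messages 
quoted above, and the names TRANSNO_RE and summarize are made up for 
illustration, not part of any Lustre tool.  It counts the messages per 
target and reports the gap between the on-disk and new transno (which is 
1 in every message quoted above).

```python
import re
from collections import Counter

# Pattern derived only from the messages quoted above (an assumption, not
# an official Lustre log format). \s* tolerates mail-client line wrapping.
TRANSNO_RE = re.compile(
    r"(?P<target>\S+):\s+trying to overwrite bigger transno:\s*"
    r"on-disk:\s*(?P<on_disk>\d+),\s*new:\s*(?P<new>\d+)\s+"
    r"replay:\s*(?P<replay>\d+)"
)


def summarize(text):
    """Return (message count per target, list of on_disk - new gaps)."""
    counts = Counter()
    gaps = []
    for m in TRANSNO_RE.finditer(text):
        counts[m.group("target")] += 1
        gaps.append(int(m.group("on_disk")) - int(m.group("new")))
    return counts, gaps


if __name__ == "__main__":
    # Sample input copied from the first message above; in practice the
    # text would come from the syslog file instead.
    sample = (
        "LustreError: 3932:0:(tgt_lastrcvd.c:800:tgt_last_rcvd_update()) "
        "umt3B-MDT0000: trying to overwrite bigger transno:on-disk: "
        "17180113246, new: 17180113245 replay: 0. see LU-617."
    )
    counts, gaps = summarize(sample)
    for target, n in counts.items():
        print(target, n, "message(s), largest gap:", max(gaps))
```

Feeding it the contents of /var/log/messages instead of the sample 
string would show whether the rate of these messages is changing.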

So, I found a couple of LU tickets that seem relevant, but this older one 
best matches the kind of errors I am seeing.
https://jira.hpdd.intel.com/browse/LU-5283

This one also popped up in a search.
https://jira.hpdd.intel.com/browse/LU-5939
It bothers me in particular because it is listed as a Critical bug in 
2.7.0, fixed for 2.8.0.

What, if anything, should I be doing about this?  Should I worry that I 
will lose my mdt?  I might not ever be able to return to my office if 
that happens.

Thanks,
bob



