[lustre-discuss] MDT partition getting full

Dilger, Andreas andreas.dilger at intel.com
Mon Apr 27 23:15:02 PDT 2015


#6506 (064)unknown type 10692401
#6513 (064)unknown type 10692404

These are MDS_UNLINK64_REC and MDS_SETATTR64_REC records, so my guess is that you have an OST offline: files are being deleted and the unlinks logged on the MDS, but the OST objects aren't being destroyed.
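For reference, here is a minimal sketch of how the two "unknown type" values map onto those record names. The constants are assumptions recalled from lustre_idl.h (LLOG_OP_MAGIC, MDS_REINT, REINT_SETATTR, REINT_UNLINK, and the 0x90000 "64" variant bits), and it also assumes llog_reader prints the type in hex, so check them against your own Lustre source tree before relying on this.

```python
# Assumed values, recalled from Lustre's lustre_idl.h; verify locally.
LLOG_OP_MAGIC = 0x10600000
MDS_REINT = 36
REINT_SETATTR = 1
REINT_UNLINK = 4
REC64_BITS = 0x90000  # assumed marker for the *64_REC record variants

KNOWN_TYPES = {
    LLOG_OP_MAGIC | REC64_BITS | (MDS_REINT << 8) | REINT_UNLINK: "MDS_UNLINK64_REC",
    LLOG_OP_MAGIC | REC64_BITS | (MDS_REINT << 8) | REINT_SETATTR: "MDS_SETATTR64_REC",
}


def decode_llog_type(type_field: str) -> str:
    """Map the type printed by llog_reader (assumed hex) to a record name."""
    return KNOWN_TYPES.get(int(type_field, 16), "unknown")


for type_field in ("10692401", "10692404"):
    print(type_field, decode_llog_type(type_field))
```

Under those assumptions, 10692404 decodes as the unlink record and 10692401 as the setattr record.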

Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division

On 2015/04/27, 3:47 PM, "Radu Popescu" <radu.popescu at amoma.com> wrote:

Similar to Rick, I have: (links pasted earlier)

- exactly 62374 lines that look like a sequence:

Bit 1 of 65 not set
Bit 2 of 65 not set
Bit 3 of 65 not set
…………..
Bit 62372 of 65 not set
Bit 62373 of 65 not set
Bit 62374 of 65 not set

- then:

rec #62375 type=10692401 len=64
rec #62376 type=10692401 len=64
rec #62377 type=10692404 len=64
rec #62378 type=10692404 len=64
rec #62379 type=10692404 len=64
rec #62380 type=10692404 len=64
Header size : 8192
Time : Mon Apr 27 04:23:36 2015
Number of records: 65
Target uuid :
-----------------------

- and last:

#6506 (064)unknown type 10692401
#6513 (064)unknown type 10692404
#6515 (064)unknown type 10692404
#6516 (064)unknown type 10692404
#6517 (064)unknown type 10692404
#6518 (064)unknown type 10692401
#6519 (064)unknown type 10692401
#6520 (064)unknown type 10692401
#6521 (064)unknown type 10692404
#6522 (064)unknown type 10692401
#6525 (064)unknown type 10692404
#6526 (064)unknown type 10692404
#6527 (064)unknown type 10692404
#6528 (064)unknown type 10692404
#6529 (064)unknown type 10692404
#9455 (064)unknown type 10692401
#9456 (064)unknown type 10692401
#9457 (064)unknown type 10692404
#9458 (064)unknown type 10692404
#9459 (064)unknown type 10692404
#9460 (064)unknown type 10692404
#9461 (064)unknown type 10692404
#9462 (064)unknown type 10692404
#9463 (064)unknown type 10692404
#9464 (064)unknown type 10692404
#9465 (064)unknown type 10692404
#9466 (064)unknown type 10692404
#27881 (064)unknown type 10692404
#27882 (064)unknown type 10692404
#27884 (064)unknown type 10692404
#27885 (064)unknown type 10692404
#27886 (064)unknown type 10692401
#27887 (064)unknown type 10692401
#27888 (064)unknown type 10692404
#27889 (064)unknown type 10692404
#27890 (064)unknown type 10692404
#27891 (064)unknown type 10692404
#27892 (064)unknown type 10692401
#27893 (064)unknown type 10692401
#27894 (064)unknown type 10692401
#27895 (064)unknown type 10692401
#27896 (064)unknown type 10692404
#27897 (064)unknown type 10692404
#27898 (064)unknown type 10692404
#47567 (064)unknown type 10692404
#47569 (064)unknown type 10692401
#47570 (064)unknown type 10692401
#47571 (064)unknown type 10692401
#47572 (064)unknown type 10692401
#47573 (064)unknown type 10692401
#47574 (064)unknown type 10692401
#47575 (064)unknown type 10692404
#47576 (064)unknown type 10692401
#47578 (064)unknown type 10692401
#47579 (064)unknown type 10692401
#47580 (064)unknown type 10692401
#47582 (064)unknown type 10692404
#47583 (064)unknown type 10692401
#47584 (064)unknown type 10692401
#62375 (064)unknown type 10692401
#62376 (064)unknown type 10692401
#62377 (064)unknown type 10692404
#62378 (064)unknown type 10692404
#62379 (064)unknown type 10692404
#62380 (064)unknown type 10692404

So a total of 62449 lines.
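To make outputs like this easier to compare, a small sketch that tallies the "unknown type" lines by type value may help. The regular expression is an assumption based only on the line format shown above, and the sample is just the first six lines of the list.

```python
import re
from collections import Counter

# Matches lines such as "#6506 (064)unknown type 10692401" (format taken from the output above).
RECORD_LINE = re.compile(r"^#(\d+) \((\d+)\)unknown type ([0-9a-fA-F]+)$")


def tally_record_types(lines):
    """Count llog_reader record lines per type value."""
    counts = Counter()
    for line in lines:
        match = RECORD_LINE.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts


SAMPLE = """\
#6506 (064)unknown type 10692401
#6513 (064)unknown type 10692404
#6515 (064)unknown type 10692404
#6516 (064)unknown type 10692404
#6517 (064)unknown type 10692404
#6518 (064)unknown type 10692401
"""

for type_value, count in sorted(tally_record_types(SAMPLE.splitlines()).items()):
    print(type_value, count)
```

To run it on a full dump, pass the lines of the saved llog_reader output instead of SAMPLE.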

Radu

On 27 Apr 2015, at 23:06, Mohr Jr, Richard Frank (Rick Mohr) <rmohr at utk.edu> wrote:


On Apr 24, 2015, at 1:34 PM, Alexander Zarochentsev <alexander.zarochentsev at seagate.com> wrote:

Hello,

On Thu, Apr 23, 2015 at 9:01 PM, Mohr Jr, Richard Frank (Rick Mohr)
<rmohr at utk.edu> wrote:

On Apr 23, 2015, at 1:07 PM, Colin Faber <cfaber at gmail.com> wrote:


Based on the directory structure here, this appears to be an OST. Are you sure your targets are correctly named?


That is what I would have guessed until I took a look at my own MDT.  Sure enough, I have the directories /O/1/d[0-31] and each one seems to have 3 files that are about 3.5MB each (along with some other smaller ones).  Here is what one of those directories looks like:

debugfs:  ls -l /O/1/d14
16777293   40700 (2)      0      0    4096 20-Apr-2015 14:09 .
16777278   40755 (2)      0      0    4096 13-May-2014 16:51 ..
58129  100644 (1)      0      0    8256 13-May-2014 16:51 14
58162  100644 (1)      0      0    8256 13-May-2014 16:51 46
58197  100644 (1)      0      0    8256 13-May-2014 16:51 78
58237  100644 (1)      0      0   37632 13-May-2014 19:28 110
58271  100644 (1)      0      0   38464 13-May-2014 19:28 142
58305  100644 (1)      0      0   37888 13-May-2014 19:28 174
58343  100644 (1)      0      0   37184 13-May-2014 19:28 206
58396  100644 (1)      0      0   37312 13-May-2014 19:28 238
58429  100644 (1)      0      0   36160 13-May-2014 19:28 270
12824  100644 (1)      0      0   3623232 20-Apr-2015 14:09 43150
12915  100644 (1)      0      0   3800960 20-Apr-2015 14:09 43182
12954  100644 (1)      0      0   3769216 20-Apr-2015 14:09 43214

The three large files seem to have been created the last time the MDT was mounted.  The timestamps for the other, smaller files coincide with the Lustre upgrade we performed last year.  But I am not sure what is contained in these files.

I re-checked this directory.  The smaller files are still there, but the files from Apr 20 are now gone.  Instead, there are several files from the past few days:

debugfs:  ls -l
16777293   40700 (2)      0      0    4096 27-Apr-2015 15:30 .
16777278   40755 (2)      0      0    4096 13-May-2014 16:51 ..
 58129  100644 (1)      0      0    8256 13-May-2014 16:51 14
 58162  100644 (1)      0      0    8256 13-May-2014 16:51 46
 58197  100644 (1)      0      0    8256 13-May-2014 16:51 78
 58237  100644 (1)      0      0   38080 13-May-2014 19:28 110
 58271  100644 (1)      0      0   38848 13-May-2014 19:28 142
 58305  100644 (1)      0      0   38272 13-May-2014 19:28 174
 58343  100644 (1)      0      0   37632 13-May-2014 19:28 206
 58396  100644 (1)      0      0   37760 13-May-2014 19:28 238
 58429  100644 (1)      0      0   36544 13-May-2014 19:28 270
   179  100644 (17)      0      0   4153280 24-Apr-2015 04:14 43278
   188  100644 (17)      0      0   4153280 24-Apr-2015 12:03 43310
   206  100644 (17)      0      0   4153280 24-Apr-2015 18:42 43246
  1304  100644 (17)      0      0   4153280 26-Apr-2015 06:47 43630
  1285  100644 (17)      0      0   4153280 25-Apr-2015 10:17 43470
   120  100644 (17)      0      0   4153280 25-Apr-2015 16:49 43502
   202  100644 (17)      0      0   4153280 26-Apr-2015 11:53 43662
   124  100644 (17)      0      0   4153280 26-Apr-2015 20:44 43694
  1327  100644 (17)      0      0   310464 27-Apr-2015 15:30 43822
  9978  100644 (17)      0      0   3396672 27-Apr-2015 13:32 43758
  9991  100644 (17)      0      0   1405952 27-Apr-2015 15:13 43790
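Since the thread is about the MDT filling up, a rough way to see how much space these files account for is to sum the size column of the listing. This is only a sketch: it assumes the nine-column "ls -l" layout shown above (inode, mode, type, uid, gid, size, date, time, name), and the sample holds just three of the lines from the listing.

```python
def total_regular_file_size(listing_lines):
    """Sum the size column for regular files in a debugfs 'ls -l' listing."""
    total = 0
    for line in listing_lines:
        fields = line.split()
        # Expect: inode, mode, (type), uid, gid, size, date, time, name
        if len(fields) != 9:
            continue
        mode, size = fields[1], fields[5]
        if mode.startswith("100") and size.isdigit():  # regular files only
            total += int(size)
    return total


SAMPLE_LISTING = """\
16777293   40700 (2)      0      0    4096 27-Apr-2015 15:30 .
   179  100644 (17)      0      0   4153280 24-Apr-2015 04:14 43278
  1327  100644 (17)      0      0   310464 27-Apr-2015 15:30 43822
  9978  100644 (17)      0      0   3396672 27-Apr-2015 13:32 43758
"""

print(total_regular_file_size(SAMPLE_LISTING.splitlines()), "bytes")
```

Repeating this over each of the d* directories would give an estimate of the total space held by these files.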


Can you run "debugfs dump" on one of those 4MB files, run llog_reader
(a utility from the Lustre sources) over it, and send the output to the list?
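As a sketch of what that request amounts to, the helper below only builds the two command lines and prints them. The device path and output file are hypothetical placeholders. It assumes debugfs's -c (read-only, catastrophic mode) and -R (run one request) options, its "dump filespec out_file" request, and that llog_reader takes the dumped file as its argument.

```python
import shlex


def build_dump_commands(device, llog_path, out_file):
    """Return argv lists for dumping an llog file and reading it."""
    debugfs_cmd = ["debugfs", "-c", "-R", f"dump {llog_path} {out_file}", device]
    reader_cmd = ["llog_reader", out_file]
    return debugfs_cmd, reader_cmd


# Hypothetical device and output paths; the llog path is one of the files listed above.
debugfs_cmd, reader_cmd = build_dump_commands(
    "/dev/mdt_device", "/O/1/d14/43278", "/tmp/llog_43278"
)
print(shlex.join(debugfs_cmd))
print(shlex.join(reader_cmd))
```

Printing rather than executing keeps the sketch harmless; the commands can be pasted into a shell on the MDS once the paths are adjusted.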


I dumped the file named “43278” and ran llog_reader.  I get a bunch of lines like this:

...
Bit 52585 of 8 not set
Bit 52586 of 8 not set
Bit 52587 of 8 not set
Bit 52588 of 8 not set
Bit 52589 of 8 not set
Bit 52590 of 8 not set
Bit 52591 of 8 not set
Bit 52592 of 8 not set
…

Followed by lines like this:

rec #52601 type=10692404 len=64
Header size : 8192
Time : Fri Apr 24 04:14:05 2015
Number of records: 8
Target uuid :
-----------------------
#5222 (064)unknown type 10692404
#25265 (064)unknown type 10692404
#30429 (064)unknown type 10692404
#40335 (064)unknown type 10692404
#41590 (064)unknown type 10692404
#48975 (064)unknown type 10692404
#48976 (064)unknown type 10692401
#52601 (064)unknown type 10692404
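One quick sanity check on output like this is to compare the "Number of records" header against the number of record lines listed after the separator. The sketch below assumes only the line formats visible in the output above.

```python
import re

HEADER_COUNT = re.compile(r"^Number of records:\s*(\d+)")
LISTED_RECORD = re.compile(r"^#\d+ \(\d+\)")


def declared_and_listed(lines):
    """Return (record count from the header, number of '#N (len)...' lines)."""
    declared = None
    listed = 0
    for line in lines:
        line = line.strip()
        header = HEADER_COUNT.match(line)
        if header:
            declared = int(header.group(1))
        elif LISTED_RECORD.match(line):
            listed += 1
    return declared, listed


SAMPLE_OUTPUT = """\
rec #52601 type=10692404 len=64
Header size : 8192
Time : Fri Apr 24 04:14:05 2015
Number of records: 8
Target uuid :
-----------------------
#5222 (064)unknown type 10692404
#25265 (064)unknown type 10692404
#30429 (064)unknown type 10692404
#40335 (064)unknown type 10692404
#41590 (064)unknown type 10692404
#48975 (064)unknown type 10692404
#48976 (064)unknown type 10692401
#52601 (064)unknown type 10692404
"""

declared, listed = declared_and_listed(SAMPLE_OUTPUT.splitlines())
print("header:", declared, "listed:", listed)
```

For the output above, the header count and the listed records agree.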


--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

_______________________________________________
lustre-discuss mailing list
lustre-discuss at lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


