[lustre-discuss] MDT partition getting full
Radu Popescu
radu.popescu at amoma.com
Mon Apr 27 14:47:02 PDT 2015
Similar to Rick, I get the following (links pasted earlier):
- exactly 62374 lines that look like a sequence:
Bit 1 of 65 not set
Bit 2 of 65 not set
Bit 3 of 65 not set
…………..
Bit 62372 of 65 not set
Bit 62373 of 65 not set
Bit 62374 of 65 not set
- then:
rec #62375 type=10692401 len=64
rec #62376 type=10692401 len=64
rec #62377 type=10692404 len=64
rec #62378 type=10692404 len=64
rec #62379 type=10692404 len=64
rec #62380 type=10692404 len=64
Header size : 8192
Time : Mon Apr 27 04:23:36 2015
Number of records: 65
Target uuid :
-----------------------
- and last:
#6506 (064)unknown type 10692401
#6513 (064)unknown type 10692404
#6515 (064)unknown type 10692404
#6516 (064)unknown type 10692404
#6517 (064)unknown type 10692404
#6518 (064)unknown type 10692401
#6519 (064)unknown type 10692401
#6520 (064)unknown type 10692401
#6521 (064)unknown type 10692404
#6522 (064)unknown type 10692401
#6525 (064)unknown type 10692404
#6526 (064)unknown type 10692404
#6527 (064)unknown type 10692404
#6528 (064)unknown type 10692404
#6529 (064)unknown type 10692404
#9455 (064)unknown type 10692401
#9456 (064)unknown type 10692401
#9457 (064)unknown type 10692404
#9458 (064)unknown type 10692404
#9459 (064)unknown type 10692404
#9460 (064)unknown type 10692404
#9461 (064)unknown type 10692404
#9462 (064)unknown type 10692404
#9463 (064)unknown type 10692404
#9464 (064)unknown type 10692404
#9465 (064)unknown type 10692404
#9466 (064)unknown type 10692404
#27881 (064)unknown type 10692404
#27882 (064)unknown type 10692404
#27884 (064)unknown type 10692404
#27885 (064)unknown type 10692404
#27886 (064)unknown type 10692401
#27887 (064)unknown type 10692401
#27888 (064)unknown type 10692404
#27889 (064)unknown type 10692404
#27890 (064)unknown type 10692404
#27891 (064)unknown type 10692404
#27892 (064)unknown type 10692401
#27893 (064)unknown type 10692401
#27894 (064)unknown type 10692401
#27895 (064)unknown type 10692401
#27896 (064)unknown type 10692404
#27897 (064)unknown type 10692404
#27898 (064)unknown type 10692404
#47567 (064)unknown type 10692404
#47569 (064)unknown type 10692401
#47570 (064)unknown type 10692401
#47571 (064)unknown type 10692401
#47572 (064)unknown type 10692401
#47573 (064)unknown type 10692401
#47574 (064)unknown type 10692401
#47575 (064)unknown type 10692404
#47576 (064)unknown type 10692401
#47578 (064)unknown type 10692401
#47579 (064)unknown type 10692401
#47580 (064)unknown type 10692401
#47582 (064)unknown type 10692404
#47583 (064)unknown type 10692401
#47584 (064)unknown type 10692401
#62375 (064)unknown type 10692401
#62376 (064)unknown type 10692401
#62377 (064)unknown type 10692404
#62378 (064)unknown type 10692404
#62379 (064)unknown type 10692404
#62380 (064)unknown type 10692404
So a total of 62449 lines.
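
In case it helps anyone compare outputs, here is a minimal sketch for tallying the summary records per type. It assumes the summary lines keep the "#N (064)unknown type T" format shown above; the function name tally_record_types and the sample list are made up for illustration and are not part of llog_reader.

```python
import re
from collections import Counter

# Matches summary lines such as "#6506 (064)unknown type 10692401".
# Assumption: the format is exactly as pasted in this thread.
RECORD_LINE = re.compile(r"^#(\d+)\s+\((\d+)\)unknown type (\w+)$")


def tally_record_types(lines):
    """Count summary records per type value, ignoring all other lines."""
    counts = Counter()
    for line in lines:
        match = RECORD_LINE.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts


# Hypothetical sample: a few lines copied from the output above.
sample = [
    "Bit 62374 of 65 not set",
    "rec #62375 type=10692401 len=64",
    "#62375 (064)unknown type 10692401",
    "#62376 (064)unknown type 10692401",
    "#62377 (064)unknown type 10692404",
]
counts = tally_record_types(sample)
print(dict(counts))
```

To run it over a full saved output, pass an open file object instead of the sample list; the sum of the counts should then match the "Number of records" header line if nothing was skipped.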
Radu
> On 27 Apr 2015, at 23:06, Mohr Jr, Richard Frank (Rick Mohr) <rmohr at utk.edu> wrote:
>
>>
>> On Apr 24, 2015, at 1:34 PM, Alexander Zarochentsev <alexander.zarochentsev at seagate.com> wrote:
>>
>> Hello,
>>
>> On Thu, Apr 23, 2015 at 9:01 PM, Mohr Jr, Richard Frank (Rick Mohr)
>> <rmohr at utk.edu> wrote:
>>>
>>>> On Apr 23, 2015, at 1:07 PM, Colin Faber <cfaber at gmail.com> wrote:
>>>>
>>>>
>>>> Based on the directory structure here, this appears to be an OST. Are you sure your targets are correctly named?
>>>>
>>>
>>> That is what I would have guessed until I took a look at my own MDT. Sure enough, I have the directories /O/1/d[0-31] and each one seems to have 3 files that are about 3.5MB each (along with some other smaller ones). Here is what one of those directories looks like:
>>>
>>> debugfs: ls -l /O/1/d14
>>> 16777293 40700 (2) 0 0 4096 20-Apr-2015 14:09 .
>>> 16777278 40755 (2) 0 0 4096 13-May-2014 16:51 ..
>>> 58129 100644 (1) 0 0 8256 13-May-2014 16:51 14
>>> 58162 100644 (1) 0 0 8256 13-May-2014 16:51 46
>>> 58197 100644 (1) 0 0 8256 13-May-2014 16:51 78
>>> 58237 100644 (1) 0 0 37632 13-May-2014 19:28 110
>>> 58271 100644 (1) 0 0 38464 13-May-2014 19:28 142
>>> 58305 100644 (1) 0 0 37888 13-May-2014 19:28 174
>>> 58343 100644 (1) 0 0 37184 13-May-2014 19:28 206
>>> 58396 100644 (1) 0 0 37312 13-May-2014 19:28 238
>>> 58429 100644 (1) 0 0 36160 13-May-2014 19:28 270
>>> 12824 100644 (1) 0 0 3623232 20-Apr-2015 14:09 43150
>>> 12915 100644 (1) 0 0 3800960 20-Apr-2015 14:09 43182
>>> 12954 100644 (1) 0 0 3769216 20-Apr-2015 14:09 43214
>>>
>>> The three large files seem to have been created the last time the MDT was mounted. The timestamps for the other smaller files coincide with the Lustre upgrade we performed last year. But I am not sure what is contained in these files.
>
> I re-checked this directory. The smaller files are still there, but the files from Apr 20 are now gone. Instead, there are several files from the past few days:
>
> debugfs: ls -l
> 16777293 40700 (2) 0 0 4096 27-Apr-2015 15:30 .
> 16777278 40755 (2) 0 0 4096 13-May-2014 16:51 ..
> 58129 100644 (1) 0 0 8256 13-May-2014 16:51 14
> 58162 100644 (1) 0 0 8256 13-May-2014 16:51 46
> 58197 100644 (1) 0 0 8256 13-May-2014 16:51 78
> 58237 100644 (1) 0 0 38080 13-May-2014 19:28 110
> 58271 100644 (1) 0 0 38848 13-May-2014 19:28 142
> 58305 100644 (1) 0 0 38272 13-May-2014 19:28 174
> 58343 100644 (1) 0 0 37632 13-May-2014 19:28 206
> 58396 100644 (1) 0 0 37760 13-May-2014 19:28 238
> 58429 100644 (1) 0 0 36544 13-May-2014 19:28 270
> 179 100644 (17) 0 0 4153280 24-Apr-2015 04:14 43278
> 188 100644 (17) 0 0 4153280 24-Apr-2015 12:03 43310
> 206 100644 (17) 0 0 4153280 24-Apr-2015 18:42 43246
> 1304 100644 (17) 0 0 4153280 26-Apr-2015 06:47 43630
> 1285 100644 (17) 0 0 4153280 25-Apr-2015 10:17 43470
> 120 100644 (17) 0 0 4153280 25-Apr-2015 16:49 43502
> 202 100644 (17) 0 0 4153280 26-Apr-2015 11:53 43662
> 124 100644 (17) 0 0 4153280 26-Apr-2015 20:44 43694
> 1327 100644 (17) 0 0 310464 27-Apr-2015 15:30 43822
> 9978 100644 (17) 0 0 3396672 27-Apr-2015 13:32 43758
> 9991 100644 (17) 0 0 1405952 27-Apr-2015 15:13 43790
>
>
>> Can you do a "debugfs dump" for one of those 4MB files, run llog_reader
>> (a utility from the Lustre sources) over it, and send the output to the list?
>>
>
> I dumped the file named “43278” and ran llog_reader on it. I get a bunch of lines like this:
>
> ...
> Bit 52585 of 8 not set
> Bit 52586 of 8 not set
> Bit 52587 of 8 not set
> Bit 52588 of 8 not set
> Bit 52589 of 8 not set
> Bit 52590 of 8 not set
> Bit 52591 of 8 not set
> Bit 52592 of 8 not set
> …
>
> Followed by lines like this:
>
> rec #52601 type=10692404 len=64
> Header size : 8192
> Time : Fri Apr 24 04:14:05 2015
> Number of records: 8
> Target uuid :
> -----------------------
> #5222 (064)unknown type 10692404
> #25265 (064)unknown type 10692404
> #30429 (064)unknown type 10692404
> #40335 (064)unknown type 10692404
> #41590 (064)unknown type 10692404
> #48975 (064)unknown type 10692404
> #48976 (064)unknown type 10692401
> #52601 (064)unknown type 10692404
>
>
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu <http://www.nics.tennessee.edu/>