[lustre-discuss] MDT partition getting full

Radu Popescu radu.popescu at amoma.com
Mon Apr 27 14:47:02 PDT 2015


Similar to Rick, I get the following (the full output is in the links pasted earlier):

- exactly 62374 lines that look like a sequence:

Bit 1 of 65 not set
Bit 2 of 65 not set
Bit 3 of 65 not set
...
Bit 62372 of 65 not set
Bit 62373 of 65 not set
Bit 62374 of 65 not set

- then:

rec #62375 type=10692401 len=64
rec #62376 type=10692401 len=64
rec #62377 type=10692404 len=64
rec #62378 type=10692404 len=64
rec #62379 type=10692404 len=64
rec #62380 type=10692404 len=64
Header size : 8192
Time : Mon Apr 27 04:23:36 2015
Number of records: 65
Target uuid :  
-----------------------

- and last:

#6506 (064)unknown type 10692401
#6513 (064)unknown type 10692404
#6515 (064)unknown type 10692404
#6516 (064)unknown type 10692404
#6517 (064)unknown type 10692404
#6518 (064)unknown type 10692401
#6519 (064)unknown type 10692401
#6520 (064)unknown type 10692401
#6521 (064)unknown type 10692404
#6522 (064)unknown type 10692401
#6525 (064)unknown type 10692404
#6526 (064)unknown type 10692404
#6527 (064)unknown type 10692404
#6528 (064)unknown type 10692404
#6529 (064)unknown type 10692404
#9455 (064)unknown type 10692401
#9456 (064)unknown type 10692401
#9457 (064)unknown type 10692404
#9458 (064)unknown type 10692404
#9459 (064)unknown type 10692404
#9460 (064)unknown type 10692404
#9461 (064)unknown type 10692404
#9462 (064)unknown type 10692404
#9463 (064)unknown type 10692404
#9464 (064)unknown type 10692404
#9465 (064)unknown type 10692404
#9466 (064)unknown type 10692404
#27881 (064)unknown type 10692404
#27882 (064)unknown type 10692404
#27884 (064)unknown type 10692404
#27885 (064)unknown type 10692404
#27886 (064)unknown type 10692401
#27887 (064)unknown type 10692401
#27888 (064)unknown type 10692404
#27889 (064)unknown type 10692404
#27890 (064)unknown type 10692404
#27891 (064)unknown type 10692404
#27892 (064)unknown type 10692401
#27893 (064)unknown type 10692401
#27894 (064)unknown type 10692401
#27895 (064)unknown type 10692401
#27896 (064)unknown type 10692404
#27897 (064)unknown type 10692404
#27898 (064)unknown type 10692404
#47567 (064)unknown type 10692404
#47569 (064)unknown type 10692401
#47570 (064)unknown type 10692401
#47571 (064)unknown type 10692401
#47572 (064)unknown type 10692401
#47573 (064)unknown type 10692401
#47574 (064)unknown type 10692401
#47575 (064)unknown type 10692404
#47576 (064)unknown type 10692401
#47578 (064)unknown type 10692401
#47579 (064)unknown type 10692401
#47580 (064)unknown type 10692401
#47582 (064)unknown type 10692404
#47583 (064)unknown type 10692401
#47584 (064)unknown type 10692401
#62375 (064)unknown type 10692401
#62376 (064)unknown type 10692401
#62377 (064)unknown type 10692404
#62378 (064)unknown type 10692404
#62379 (064)unknown type 10692404
#62380 (064)unknown type 10692404

So a total of 62449 lines.
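
In case anyone wants to tally their own llog_reader output the same way, here is a minimal Python sketch. It assumes only the three line shapes shown above ("Bit N of M not set", "rec #N type=T len=L" and "#N (len)unknown type T"); the function name summarize and the regular expressions are my own and are not part of llog_reader, and everything else (header lines, separators) is simply counted as "other".

```python
import re
from collections import Counter

# Line shapes as they appear in the llog_reader output quoted in this thread.
BIT_RE = re.compile(r"^Bit (\d+) of (\d+) not set$")
REC_RE = re.compile(r"^rec #(\d+) type=([0-9a-fA-F]+) len=(\d+)$")
UNKNOWN_RE = re.compile(r"^#(\d+) \((\d+)\)unknown type ([0-9a-fA-F]+)$")


def summarize(lines):
    """Count each kind of line and tally the record types that were
    reported as 'unknown type'. Returns (line_counts, type_counts)."""
    line_counts = Counter()
    type_counts = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if BIT_RE.match(line):
            line_counts["bit_not_set"] += 1
        elif REC_RE.match(line):
            line_counts["rec"] += 1
        else:
            m = UNKNOWN_RE.match(line)
            if m:
                line_counts["unknown_type"] += 1
                type_counts[m.group(3)] += 1
            else:
                line_counts["other"] += 1  # header fields, separators, etc.
    return line_counts, type_counts


# A few lines copied from the output above, just to exercise the function.
SAMPLE = """\
Bit 1 of 65 not set
Bit 2 of 65 not set
rec #62375 type=10692401 len=64
Header size : 8192
#6506 (064)unknown type 10692401
#6513 (064)unknown type 10692404
"""

if __name__ == "__main__":
    counts, types = summarize(SAMPLE.splitlines())
    print(dict(counts))
    print(dict(types))
```

To run it against a real dump, replace SAMPLE.splitlines() with an open file object holding the saved llog_reader output (the file name is up to you).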

Radu

> On 27 Apr 2015, at 23:06, Mohr Jr, Richard Frank (Rick Mohr) <rmohr at utk.edu> wrote:
> 
>> 
>> On Apr 24, 2015, at 1:34 PM, Alexander Zarochentsev <alexander.zarochentsev at seagate.com> wrote:
>> 
>> Hello,
>> 
>> On Thu, Apr 23, 2015 at 9:01 PM, Mohr Jr, Richard Frank (Rick Mohr)
>> <rmohr at utk.edu> wrote:
>>> 
>>>> On Apr 23, 2015, at 1:07 PM, Colin Faber <cfaber at gmail.com> wrote:
>>>> 
>>>> 
>>>> Based on the directory structure here, this appears to be an OST. Are you sure your targets are correctly named?
>>>> 
>>> 
>>> That is what I would have guessed until I took a look at my own MDT.  Sure enough, I have the directories /O/1/d[0-31] and each one seems to have 3 files that are about 3.5MB each (along with some other smaller ones).  Here is what one of those directories looks like:
>>> 
>>> debugfs:  ls -l /O/1/d14
>>> 16777293   40700 (2)      0      0    4096 20-Apr-2015 14:09 .
>>> 16777278   40755 (2)      0      0    4096 13-May-2014 16:51 ..
>>> 58129  100644 (1)      0      0    8256 13-May-2014 16:51 14
>>> 58162  100644 (1)      0      0    8256 13-May-2014 16:51 46
>>> 58197  100644 (1)      0      0    8256 13-May-2014 16:51 78
>>> 58237  100644 (1)      0      0   37632 13-May-2014 19:28 110
>>> 58271  100644 (1)      0      0   38464 13-May-2014 19:28 142
>>> 58305  100644 (1)      0      0   37888 13-May-2014 19:28 174
>>> 58343  100644 (1)      0      0   37184 13-May-2014 19:28 206
>>> 58396  100644 (1)      0      0   37312 13-May-2014 19:28 238
>>> 58429  100644 (1)      0      0   36160 13-May-2014 19:28 270
>>> 12824  100644 (1)      0      0   3623232 20-Apr-2015 14:09 43150
>>> 12915  100644 (1)      0      0   3800960 20-Apr-2015 14:09 43182
>>> 12954  100644 (1)      0      0   3769216 20-Apr-2015 14:09 43214
>>> 
>>> The three large files seem to have been created the last time the MDT was mounted.  The timestamps for the other smaller files coincide with the Lustre upgrade we performed last year.  But I am not sure what is contained in these files.
> 
> I re-checked this directory.  The smaller files are still there, but the files from Apr 20 are now gone.  Instead, there are several files from the past few days:
> 
> debugfs:  ls -l
> 16777293   40700 (2)      0      0    4096 27-Apr-2015 15:30 .
> 16777278   40755 (2)      0      0    4096 13-May-2014 16:51 ..
>  58129  100644 (1)      0      0    8256 13-May-2014 16:51 14
>  58162  100644 (1)      0      0    8256 13-May-2014 16:51 46
>  58197  100644 (1)      0      0    8256 13-May-2014 16:51 78
>  58237  100644 (1)      0      0   38080 13-May-2014 19:28 110
>  58271  100644 (1)      0      0   38848 13-May-2014 19:28 142
>  58305  100644 (1)      0      0   38272 13-May-2014 19:28 174
>  58343  100644 (1)      0      0   37632 13-May-2014 19:28 206
>  58396  100644 (1)      0      0   37760 13-May-2014 19:28 238
>  58429  100644 (1)      0      0   36544 13-May-2014 19:28 270
>    179  100644 (17)      0      0   4153280 24-Apr-2015 04:14 43278
>    188  100644 (17)      0      0   4153280 24-Apr-2015 12:03 43310
>    206  100644 (17)      0      0   4153280 24-Apr-2015 18:42 43246
>   1304  100644 (17)      0      0   4153280 26-Apr-2015 06:47 43630
>   1285  100644 (17)      0      0   4153280 25-Apr-2015 10:17 43470
>    120  100644 (17)      0      0   4153280 25-Apr-2015 16:49 43502
>    202  100644 (17)      0      0   4153280 26-Apr-2015 11:53 43662
>    124  100644 (17)      0      0   4153280 26-Apr-2015 20:44 43694
>   1327  100644 (17)      0      0   310464 27-Apr-2015 15:30 43822
>   9978  100644 (17)      0      0   3396672 27-Apr-2015 13:32 43758
>   9991  100644 (17)      0      0   1405952 27-Apr-2015 15:13 43790
> 
> 
>> can you do "debugfs dump" for one of those 4MB files , run llog_reader
>> (utility from lustre sources) over it and send the output to the list?
>> 
> 
> I dumped the file named “43278” and ran llog_reader.  I get a bunch of lines like this:
> 
> ...
> Bit 52585 of 8 not set
> Bit 52586 of 8 not set
> Bit 52587 of 8 not set
> Bit 52588 of 8 not set
> Bit 52589 of 8 not set
> Bit 52590 of 8 not set
> Bit 52591 of 8 not set
> Bit 52592 of 8 not set
> 
> Followed by lines like this:
> 
> rec #52601 type=10692404 len=64
> Header size : 8192
> Time : Fri Apr 24 04:14:05 2015
> Number of records: 8
> Target uuid :
> -----------------------
> #5222 (064)unknown type 10692404
> #25265 (064)unknown type 10692404
> #30429 (064)unknown type 10692404
> #40335 (064)unknown type 10692404
> #41590 (064)unknown type 10692404
> #48975 (064)unknown type 10692404
> #48976 (064)unknown type 10692401
> #52601 (064)unknown type 10692404
> 
> 
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu