[lustre-discuss] MDT partition getting full

Mohr Jr, Richard Frank (Rick Mohr) rmohr at utk.edu
Mon Apr 27 13:06:40 PDT 2015


> On Apr 24, 2015, at 1:34 PM, Alexander Zarochentsev <alexander.zarochentsev at seagate.com> wrote:
> 
> Hello,
> 
> On Thu, Apr 23, 2015 at 9:01 PM, Mohr Jr, Richard Frank (Rick Mohr)
> <rmohr at utk.edu> wrote:
>> 
>>> On Apr 23, 2015, at 1:07 PM, Colin Faber <cfaber at gmail.com> wrote:
>>> 
>>> 
>>> Based on the directory structure here, this appears to be an OST. Are you sure your targets are correctly named?
>>> 
>> 
>> That is what I would have guessed until I took a look at my own MDT.  Sure enough, I have the directories /O/1/d[0-31] and each one seems to have 3 files that are about 3.5MB each (along with some other smaller ones).  Here is what one of those directories looks like:
>> 
>> debugfs:  ls -l /O/1/d14
>> 16777293   40700 (2)      0      0    4096 20-Apr-2015 14:09 .
>> 16777278   40755 (2)      0      0    4096 13-May-2014 16:51 ..
>>  58129  100644 (1)      0      0    8256 13-May-2014 16:51 14
>>  58162  100644 (1)      0      0    8256 13-May-2014 16:51 46
>>  58197  100644 (1)      0      0    8256 13-May-2014 16:51 78
>>  58237  100644 (1)      0      0   37632 13-May-2014 19:28 110
>>  58271  100644 (1)      0      0   38464 13-May-2014 19:28 142
>>  58305  100644 (1)      0      0   37888 13-May-2014 19:28 174
>>  58343  100644 (1)      0      0   37184 13-May-2014 19:28 206
>>  58396  100644 (1)      0      0   37312 13-May-2014 19:28 238
>>  58429  100644 (1)      0      0   36160 13-May-2014 19:28 270
>>  12824  100644 (1)      0      0   3623232 20-Apr-2015 14:09 43150
>>  12915  100644 (1)      0      0   3800960 20-Apr-2015 14:09 43182
>>  12954  100644 (1)      0      0   3769216 20-Apr-2015 14:09 43214
>> 
>> The three large files seem to have been created the last time the MDT was mounted.  The timestamps for the other, smaller files coincide with the Lustre upgrade we performed last year.  But I am not sure what is contained in these files.

I re-checked this directory.  The smaller files are still there, but the files from Apr 20 are now gone.  Instead, there are several files from the past few days:

debugfs:  ls -l
 16777293   40700 (2)      0      0    4096 27-Apr-2015 15:30 .
 16777278   40755 (2)      0      0    4096 13-May-2014 16:51 ..
  58129  100644 (1)      0      0    8256 13-May-2014 16:51 14
  58162  100644 (1)      0      0    8256 13-May-2014 16:51 46
  58197  100644 (1)      0      0    8256 13-May-2014 16:51 78
  58237  100644 (1)      0      0   38080 13-May-2014 19:28 110
  58271  100644 (1)      0      0   38848 13-May-2014 19:28 142
  58305  100644 (1)      0      0   38272 13-May-2014 19:28 174
  58343  100644 (1)      0      0   37632 13-May-2014 19:28 206
  58396  100644 (1)      0      0   37760 13-May-2014 19:28 238
  58429  100644 (1)      0      0   36544 13-May-2014 19:28 270
    179  100644 (17)      0      0   4153280 24-Apr-2015 04:14 43278
    188  100644 (17)      0      0   4153280 24-Apr-2015 12:03 43310
    206  100644 (17)      0      0   4153280 24-Apr-2015 18:42 43246
   1304  100644 (17)      0      0   4153280 26-Apr-2015 06:47 43630
   1285  100644 (17)      0      0   4153280 25-Apr-2015 10:17 43470
    120  100644 (17)      0      0   4153280 25-Apr-2015 16:49 43502
    202  100644 (17)      0      0   4153280 26-Apr-2015 11:53 43662
    124  100644 (17)      0      0   4153280 26-Apr-2015 20:44 43694
   1327  100644 (17)      0      0   310464 27-Apr-2015 15:30 43822
   9978  100644 (17)      0      0   3396672 27-Apr-2015 13:32 43758
   9991  100644 (17)      0      0   1405952 27-Apr-2015 15:13 43790
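
A minimal Python sketch (my own, not part of any Lustre or e2fsprogs tool) for totalling how much space the recent files in a listing like this take up. It assumes the column layout debugfs prints above (inode, octal mode, "(link count)", uid, gid, size, date, time, name) and an English/C locale for the month abbreviations; the function names are hypothetical.

```python
import stat
from datetime import datetime
from typing import Iterable, List, NamedTuple


class LsEntry(NamedTuple):
    inode: int
    mode: int        # st_mode, parsed from the octal mode column
    links: int       # the "(N)" link-count column
    size: int        # bytes
    mtime: datetime
    name: str


def parse_ls_line(line: str) -> LsEntry:
    """Parse one line of debugfs 'ls -l' output (layout as shown above)."""
    f = line.split()
    return LsEntry(
        inode=int(f[0]),
        mode=int(f[1], 8),
        links=int(f[2].strip("()")),
        size=int(f[5]),
        mtime=datetime.strptime(f"{f[6]} {f[7]}", "%d-%b-%Y %H:%M"),
        name=" ".join(f[8:]),
    )


def regular_files(lines: Iterable[str]) -> List[LsEntry]:
    """Keep only regular files, dropping '.', '..' and blank lines."""
    entries = [parse_ls_line(line) for line in lines if line.strip()]
    return [e for e in entries if stat.S_ISREG(e.mode)]


def bytes_since(entries: Iterable[LsEntry], cutoff: datetime) -> int:
    """Total size of entries modified at or after the cutoff."""
    return sum(e.size for e in entries if e.mtime >= cutoff)
```

Fed the listing above with a cutoff of datetime(2015, 4, 24), bytes_since should return 38339328 (about 36.6 MiB) for this one d* directory.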


> Can you do "debugfs dump" for one of those 4MB files, run llog_reader
> (utility from lustre sources) over it and send the output to the list?
> 
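
The two steps requested above can be scripted roughly as follows. This is a sketch under assumptions: the MDT device path is a placeholder you must supply, llog_reader (built from the Lustre utilities) is assumed to be on PATH, and the helper names are hypothetical. debugfs is run with -c (catastrophic mode, which opens the filesystem read-only) and -R to execute a single "dump" request.

```python
import subprocess
from typing import List


def debugfs_dump_cmd(device: str, llog_path: str, out_file: str) -> List[str]:
    # -c: open read-only in catastrophic mode; -R: run one debugfs request.
    return ["debugfs", "-c", "-R", f"dump {llog_path} {out_file}", device]


def llog_reader_cmd(llog_file: str) -> List[str]:
    # llog_reader ships with the Lustre utilities (assumed to be on PATH).
    return ["llog_reader", llog_file]


def dump_and_read(device: str, llog_path: str, out_file: str) -> str:
    """Copy one llog out of the MDT with debugfs, then decode it."""
    subprocess.run(debugfs_dump_cmd(device, llog_path, out_file), check=True)
    result = subprocess.run(
        llog_reader_cmd(out_file), check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, dump_and_read("/dev/MDT_DEVICE", "/O/1/d14/43278", "/tmp/43278"), where /dev/MDT_DEVICE is a placeholder for your MDT block device.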

I dumped the file named “43278” and ran llog_reader.  I get a bunch of lines like this:

...
Bit 52585 of 8 not set
Bit 52586 of 8 not set
Bit 52587 of 8 not set
Bit 52588 of 8 not set
Bit 52589 of 8 not set
Bit 52590 of 8 not set
Bit 52591 of 8 not set
Bit 52592 of 8 not set
…
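
As I read llog_reader.c (worth double-checking against your version), a "Bit N of M not set" line is printed for each record whose bit is clear in the llog header bitmap, i.e. a record that has already been cancelled, and M is the header's count of live records; that fits the "Number of records: 8" in the header output. A small hypothetical helper to collect those indices from saved llog_reader output:

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches llog_reader lines such as "Bit 52585 of 8 not set".
NOT_SET_RE = re.compile(r"^Bit (\d+) of (\d+) not set$")


def cancelled_slots(lines: Iterable[str]) -> Tuple[List[int], Optional[int]]:
    """Return (record indices reported as not set, the 'of M' count)."""
    indices: List[int] = []
    live_count: Optional[int] = None
    for raw in lines:
        m = NOT_SET_RE.match(raw.strip())
        if m:
            indices.append(int(m.group(1)))
            live_count = int(m.group(2))
    return indices, live_count
```

Lines that do not match (the "..." separators, "rec #" lines, header fields) are simply skipped.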

Followed by lines like this:

rec #52601 type=10692404 len=64
Header size : 8192
Time : Fri Apr 24 04:14:05 2015
Number of records: 8
Target uuid :
-----------------------
#5222 (064)unknown type 10692404
#25265 (064)unknown type 10692404
#30429 (064)unknown type 10692404
#40335 (064)unknown type 10692404
#41590 (064)unknown type 10692404
#48975 (064)unknown type 10692404
#48976 (064)unknown type 10692401
#52601 (064)unknown type 10692404
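
llog_reader prints "unknown type" here apparently because it only decodes a few record types. Reading the hex against the llog_op_type enum as I recall it from the Lustre headers (verify against the tree for your version), 0x10692404 would be MDS_UNLINK64_REC and 0x10692401 MDS_SETATTR64_REC; if that is right, these look like pending OST object destroy/setattr records. The sketch below decodes and tallies them, and also does the size arithmetic: assuming fixed 64-byte records after the 8192-byte header, each 4153280-byte file holds (4153280 - 8192) / 64 = 64767 record slots, of which this one reports only 8 as live. Names here are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Values as recalled from the Lustre headers; verify for your version.
LLOG_OP_MAGIC = 0x10600000
MDS_REINT = 36        # enum mds_cmd
REINT_SETATTR = 1     # enum mds_reint_op
REINT_UNLINK = 4

KNOWN_TYPES = {
    LLOG_OP_MAGIC | 0x90000 | (MDS_REINT << 8) | REINT_UNLINK: "MDS_UNLINK64_REC",
    LLOG_OP_MAGIC | 0x90000 | (MDS_REINT << 8) | REINT_SETATTR: "MDS_SETATTR64_REC",
}

# Matches record lines such as "#48976 (064)unknown type 10692401".
REC_RE = re.compile(r"^#(\d+) \((\d+)\)unknown type ([0-9a-fA-F]+)$")


def decode_type(hex_type: str) -> str:
    """Map llog_reader's hex record type to a symbolic name if known."""
    value = int(hex_type, 16)
    return KNOWN_TYPES.get(value, f"unrecognised 0x{value:x}")


def tally_types(lines: Iterable[str]) -> Counter:
    """Count decoded record types in llog_reader's record listing."""
    counts: Counter = Counter()
    for raw in lines:
        m = REC_RE.match(raw.strip())
        if m:
            counts[decode_type(m.group(3))] += 1
    return counts


def record_slots(file_size: int, header_size: int, rec_len: int) -> int:
    """Record slots after the header, assuming every record is rec_len bytes."""
    body = file_size - header_size
    if body < 0 or body % rec_len:
        raise ValueError("size is not header + a whole number of records")
    return body // rec_len
```

On the eight records listed above, tally_types gives 7 MDS_UNLINK64_REC and 1 MDS_SETATTR64_REC.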


--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


