[Lustre-discuss] No space left on device for just one file
Bernd Schubert
bs_lists at aakef.fastmail.fm
Tue Jan 12 11:30:59 PST 2010
Hello Mike,
you really should file a ticket with us (DDN). I think your problem comes from
these MDS messages:
LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry: Directory index full!
LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry: Directory index full!
And /dev/dm-1 is also the scratch MDT.
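That warning appears to come from ldiskfs's hashed directory index (htree) running out of room, which would fit a directory holding on the order of a million entries. As a rough illustration (not part of the original exchange), an entry count like that can be checked cheaply from a client with a few lines of Python; `count_entries` is just an illustrative helper, and the path in the comment is the one from Mike's report:

```python
import os

def count_entries(path):
    """Count directory entries without building a sorted list,
    which stays cheap even for directories with ~1M files."""
    n = 0
    with os.scandir(path) as it:
        for _ in it:
            n += 1
    return n

# On the affected system this would be pointed at the directory from
# the report, e.g.:
#   count_entries("/lustre/scratch/smoqbel/Cenval/CLM/Met.Forcing/18X11")
print(count_entries("."))
```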
Cheers,
Bernd
On Tuesday 12 January 2010, Michael Robbert wrote:
> Andreas,
> Here are the results of my debugging. This problem does show up on multiple
> (presumably all) clients. I followed your instructions, changing lustre to
> lnet in step 2, and got debug output on both machines, but the -28 text
> only showed up on the client.
>
> [root at ra 18X11]# grep -- "-28" /tmp/debug.client
> 00000100:00000200:5:1263315233.100525:0:22069:0:(client.c:841:ptlrpc_check_reply()) @@@ rc = 1 for req@00000103a5820800 x200609397/t0 o36->scratch-MDT0000_UUID@172.16.34.1@o2ib:12/10 lens 376/424 e 0 to 1 dl 1263315433 ref 1 fl Rpc:R/0/0 rc 0/-28
> 00000100:00000200:5:1263315233.100538:0:22069:0:(events.c:95:reply_in_callback()) @@@ type 5, status 0 req@00000103a5820800 x200609397/t0 o36->scratch-MDT0000_UUID@172.16.34.1@o2ib:12/10 lens 376/424 e 0 to 1 dl 1263315433 ref 1 fl Rpc:R/0/0 rc 0/-28
> 00000100:00100000:5:1263315233.100543:0:22069:0:(events.c:115:reply_in_callback()) @@@ unlink req@00000103a5820800 x200609397/t0 o36->scratch-MDT0000_UUID@172.16.34.1@o2ib:12/10 lens 376/424 e 0 to 1 dl 1263315433 ref 1 fl Rpc:R/0/0 rc 0/-28
> 00000100:00000040:5:1263315233.100565:0:22069:0:(client.c:863:ptlrpc_check_status()) @@@ status is -28 req@00000103a5820800 x200609397/t0 o36->scratch-MDT0000_UUID@172.16.34.1@o2ib:12/10 lens 376/424 e 0 to 1 dl 1263315433 ref 1 fl Rpc:R/0/0 rc 0/-28
> 00000100:00000001:5:1263315233.100570:0:22069:0:(client.c:869:ptlrpc_check_status()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
> 00000100:00000001:5:1263315233.100578:0:22069:0:(client.c:955:after_reply()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
> 00000100:00100000:5:1263315233.100581:0:22069:0:(lustre_net.h:984:ptlrpc_rqphase_move()) @@@ move req "Rpc" -> "Interpret" req@00000103a5820800 x200609397/t0 o36->scratch-MDT0000_UUID@172.16.34.1@o2ib:12/10 lens 376/424 e 0 to 1 dl 1263315433 ref 1 fl Rpc:R/0/0 rc 0/-28
> 00000100:00000001:5:1263315233.100586:0:22069:0:(client.c:2094:ptlrpc_queue_wait()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
> 00000002:00000040:5:1263315233.100590:0:22069:0:(mdc_reint.c:67:mdc_reint()) error in handling -28
> 00000002:00000001:5:1263315233.100593:0:22069:0:(mdc_reint.c:227:mdc_create()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
> 00000080:00000001:5:1263315233.100596:0:22069:0:(namei.c:881:ll_new_node()) Process leaving via err_exit (rc=18446744073709551588 : -28 : ffffffffffffffe4)
> 00000100:00000040:5:1263315233.100600:0:22069:0:(client.c:1629:__ptlrpc_req_finished()) @@@ refcount now 0 req@00000103a5820800 x200609397/t0 o36->scratch-MDT0000_UUID@172.16.34.1@o2ib:12/10 lens 376/424 e 0 to 1 dl 1263315433 ref 1 fl Interpret:R/0/0 rc 0/-28
> 00000080:00000001:5:1263315233.100620:0:22069:0:(namei.c:930:ll_mknod_generic()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
>
> Finally here is the lfs df output:
>
> [root at ra 18X11]# lfs df
> UUID                  1K-blocks        Used   Available Use% Mounted on
> home-MDT0000_UUID    5127574032     2034740  4832512272   0% /lustre/home[MDT:0]
> home-OST0000_UUID    5768577552  1392861480  4082688968  24% /lustre/home[OST:0]
> home-OST0001_UUID    5768577552  1206861808  4268688824  20% /lustre/home[OST:1]
> home-OST0002_UUID    5768577552  1500109508  3975439928  26% /lustre/home[OST:2]
> home-OST0003_UUID    5768577552  1233475740  4242074712  21% /lustre/home[OST:3]
> home-OST0004_UUID    5768577552  1197398768  4278150628  20% /lustre/home[OST:4]
> home-OST0005_UUID    5768577552  1186058976  4289491656  20% /lustre/home[OST:5]
>
> filesystem summary: 34611465312  7716766280 25136534716  22% /lustre/home
>
> UUID                    1K-blocks         Used    Available Use% Mounted on
> scratch-MDT0000_UUID   5127569936      9913156   4824629964   0% /lustre/scratch[MDT:0]
> scratch-OST0000_UUID   5768577552   4446029104   1029519960  77% /lustre/scratch[OST:0]
> scratch-OST0001_UUID   5768577552   3914730392   1560819220  67% /lustre/scratch[OST:1]
> scratch-OST0002_UUID   5768577552   4268932844   1206616396  74% /lustre/scratch[OST:2]
> scratch-OST0003_UUID   5768577552   4307085048   1168464192  74% /lustre/scratch[OST:3]
> scratch-OST0004_UUID   5768577552   3920023888   1555525724  67% /lustre/scratch[OST:4]
> scratch-OST0005_UUID   5768577552   3590710852   1884838760  62% /lustre/scratch[OST:5]
> scratch-OST0006_UUID   5768577552   4649048836    826500028  80% /lustre/scratch[OST:6]
> scratch-OST0007_UUID   5768577552   4089658692   1385890920  70% /lustre/scratch[OST:7]
> scratch-OST0008_UUID   5768577552   4151458292   1324090948  71% /lustre/scratch[OST:8]
> scratch-OST0009_UUID   5768577552   4116646240   1358902348  71% /lustre/scratch[OST:9]
> scratch-OST000a_UUID   5768577552   3750259568   1725290032  65% /lustre/scratch[OST:10]
> scratch-OST000b_UUID   5768577552   4346406836   1129141752  75% /lustre/scratch[OST:11]
> scratch-OST000c_UUID   5768577552   4376152100   1099396768  75% /lustre/scratch[OST:12]
> scratch-OST000d_UUID   5768577552   4312773056   1162776184  74% /lustre/scratch[OST:13]
> scratch-OST000e_UUID   5768577552   4900307080    575242532  84% /lustre/scratch[OST:14]
> scratch-OST000f_UUID   5768577552   4044304276   1431243940  70% /lustre/scratch[OST:15]
> scratch-OST0010_UUID   5768577552   3827521672   1648026552  66% /lustre/scratch[OST:16]
> scratch-OST0011_UUID   5768577552   3789120072   1686427400  65% /lustre/scratch[OST:17]
> scratch-OST0012_UUID   5768577552   4023497048   1452052192  69% /lustre/scratch[OST:18]
> scratch-OST0013_UUID   5768577552   4133682544   1341866324  71% /lustre/scratch[OST:19]
> scratch-OST0014_UUID   5768577552   3690021408   1785527832  63% /lustre/scratch[OST:20]
> scratch-OST0015_UUID   5768577552   3891559096   1583990144  67% /lustre/scratch[OST:21]
> scratch-OST0016_UUID   5768577552   4404600712   1070948896  76% /lustre/scratch[OST:22]
> scratch-OST0017_UUID   5768577552   4792223084    683326528  83% /lustre/scratch[OST:23]
> scratch-OST0018_UUID   5768577552   4486070024    989478844  77% /lustre/scratch[OST:24]
> scratch-OST0019_UUID   5768577552   4471754448   1003795164  77% /lustre/scratch[OST:25]
> scratch-OST001a_UUID   5768577552   4517349052    958199536  78% /lustre/scratch[OST:26]
> scratch-OST001b_UUID   5768577552   3989325372   1486223000  69% /lustre/scratch[OST:27]
> scratch-OST001c_UUID   5768577552   4024754964   1450793904  69% /lustre/scratch[OST:28]
> scratch-OST001d_UUID   5768577552   3883873220   1591676392  67% /lustre/scratch[OST:29]
> scratch-OST001e_UUID   5768577552   4928383088    547166152  85% /lustre/scratch[OST:30]
> scratch-OST001f_UUID   5768577552   4291418836   1184130776  74% /lustre/scratch[OST:31]
>
> filesystem summary:  184594481664 134329681744  40887889340  72% /lustre/scratch
>
> [root at ra 18X11]# lfs df -i
> UUID                    Inodes    IUsed      IFree IUse% Mounted on
> home-MDT0000_UUID   1287101228  5716405 1281384823    0% /lustre/home[MDT:0]
> home-OST0000_UUID    366288896   871143  365417753    0% /lustre/home[OST:0]
> home-OST0001_UUID    366288896   900011  365388885    0% /lustre/home[OST:1]
> home-OST0002_UUID    366288896   804892  365484004    0% /lustre/home[OST:2]
> home-OST0003_UUID    366288896   836213  365452683    0% /lustre/home[OST:3]
> home-OST0004_UUID    366288896   836852  365452044    0% /lustre/home[OST:4]
> home-OST0005_UUID    366288896   850446  365438450    0% /lustre/home[OST:5]
>
> filesystem summary: 1287101228  5716405 1281384823    0% /lustre/home
>
> UUID                      Inodes     IUsed      IFree IUse% Mounted on
> scratch-MDT0000_UUID  1453492963 174078773 1279414190   11% /lustre/scratch[MDT:0]
> scratch-OST0000_UUID   337257280   6621404  330635876    1% /lustre/scratch[OST:0]
> scratch-OST0001_UUID   366288896   6697629  359591267    1% /lustre/scratch[OST:1]
> scratch-OST0002_UUID   366288896   5272904  361015992    1% /lustre/scratch[OST:2]
> scratch-OST0003_UUID   366288896   5161903  361126993    1% /lustre/scratch[OST:3]
> scratch-OST0004_UUID   366288896   5327683  360961213    1% /lustre/scratch[OST:4]
> scratch-OST0005_UUID   366288896   5582579  360706317    1% /lustre/scratch[OST:5]
> scratch-OST0006_UUID   285040431   5158974  279881457    1% /lustre/scratch[OST:6]
> scratch-OST0007_UUID   366288896   5307157  360981739    1% /lustre/scratch[OST:7]
> scratch-OST0008_UUID   366288896   5387313  360901583    1% /lustre/scratch[OST:8]
> scratch-OST0009_UUID   366288896   5426523  360862373    1% /lustre/scratch[OST:9]
> scratch-OST000a_UUID   366288896   5424803  360864093    1% /lustre/scratch[OST:10]
> scratch-OST000b_UUID   360664073   5122378  355541695    1% /lustre/scratch[OST:11]
> scratch-OST000c_UUID   353235316   5129413  348105903    1% /lustre/scratch[OST:12]
> scratch-OST000d_UUID   366288896   5053936  361234960    1% /lustre/scratch[OST:13]
> scratch-OST000e_UUID   222189585   5122229  217067356    2% /lustre/scratch[OST:14]
> scratch-OST000f_UUID   366288896   5281196  361007700    1% /lustre/scratch[OST:15]
> scratch-OST0010_UUID   366288896   5274738  361014158    1% /lustre/scratch[OST:16]
> scratch-OST0011_UUID   366288896   5409560  360879336    1% /lustre/scratch[OST:17]
> scratch-OST0012_UUID   366288896   5369406  360919490    1% /lustre/scratch[OST:18]
> scratch-OST0013_UUID   366288896   5502974  360785922    1% /lustre/scratch[OST:19]
> scratch-OST0014_UUID   366288896   5521406  360767490    1% /lustre/scratch[OST:20]
> scratch-OST0015_UUID   366288896   5550606  360738290    1% /lustre/scratch[OST:21]
> scratch-OST0016_UUID   345993048   4999552  340993496    1% /lustre/scratch[OST:22]
> scratch-OST0017_UUID   249051056   4963064  244087992    1% /lustre/scratch[OST:23]
> scratch-OST0018_UUID   325734426   5108454  320625972    1% /lustre/scratch[OST:24]
> scratch-OST0019_UUID   329427010   5222114  324204896    1% /lustre/scratch[OST:25]
> scratch-OST001a_UUID   317921820   5115591  312806229    1% /lustre/scratch[OST:26]
> scratch-OST001b_UUID   366288896   5353229  360935667    1% /lustre/scratch[OST:27]
> scratch-OST001c_UUID   366288896   5383473  360905423    1% /lustre/scratch[OST:28]
> scratch-OST001d_UUID   366288896   5411890  360877006    1% /lustre/scratch[OST:29]
> scratch-OST001e_UUID   216236615   6188887  210047728    2% /lustre/scratch[OST:30]
> scratch-OST001f_UUID   366288896   6465049  359823847    1% /lustre/scratch[OST:31]
>
> filesystem summary:   1453492963 174078773 1279414190   11% /lustre/scratch
>
>
> Thanks,
> Mike Robbert
>
> On Jan 11, 2010, at 7:24 PM, Andreas Dilger wrote:
> > On 2010-01-11, at 15:59, Michael Robbert wrote:
> >> There is nothing special about the filename itself. I can create a
> >> file with the same name in another directory or on another Lustre
> >> filesystem. It is just this exact path on this filesystem. The full
> >> path is:
> >> /lustre/scratch/smoqbel/Cenval/CLM/Met.Forcing/18X11/NLDAS.APCP.007100.pfb.00164
> >> The mount point for this filesystem is /lustre/scratch/
> >
> > Mike,
> > does the same problem happen on multiple client nodes, or is it only
> > happening on a single client? Are there any messages on the MDS and/
> > or the OSSes when this problem is happening? This problem is somewhat
> > unusual, since I'm not aware of any places outside the disk filesystem
> > code that would cause ENOSPC when creating a file.
> >
> > Can you please do a bit of debugging on the system:
> >
> > {client}# cd /lustre/scratch/smoqbel/Cenval/CLM/Met.Forcing/18X11
> > {mds,client}# echo -1 > /proc/sys/lustre/debug # enable full debug
> > {mds,client}# lctl clear # clear debug logs
> > {client}# touch NLDAS.APCP.007100.pfb.00164
> > {mds,client}# lctl dk > /tmp/debug.{mds,client} # dump debug logs
> >
> > For now, please just extract the ENOSPC errors from the logs; that
> > will be much shorter, may be enough to identify where the problem is
> > located, and will be a lot friendlier to the list:
> >
> > grep -- "-28" /tmp/debug.{mds,client} > /tmp/debug-28.{mds,client}
> >
> > along with the "lfs df" and "lfs df -i" output.
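[Editorial aside: each extracted debug line carries a colon-separated prefix (what looks like subsystem, mask, timestamp, and PID fields) followed by (file:line:function()) and the message, so the ENOSPC hits can also be pulled apart programmatically rather than just grepped. A rough sketch, using one line verbatim from the capture in this thread; the regex and field reading are my own interpretation, not from the Lustre manual:]

```python
import re

# One debug line from the capture earlier in this thread
SAMPLE = ("00000002:00000040:5:1263315233.100590:0:22069:0:"
          "(mdc_reint.c:67:mdc_reint()) error in handling -28")

# (file:line:function()) followed by the free-form message
LINE_RE = re.compile(
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>[^)]+\(\))\)\s*(?P<msg>.*)")

def enospc_hits(lines):
    """Yield (file, function, message) for entries mentioning rc -28."""
    for line in lines:
        if "-28" not in line:
            continue
        m = LINE_RE.search(line)
        if m:
            yield m.group("file"), m.group("func"), m.group("msg")

for f, fn, msg in enospc_hits([SAMPLE]):
    print(f, fn, msg)   # mdc_reint.c mdc_reint() error in handling -28
```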
> >
> > If this is only on a single client, just dropping the locks on the
> > client might be enough to resolve the problem:
> >
> > for L in /proc/fs/lustre/ldlm/namespaces/*; do
> > echo clear > $L
> > done
> >
> > If, on the other hand, this same problem is happening on all clients
> > then the problem is likely on the MDS.
> >
> >>> On Fri, Jan 8, 2010 at 1:36 PM, Michael Robbert
> >>>
> >>> <mrobbert at mines.edu> wrote:
> >>>> I have a user that reported a problem creating a file on our
> >>>> Lustre filesystem. When I investigated I found that the problem
> >>>> appears to be unique to just one filename in one directory. I have
> >>>> tried numerous ways of creating the file including echo, touch,
> >>>> and "lfs setstripe" all return "No space left on device". I have
> >>>> checked the filesystem with df and "lfs df" both show that the
> >>>> filesystem and all OSTs are far from being full, for both blocks
> >>>> and inodes. Files with slight changes to the name are created fine. We
> >>>> had a kernel panic on the MDS yesterday and it was quite possible
> >>>> that the user had a compute job working in this directory at the
> >>>> time of that problem. I am guessing we have some kind of
> >>>> corruption with the directory. This directory has around 1 million
> >>>> files so moving the data around may not be a quick operation, but
> >>>> we're willing to do it. I just want to know the best way, short of
> >>>> taking the filesystem offline, to fix this problem.
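[Editorial aside: the echo/touch/"lfs setstripe" attempts above all boil down to a plain exclusive create, so the failing name and its variants can be probed in one place. A sketch; `try_create` is an illustrative helper, and the paths in the comments are the ones from this report:]

```python
import errno
import os

def try_create(path):
    """Attempt to create `path` exclusively; return None on success,
    otherwise the errno symbol (e.g. 'ENOSPC')."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    os.close(fd)
    os.unlink(path)     # remove the probe file again
    return None

# On the affected system one would probe the failing name and a variant:
#   try_create("/lustre/scratch/smoqbel/Cenval/CLM/Met.Forcing/18X11/"
#              "NLDAS.APCP.007100.pfb.00164")      # reportedly fails: ENOSPC
#   try_create("/lustre/scratch/smoqbel/Cenval/CLM/Met.Forcing/18X11/"
#              "NLDAS.APCP.007100.pfb.00164x")     # reportedly succeeds
print(try_create("probe.tmp"))
```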
> >>>>
> >>>> Any ideas? Thanks in advance,
> >>>> Mike Robbert
> >>>> _______________________________________________
> >>>> Lustre-discuss mailing list
> >>>> Lustre-discuss at lists.lustre.org
> >>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
> >>
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.
>
>
--
Bernd Schubert
DataDirect Networks