[Lustre-discuss] Unexpected file system error during normal system operation

Wojciech Turek wjt27 at cam.ac.uk
Thu Jun 16 08:52:26 PDT 2011


Hi Piotr,

Which Lustre version is this, and which version of e2fsprogs are you using?
Is the back-end disk a software RAID or a hardware RAID? If you cannot see any
errors on your hardware, I would recommend running fsck repeatedly until it
no longer finds any problems. I also highly recommend collecting the logs
from each fsck run in case they are needed for further debugging. If you are
not sure that your hardware is OK, you may want to run fsck with the -n
switch and send the output to the mailing list.
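
As a rough illustration of the procedure above, here is a minimal Python
sketch that runs e2fsck several times and keeps one log file per run. It is
only a sketch under assumptions: it assumes the e2fsck found in PATH is the
Lustre-aware e2fsprogs build, that the target is unmounted, and the function
names, log directory and run limit are hypothetical choices of mine. The
exit-status meanings are the ones listed in the e2fsck(8) man page.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Bit flags of the e2fsck exit status, as listed in the e2fsck(8) man page.
E2FSCK_STATUS_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def describe_status(code):
    """Return the meanings encoded in an e2fsck exit status."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in E2FSCK_STATUS_BITS.items() if code & bit]


def build_command(device, read_only=True):
    """Build the e2fsck command line.

    -f forces a check even if the file system is marked clean,
    -n opens it read-only and answers "no" to every question,
    -y answers "yes" to every question (repairs are applied).
    """
    return ["e2fsck", "-f", "-n" if read_only else "-y", device]


def run_until_clean(device, log_dir, max_runs=5, read_only=True):
    """Run e2fsck up to max_runs times, writing one log file per run.

    Returns the number of the first run that exits with status 0, or None
    if no run came back clean. The device must be unmounted.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    for run in range(1, max_runs + 1):
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        name = f"e2fsck-{Path(device).name}-run{run}-{stamp}.log"
        with open(log_dir / name, "w") as log:
            result = subprocess.run(
                build_command(device, read_only),
                stdout=log,
                stderr=subprocess.STDOUT,
            )
        print(f"run {run}: status {result.returncode}:",
              ", ".join(describe_status(result.returncode)))
        if result.returncode == 0:
            return run
        if read_only:
            # A read-only check changes nothing, so repeating it is pointless.
            break
    return None
```

For a first look without touching the disk, something like
run_until_clean("/dev/mapper/ost_example", "/root/fsck-logs") performs a
single -n pass (the device path here is a placeholder). Passing
read_only=False repeats repairing runs until one comes back clean or the run
limit is reached.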

Best regards,

Wojciech

On 16 June 2011 13:33, Piotr Przybylo <piotr_przybylo at polcom.com.pl> wrote:

> We have a problem with Lustre and wanted to ask whether you can help us.
> We hit an unexpected file system error during normal system operation.
> *
> Jun 13 15:00:30 ossw12 kernel: LDISKFS-fs error (device dm-9):
> mb_free_blocks: double-free of inode 82041293's block 346591170(bit 4034
> in group 10577)
> Jun 13 15:00:30 ossw12 kernel:
> Jun 13 15:00:30 ossw12 kernel: Aborting journal on device dm-9.
> Jun 13 15:00:30 ossw12 kernel: Remounting filesystem read-only
> Jun 13 15:00:30 ossw12 kernel: LDISKFS-fs error (device dm-9):
> mb_free_blocks: <3>LustreError:
> 4026:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) error starting
> handle for op 8 (106 credits): rc -30
> Jun 13 15:00:30 ossw12 kernel: double-free of inode 82041293's block
> 346591171(bit 4035 in group 10577)*
>
>
> Jun 13 15:06:53 ossw12 kernel: LDISKFS-fs error (device dm-12):
> mb_free_blocks: double-free of inode 90143054's block 125314561(bit 9729
> in group 3824)
> Jun 13 15:06:53 ossw12 kernel:
> Jun 13 15:06:53 ossw12 kernel: Aborting journal on device dm-12.
> Jun 13 15:06:53 ossw12 kernel: Remounting filesystem read-only
> Jun 13 15:06:53 ossw12 kernel: ldiskfs_abort called.
> Jun 13 15:06:53 ossw12 kernel: LDISKFS-fs error (device dm-12):
> ldiskfs_journal_start_sb: Detected aborted journal
> Jun 13 15:06:53 ossw12 kernel: Remounting filesystem read-only
>
>
> Another attempt to mount the file system:
> *
> Jun 13 15:12:24 ossw12 kernel: kjournald starting.  Commit interval 5
> seconds
> Jun 13 15:12:24 ossw12 kernel: LDISKFS-fs warning (device dm-9):
> ldiskfs_clear_journal_err: Filesystem error recorded from previous
> mount: IO failure
> Jun 13 15:12:24 ossw12 kernel: LDISKFS-fs warning (device dm-9):
> ldiskfs_clear_journal_err: Marking fs in need of filesystem check.
> Jun 13 15:12:24 ossw12 kernel: LDISKFS-fs warning: mounting fs with
> errors, running e2fsck is recommended
> Jun 13 15:12:24 ossw12 kernel: LDISKFS FS on dm-9, internal journal
> Jun 13 15:12:24 ossw12 kernel: LDISKFS-fs: recovery complete.
> Jun 13 15:12:24 ossw12 kernel: LDISKFS-fs: mounted filesystem with
> ordered data mode.*
>
> *
> Jun 13 15:16:48 ossw12 kernel: kjournald starting.  Commit interval 5
> seconds
> Jun 13 15:16:48 ossw12 kernel: LDISKFS-fs warning (device dm-12):
> ldiskfs_clear_journal_err: Filesystem error recorded from previous
> mount: IO failure
> Jun 13 15:16:48 ossw12 kernel: LDISKFS-fs warning (device dm-12):
> ldiskfs_clear_journal_err: Marking fs in need of filesystem check.
> Jun 13 15:16:48 ossw12 kernel: LDISKFS-fs warning: mounting fs with
> errors, running e2fsck is recommended
> Jun 13 15:16:48 ossw12 kernel: LDISKFS FS on dm-12, internal journal
> Jun 13 15:16:48 ossw12 kernel: LDISKFS-fs: recovery complete.
> Jun 13 15:16:48 ossw12 kernel: LDISKFS-fs: mounted filesystem with
> ordered data mode.*
>
> How can we recover or repair the data on these devices?
> fsck repaired some errors, but when we try to mount the file system we
> still get errors:
> *
> Jun 13 18:39:17 ossw12 kernel: LDISKFS-fs error (device dm-9):
> mb_free_blocks: double-free of inode 82041293's block 346591170(bit 4034 in
> group 10577)
> Jun 13 18:39:17 ossw12 kernel:
> Jun 13 18:39:17 ossw12 kernel: Aborting journal on device dm-9.
> Jun 13 18:39:17 ossw12 kernel: Remounting filesystem read-only
> Jun 13 18:39:17 ossw12 kernel: LDISKFS-fs error (device dm-9):
> mb_free_blocks: double-free of inode 82041293's block 346591171(bit 4035 in
> group 10577)
> Jun 13 18:39:17 ossw12 kernel:
> Jun 13 18:39:17 ossw12 kernel: LDISKFS-fs error (device dm-9):
> mb_free_blocks: double-free of inode 82041293's block 346591172(bit 4036 in
> group 10577)*
>
> The hardware does not report any problems.
>
> --
>
> Regards
>
> | Piotr Przybylo | Technical Support Engineer | Polcom Sp z o.o. |
> | ul. Krakowska 43 | 32-050 Skawina, Poland |
> | mobile: +48609539945 | tel: +48 12 652 8682 |
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>


-- 
Wojciech Turek