[Lustre-discuss] Has anybody seen: ldiskfs_get_inode_block: bad inode number: 1

"Budsjö, Martin" Martin.Budsjo at astrazeneca.com
Mon Nov 9 05:37:53 PST 2009


Hi

After our upgrade to 1.6.7.2 (from 1.4.12) we started to get "ldiskfs_get_inode_block: bad inode number: 1" errors. This causes the ldiskfs filesystem to be remounted read-only, which in turn makes the whole Lustre filesystem read-only.

We took the filesystems offline and ran fsck on them. Some minor errors were found and fixed, but the error persists. Any clue?
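
For reference, by "ran fsck" I mean the standard offline e2fsck from the Lustre-patched e2fsprogs against the unmounted MDT device, roughly along these lines (the mount point name is illustrative; the device is the one shown in the log below):

  umount /mnt/spehome-mdt           # illustrative mount point for the MDT
  e2fsck -f -p /dev/vg_mds/spehome  # force a full check, fix safe problems automatically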


Regards
Martin Budsjö

Example of the error:

Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov  4 03:06:32 hlc305 kernel: Remounting filesystem read-only
Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) error starting handle for op 4 (35 credits): rc -30
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) Skipped 52 previous similar messages
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_open.c:769:mds_finish_open()) mds_create_objects: rc = -30
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_open.c:769:mds_finish_open()) Skipped 1 previous similar message
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_reint.c:154:mds_finish_transno()) fsfilt_start: -30
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_reint.c:154:mds_finish_transno()) Skipped 52 previous similar messages
Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov  4 03:06:32 hlc305 kernel: LustreError: 1189:0:(mds_open.c:769:mds_finish_open()) mds_create_objects: rc = -30
Nov  4 03:07:14 hlc305 kernel: Lustre: Failing over spehome-MDT0000
Nov  4 03:07:14 hlc305 kernel: Lustre: Skipped 12 previous similar messages
Nov  4 03:07:14 hlc305 kernel: Lustre: *** setting obd spehome-MDT0000 device 'dm-28' read-only ***
Nov  4 03:07:14 hlc305 kernel: Turning device dm-28 (0xfd0001c) read-only
Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(handler.c:1601:mds_handle()) operation 101 on unconnected MDS from 12345-10.0.0.86@tcp
Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(handler.c:1601:mds_handle()) Skipped 74 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-107)  req@00000100cabd0800 x30168375/t0 o101-><?>@<?>:0/0 lens 440/0 e 0 to 0 dl 1257300535 ref 1 fl Interpret:/0/0 rc -107/0
Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 74 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 137-5: UUID 'mds5_UUID' is not available  for connect (stopping)
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_obd.c:380:llog_obd_origin_cleanup()) failure destroying log during cleanup: -30
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_obd.c:380:llog_obd_origin_cleanup()) Skipped 6 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(fsfilt-ldiskfs.c:1236:fsfilt_ldiskfs_write_record()) can't start transaction for 34 blocks (8192 bytes)
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(fsfilt-ldiskfs.c:1236:fsfilt_ldiskfs_write_record()) Skipped 6 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) error writing log record: rc -30
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) Skipped 6 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog.c:135:llog_cancel_rec()) Failure re-writing header -30
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog.c:135:llog_cancel_rec()) Skipped 6 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(handler.c:1963:mds_update_server_data()) error writing MDS server data: rc = -30
Nov  4 03:07:15 hlc305 kernel: Lustre: spehome-MDT0000: shutting down for failover; client state will be preserved.
Nov  4 03:07:15 hlc305 kernel: Lustre: MDT spehome-MDT0000 has stopped.
Nov  4 03:07:15 hlc305 kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
Nov  4 03:07:15 hlc305 kernel: Removing read-only on unknown block (0xfd0001c)
Nov  4 03:07:15 hlc305 kernel: Lustre: server umount spehome-MDT0000 complete
Nov  4 03:07:16 hlc305 kernel: kjournald starting.  Commit interval 5 seconds
Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs warning: mounting fs with errors, running e2fsck is recommended
Nov  4 03:07:16 hlc305 kernel: LDISKFS FS on dm-28, internal journal
Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs: recovery complete.
Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Nov  4 03:07:16 hlc305 kernel: kjournald starting.  Commit interval 5 seconds
Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs warning: mounting fs with errors, running e2fsck is recommended
Nov  4 03:07:16 hlc305 kernel: LDISKFS FS on dm-28, internal journal
Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Nov  4 03:07:16 hlc305 kernel: Lustre: Enabling user_xattr
Nov  4 03:07:16 hlc305 kernel: Lustre: 17137:0:(mds_fs.c:511:mds_init_server_data()) RECOVERY: service spehome-MDT0000, 166 recoverable clients, last_transno 7646800754
Nov  4 03:07:16 hlc305 kernel: Lustre: 636:0:(mds_lov.c:1075:mds_notify()) MDS spehome-MDT0000: in recovery, not resetting orphans on ost1spehome_UUID
Nov  4 03:07:16 hlc305 kernel: Lustre: MDT spehome-MDT0000 now serving mds5_UUID (spehome-MDT0000/2af526ca-85d4-996b-4773-4898812d6a31), but will be in recovery for at least 5:00, or until 166 clients reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/spehome-MDT0000/recovery_status.
Nov  4 03:07:16 hlc305 kernel: Lustre: Server spehome-MDT0000 on device /dev/vg_mds/spehome has started
Nov  4 03:07:16 hlc305 kernel: Lustre: spehome-MDT0000: temporarily refusing client connection from 10.0.0.69@tcp
Nov  4 03:07:16 hlc305 kernel: Lustre: 1247:0:(ldlm_lib.c:1240:check_and_start_recovery_timer()) spehome-MDT0000: starting recovery timer
Nov  4 03:07:16 hlc305 kernel: Lustre: 1249:0:(ldlm_lib.c:1591:target_queue_last_replay_reply()) spehome-MDT0000: 165 recoverable clients remain
Nov  4 03:07:16 hlc305 kernel: Lustre: 1249:0:(ldlm_lib.c:1591:target_queue_last_replay_reply()) Skipped 7 previous similar messages
Nov  4 03:07:16 hlc305 kernel: Lustre: 1284:0:(mds_open.c:841:mds_open_by_fid()) Orphan d5839c:772a0a47 found and opened in PENDING directory
Nov  4 03:07:16 hlc305 kernel: Lustre: 1284:0:(mds_open.c:841:mds_open_by_fid()) Skipped 1 previous similar message

