[Lustre-discuss] Has anybody seen: ldiskfs_get_inode_block: bad inode number: 1

Andreas Dilger adilger at sun.com
Mon Nov 9 12:02:46 PST 2009


On 2009-11-09, at 06:37, Budsjö, Martin wrote:
> After our upgrade to 1.6.7.2 (from 1.4.12) we started to get
> "ldiskfs_get_inode_block: bad inode number: 1" errors. This causes
> the ldiskfs filesystem to remount read-only, making the whole Lustre
> filesystem read-only.
>
> We took the filesystem offline and ran fsck on it. Some minor
> errors were found and fixed, but the error persists. Any clue?

That inode number is not valid for use, so either there is a directory
entry with that value (which seems unlikely to have survived an e2fsck
run), or some client is incorrectly requesting that inode number
(possibly cached after having read it from the MDS previously?).
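
To check whether any on-disk directory entry still points at inode 1,
something along these lines (with the MDT unmounted; I'm assuming
/dev/vg_mds/spehome is the MDT device, as in your log) should list any
referencing pathnames -- just a sketch, adjust the device for your setup:

    # list pathnames of any directory entries referencing inode 1 (read-only)
    debugfs -c -R "ncheck 1" /dev/vg_mds/spehome

If that prints nothing, the stale reference is almost certainly coming
from a client cache rather than from the on-disk filesystem.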

It will probably go away if you reboot the client(s) that have cached
this inode number, but it isn't clear from the error messages below
which client that is.  You might be able to find that in the Lustre
kernel debug logs.
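
As a rough sketch (assuming the default debug mask on 1.6.7.2 still
includes RPC tracing), you could dump the kernel debug buffer on the MDS
right after the error is hit and look for the request that triggered it:

    # on the MDS, dump the Lustre debug buffer to a timestamped file
    lctl dk /tmp/lustre-debug.$(date +%s)

The RPC trace lines in that dump should include the NID of the client
that sent the offending request; the exact output depends on which debug
flags are enabled.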

> Example of the error:
>
> Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
> Nov  4 03:06:32 hlc305 kernel: Remounting filesystem read-only
> Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
> Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) error starting handle for op 4 (35 credits): rc -30
> Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) Skipped 52 previous similar messages
> Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_open.c:769:mds_finish_open()) mds_create_objects: rc = -30
> Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_open.c:769:mds_finish_open()) Skipped 1 previous similar message
> Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_reint.c:154:mds_finish_transno()) fsfilt_start: -30
> Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_reint.c:154:mds_finish_transno()) Skipped 52 previous similar messages
> Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
> Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
> Nov  4 03:06:32 hlc305 kernel: LustreError: 1189:0:(mds_open.c:769:mds_finish_open()) mds_create_objects: rc = -30
> Nov  4 03:07:14 hlc305 kernel: Lustre: Failing over spehome-MDT0000
> Nov  4 03:07:14 hlc305 kernel: Lustre: Skipped 12 previous similar messages
> Nov  4 03:07:14 hlc305 kernel: Lustre: *** setting obd spehome-MDT0000 device 'dm-28' read-only ***
> Nov  4 03:07:14 hlc305 kernel: Turning device dm-28 (0xfd0001c) read-only
> Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(handler.c:1601:mds_handle()) operation 101 on unconnected MDS from 12345-10.0.0.86 at tcp
> Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(handler.c:1601:mds_handle()) Skipped 74 previous similar messages
> Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-107)  req at 00000100cabd0800 x30168375/t0 o101-><?>@<?>:0/0 lens 440/0 e 0 to 0 dl 1257300535 ref 1 fl Interpret:/0/0 rc -107/0
> Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 74 previous similar messages
> Nov  4 03:07:15 hlc305 kernel: LustreError: 137-5: UUID 'mds5_UUID' is not available for connect (stopping)
> Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_obd.c:380:llog_obd_origin_cleanup()) failure destroying log during cleanup: -30
> Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_obd.c:380:llog_obd_origin_cleanup()) Skipped 6 previous similar messages
> Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(fsfilt-ldiskfs.c:1236:fsfilt_ldiskfs_write_record()) can't start transaction for 34 blocks (8192 bytes)
> Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(fsfilt-ldiskfs.c:1236:fsfilt_ldiskfs_write_record()) Skipped 6 previous similar messages
> Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) error writing log record: rc -30
> Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) Skipped 6 previous similar messages
> Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog.c:135:llog_cancel_rec()) Failure re-writing header -30
> Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog.c:135:llog_cancel_rec()) Skipped 6 previous similar messages
> Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(handler.c:1963:mds_update_server_data()) error writing MDS server data: rc = -30
> Nov  4 03:07:15 hlc305 kernel: Lustre: spehome-MDT0000: shutting down for failover; client state will be preserved.
> Nov  4 03:07:15 hlc305 kernel: Lustre: MDT spehome-MDT0000 has stopped.
> Nov  4 03:07:15 hlc305 kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
> Nov  4 03:07:15 hlc305 kernel: Removing read-only on unknown block (0xfd0001c)
> Nov  4 03:07:15 hlc305 kernel: Lustre: server umount spehome-MDT0000 complete
> Nov  4 03:07:16 hlc305 kernel: kjournald starting.  Commit interval 5 seconds
> Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs warning: mounting fs with errors, running e2fsck is recommended
> Nov  4 03:07:16 hlc305 kernel: LDISKFS FS on dm-28, internal journal
> Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs: recovery complete.
> Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Nov  4 03:07:16 hlc305 kernel: kjournald starting.  Commit interval 5 seconds
> Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs warning: mounting fs with errors, running e2fsck is recommended
> Nov  4 03:07:16 hlc305 kernel: LDISKFS FS on dm-28, internal journal
> Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Nov  4 03:07:16 hlc305 kernel: Lustre: Enabling user_xattr
> Nov  4 03:07:16 hlc305 kernel: Lustre: 17137:0:(mds_fs.c:511:mds_init_server_data()) RECOVERY: service spehome-MDT0000, 166 recoverable clients, last_transno 7646800754
> Nov  4 03:07:16 hlc305 kernel: Lustre: 636:0:(mds_lov.c:1075:mds_notify()) MDS spehome-MDT0000: in recovery, not resetting orphans on ost1spehome_UUID
> Nov  4 03:07:16 hlc305 kernel: Lustre: MDT spehome-MDT0000 now serving mds5_UUID (spehome-MDT0000/2af526ca-85d4-996b-4773-4898812d6a31), but will be in recovery for at least 5:00, or until 166 clients reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/spehome-MDT0000/recovery_status.
> Nov  4 03:07:16 hlc305 kernel: Lustre: Server spehome-MDT0000 on device /dev/vg_mds/spehome has started
> Nov  4 03:07:16 hlc305 kernel: Lustre: spehome-MDT0000: temporarily refusing client connection from 10.0.0.69 at tcp
> Nov  4 03:07:16 hlc305 kernel: Lustre: 1247:0:(ldlm_lib.c:1240:check_and_start_recovery_timer()) spehome-MDT0000: starting recovery timer
> Nov  4 03:07:16 hlc305 kernel: Lustre: 1249:0:(ldlm_lib.c:1591:target_queue_last_replay_reply()) spehome-MDT0000: 165 recoverable clients remain
> Nov  4 03:07:16 hlc305 kernel: Lustre: 1249:0:(ldlm_lib.c:1591:target_queue_last_replay_reply()) Skipped 7 previous similar messages
> Nov  4 03:07:16 hlc305 kernel: Lustre: 1284:0:(mds_open.c:841:mds_open_by_fid()) Orphan d5839c:772a0a47 found and opened in PENDING directory
> Nov  4 03:07:16 hlc305 kernel: Lustre: 1284:0:(mds_open.c:841:mds_open_by_fid()) Skipped 1 previous similar message
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



