[Lustre-discuss] 1.6.6 to 1.8.3 upgrade, OSS with wrong "Target" value

Roger Sersted rs1 at aps.anl.gov
Mon Jul 12 11:38:16 PDT 2010


Thanks for the quick response.  The logs on the problem server indicate that the 
ldiskfs RPM was not installed at the time of the first mount attempt.  Lustre 
rejected the attempt here:

Jun 26 17:43:58 puppy7 kernel: LustreError: 
3358:0:(obd_mount.c:1290:server_kernel_mount()) premount /dev/sdc:0x0 ldiskfs 
failed: -19, ldiskfs2 failed: -19.  Is the ldiskfs module available?
Jun 26 17:43:58 puppy7 kernel: LustreError: 
3358:0:(obd_mount.c:1616:server_fill_super()) Unable to mount device /dev/sdc: -19
Jun 26 17:43:58 puppy7 kernel: LustreError: 
3358:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount  (-19)
Jun 26 17:44:10 puppy7 ntpd[3082]: synchronized to 172.16.2.254, stratum 3
Jun 26 17:44:19 puppy7 kernel: LustreError: 
3368:0:(obd_mount.c:1290:server_kernel_mount()) premount /dev/sdc:0x0 ldiskfs 
failed: -19, ldiskfs2 failed: -19.  Is the ldiskfs module available?
Jun 26 17:44:19 puppy7 kernel: LustreError: 
3368:0:(obd_mount.c:1616:server_fill_super()) Unable to mount device /dev/sdc: -19
Jun 26 17:44:19 puppy7 kernel: LustreError: 
3368:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount  (-19)
Jun 26 17:53:39 puppy7 kernel: LustreError: 
3430:0:(obd_mount.c:1290:server_kernel_mount()) premount /dev/sdc:0x0 ldiskfs 
failed: -19, ldiskfs2 failed: -19.  Is the ldiskfs module available?
Jun 26 17:53:39 puppy7 kernel: LustreError: 
3430:0:(obd_mount.c:1616:server_fill_super()) Unable to mount device /dev/sdc: -19
Jun 26 17:53:39 puppy7 kernel: LustreError: 
3430:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount  (-19)
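The -19 in these messages is -ENODEV: the kernel had no ldiskfs (or ldiskfs2) filesystem type to mount the device with, which is consistent with the missing RPM and with Lustre's own hint. Below is a minimal Python sketch, assuming a Linux host, that decodes such a return code and checks /proc/modules for a loaded ldiskfs module. The helper names are hypothetical and not part of any Lustre tool.

```python
import errno
import os


def explain_rc(rc: int) -> str:
    """Turn a kernel-style negative return code (e.g. -19) into 'ENODEV (No such device)'."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


def module_loaded(name: str, proc_modules: str) -> bool:
    """True if `name` is the first field of any line of /proc/modules-formatted text."""
    return any(
        line.split()[0] == name
        for line in proc_modules.splitlines()
        if line.strip()
    )


if __name__ == "__main__":
    print("-19 ->", explain_rc(-19))
    try:
        with open("/proc/modules") as f:
            modules = f.read()
    except OSError:  # not Linux, or /proc not mounted
        modules = ""
    print("ldiskfs loaded:", module_loaded("ldiskfs", modules))
```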


I then installed the ldiskfs RPM on all the Lustre nodes (and fixed my kickstart 
config), ran modprobe lustre, and attempted the mount again:

Jun 26 17:54:30 puppy7 kernel: init dynlocks cache
Jun 26 17:54:30 puppy7 kernel: ldiskfs created from ext4-2.6-rhel5
Jun 26 17:54:30 puppy7 kernel: LDISKFS-fs: barriers enabled
Jun 26 17:54:33 puppy7 kernel: kjournald2 starting: pid 3457, dev sdc:8, commit 
interval 5 seconds
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs warning: checktime reached, running 
e2fsck is recommended
Jun 26 17:54:33 puppy7 kernel: LDISKFS FS on sdc, internal journal on sdc:8
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: delayed allocation enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: file extents enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: recovery complete.
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mounted filesystem sdc with ordered 
data mode
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal 
hits, 0 2^N hits, 0 breaks, 0 lost
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: barriers enabled
Jun 26 17:54:33 puppy7 kernel: kjournald2 starting: pid 3460, dev sdc:8, commit 
interval 5 seconds
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs warning: checktime reached, running 
e2fsck is recommended
Jun 26 17:54:33 puppy7 kernel: LDISKFS FS on sdc, internal journal on sdc:8
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: delayed allocation enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: file extents enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mounted filesystem sdc with ordered 
data mode
Jun 26 17:54:38 puppy7 kernel: Lustre: 
2725:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1339651978690561 
sent from MGC172.17.2.5@o2ib to NID 172.17.2.5@o2ib 5s ago has timed out (5s 
prior to deadline).
Jun 26 17:54:38 puppy7 kernel:   req@ffff810067706400 x1339651978690561/t0 
o250->MGS@MGC172.17.2.5@o2ib_0:26/25 lens 368/584 e 0 to 1 dl 1277592878 ref 1 
fl Rpc:N/0/0 rc 0/0
Jun 26 17:54:38 puppy7 kernel: LustreError: 
3445:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
req@ffff81013d553c00 x1339651978690563/t0 o101->MGS@MGC172.17.2.5@o2ib_0:26/25 
lens 296/544 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Jun 26 17:54:38 puppy7 kernel: Lustre: Filtering OBD driver; http://www.lustre.org/
Jun 26 17:54:38 puppy7 kernel: Lustre: lustre1-OST0001: Now serving 
lustre1-OST0001 on /dev/sdc with recovery enabled
Jun 26 17:55:03 puppy7 kernel: Lustre: 
2725:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1339651978690564 
sent from MGC172.17.2.5@o2ib to NID 172.17.2.5@o2ib 5s ago has timed out (5s 
prior to deadline).

-----------  a few timeout messages later ....

Jun 26 17:55:43 puppy7 kernel: LustreError: 
3649:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID 
req@ffff810060569800 x1339651978690572/t0 o101->MGS@MGC172.17.2.5@o2ib_0:26/25 
lens 296/544 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Jun 26 17:55:52 puppy7 kernel: Lustre: MGC172.17.2.5@o2ib: Reactivating import
Jun 26 17:55:52 puppy7 kernel: Lustre: lustre1-OST0001: received MDS connection 
from 172.17.2.5@o2ib
Jun 26 17:59:11 puppy7 ntpd[3082]: kernel time sync enabled 0001
Jun 26 18:03:51 puppy7 kernel: Lustre: 
2724:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1339651978690598 
sent from MGC172.17.2.5@o2ib to NID 172.17.2.5@o2ib 17s ago has timed out (17s 
prior to deadline).
Jun 26 18:03:51 puppy7 kernel:   req@ffff810060bcd800 x1339651978690598/t0 
o400->MGS@MGC172.17.2.5@o2ib_0:26/25 lens 192/384 e 0 to 1 dl 1277593431 ref 1 
fl Rpc:N/0/0 rc 0/0
Jun 26 18:03:51 puppy7 kernel: Lustre: 
2724:0:(client.c:1463:ptlrpc_expire_one_request()) Skipped 2 previous similar 
messages
Jun 26 18:03:51 puppy7 kernel: LustreError: 166-1: MGC172.17.2.5@o2ib: 
Connection to service MGS via nid 172.17.2.5@o2ib was lost; in progress 
operations using this service will fail.
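When a log like this is long, it can help to pull out just the lines that say which target came up and which requests to the MGS failed. A small Python sketch follows; the regular expressions are written against the message text quoted in this thread and are assumptions, not an official Lustre log format. In real syslog output each message sits on a single line and NIDs are written with "@" (e.g. 172.17.2.5@o2ib).

```python
import re

# Patterns match the message text quoted above (one message per line, NIDs with "@").
SERVING_RE = re.compile(r"Lustre: (\S+): Now serving (\S+) on (\S+)")
TIMEOUT_RE = re.compile(
    r"Request (x\d+) sent from (\S+) to NID (\S+) (\d+)s ago has timed out")
LOST_RE = re.compile(
    r"(MGC\S+): Connection to service (\S+) via nid (\S+) was lost")


def summarize(log_text: str) -> dict:
    """Collect served targets, timed-out request xids, and lost service connections."""
    summary = {"serving": [], "timeouts": [], "lost": []}
    for line in log_text.splitlines():
        if m := SERVING_RE.search(line):
            summary["serving"].append((m.group(2), m.group(3)))  # (target, device)
        if m := TIMEOUT_RE.search(line):
            # (request xid, destination NID, seconds since the request was sent)
            summary["timeouts"].append((m.group(1), m.group(3), int(m.group(4))))
        if m := LOST_RE.search(line):
            summary["lost"].append((m.group(2), m.group(3)))  # (service, NID)
    return summary
```

Run against the excerpt above, such a summary would show lustre1-OST0001 being served from /dev/sdc while requests to the MGS NID kept timing out until the connection was reported lost.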

--------------------------------------------

According to the above, it looked like everything worked.  But even after waiting 
a while, I still couldn't mount Lustre on a client.  I found a similar problem on 
the list; in that case, the fix was to mount the device as type ldiskfs and 
remove CONFIGS/<targetname>.  Could that have permanently corrupted the file system?
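Both the "Target" value and the CONFIGS/<targetname> file follow Lustre's target naming, <fsname>-OST<index> (or MDT), where the index is four hexadecimal digits; lustre1-OST0000 and lustre1-OST0001 differ only in that index. A small Python sketch that splits and rebuilds such names; the fsname character class and the lowercase hex output are assumptions made for illustration.

```python
import re
from typing import NamedTuple

# <fsname>-<OST|MDT><4 hex digits>, e.g. "lustre1-OST0001".
# The fsname character class here is an assumption, not Lustre's exact rule.
TARGET_RE = re.compile(
    r"^(?P<fsname>[A-Za-z0-9_]+)-(?P<kind>OST|MDT)(?P<index>[0-9a-fA-F]{4})$")


class Target(NamedTuple):
    fsname: str
    kind: str
    index: int


def parse_target(name: str) -> Target:
    """Split a target name into fsname, target kind, and numeric index."""
    m = TARGET_RE.match(name)
    if m is None:
        raise ValueError(f"not a Lustre target name: {name!r}")
    return Target(m["fsname"], m["kind"], int(m["index"], 16))


def format_target(t: Target) -> str:
    """Rebuild the name with a zero-padded four-digit hex index."""
    return f"{t.fsname}-{t.kind}{t.index:04x}"
```

For example, parse_target("lustre1-OST0001") yields index 1, and an OST with index 10 would be named lustre1-OST000a.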

Thanks,

Roger S.

Wojciech Turek wrote:
> Hi,
> 
> Could you please post the system logs that were generated during the first 
> mount after the upgrade?
> Did you run writeconf on the MDT and all OSTs?
> 
> On 12 July 2010 16:51, Roger Sersted <rs1 at aps.anl.gov> wrote:
> 
>     This is a small development system with a combined MDS/MGS on a single
>     node with a SCSI interface to a disk array.  There are two OSSes, each
>     with a single 1.4 TB OST on a SATA array.  In all cases, the entire
>     device (/dev/sdc) is used with no partitioning.
> 
>     I upgraded my Lustre MDS and OSS servers from 1.6.6 to 1.8.3.  I did
>     this with a complete OS install followed by a writeconf on each of the
>     nodes.
> 
>     Unfortunately, each of the OSSes thinks its Lustre "Target" is
>     "lustre1-OST0000".  I've mounted the devices as ldiskfs and the
>     underlying data is still there.  I know which OSS is supposed to be
>     "lustre1-OST0001", but I can't find any docs that explain how to set
>     that.
> 
>     Thanks,
> 
>     Roger S.
> 
>     _______________________________________________
>     Lustre-discuss mailing list
>     Lustre-discuss at lists.lustre.org
>     http://lists.lustre.org/mailman/listinfo/lustre-discuss
> 
> -- 
> Wojciech Turek
> 
> Assistant System Manager
> 
> High Performance Computing Service
> University of Cambridge
> Email: wjt27 at cam.ac.uk
> Tel: (+)44 1223 763517
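Wojciech's question about running writeconf on the MDT and all OSTs refers to regenerating the configuration logs with tunefs.lustre --writeconf. The ordering matters: everything is stopped, the MDT is rewritten before the OSTs, and the MGS/MDT is mounted again before the OSTs so they can re-register. A Python sketch that only prints the commands in that order, assuming clients are already unmounted; the host names and mount point are hypothetical, and the device is /dev/sdc on every node as described above.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # (host, block device)

# Hypothetical host names; /dev/sdc is the whole-device target on every node here.
MDT: Node = ("mds", "/dev/sdc")
OSTS: List[Node] = [("oss1", "/dev/sdc"), ("oss2", "/dev/sdc")]


def writeconf_plan(mdt: Node, osts: List[Node],
                   mount_point: str = "/mnt/target") -> List[Tuple[str, str]]:
    """Return (host, command) pairs for regenerating the configuration logs.

    Clients are assumed to be unmounted already. Nothing is executed.
    """
    mdt_host, mdt_dev = mdt
    # 1. Stop the MDT/MGS, then every OST.
    plan = [(mdt_host, f"umount {mount_point}")]
    plan += [(host, f"umount {mount_point}") for host, _ in osts]
    # 2. Rewrite the configuration logs: MDT first, then each OST.
    plan.append((mdt_host, f"tunefs.lustre --writeconf {mdt_dev}"))
    plan += [(host, f"tunefs.lustre --writeconf {dev}") for host, dev in osts]
    # 3. Restart: the MGS/MDT must be up before the OSTs re-register with it.
    plan.append((mdt_host, f"mount -t lustre {mdt_dev} {mount_point}"))
    plan += [(host, f"mount -t lustre {dev} {mount_point}") for host, dev in osts]
    return plan


if __name__ == "__main__":
    for host, cmd in writeconf_plan(MDT, OSTS):
        print(f"{host}: {cmd}")
```

This covers only the ordering; it does not change the index stored on each OST, which is the separate problem raised in the original message.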


