[Lustre-discuss] 1.6.6 to 1.8.3 upgrade, OSS with wrong "Target" value
Roger Sersted
rs1 at aps.anl.gov
Mon Jul 12 11:38:16 PDT 2010
Thanks for the quick response. The logs on the problem server indicate that the
ldiskfs RPM was not installed for the first mount attempt. Lustre rejected the
attempt here:
Jun 26 17:43:58 puppy7 kernel: LustreError:
3358:0:(obd_mount.c:1290:server_kernel_mount()) premount /dev/sdc:0x0 ldiskfs
failed: -19, ldiskfs2 failed: -19. Is the ldiskfs module available?
Jun 26 17:43:58 puppy7 kernel: LustreError:
3358:0:(obd_mount.c:1616:server_fill_super()) Unable to mount device /dev/sdc: -19
Jun 26 17:43:58 puppy7 kernel: LustreError:
3358:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount (-19)
Jun 26 17:44:10 puppy7 ntpd[3082]: synchronized to 172.16.2.254, stratum 3
Jun 26 17:44:19 puppy7 kernel: LustreError:
3368:0:(obd_mount.c:1290:server_kernel_mount()) premount /dev/sdc:0x0 ldiskfs
failed: -19, ldiskfs2 failed: -19. Is the ldiskfs module available?
Jun 26 17:44:19 puppy7 kernel: LustreError:
3368:0:(obd_mount.c:1616:server_fill_super()) Unable to mount device /dev/sdc: -19
Jun 26 17:44:19 puppy7 kernel: LustreError:
3368:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount (-19)
Jun 26 17:53:39 puppy7 kernel: LustreError:
3430:0:(obd_mount.c:1290:server_kernel_mount()) premount /dev/sdc:0x0 ldiskfs
failed: -19, ldiskfs2 failed: -19. Is the ldiskfs module available?
Jun 26 17:53:39 puppy7 kernel: LustreError:
3430:0:(obd_mount.c:1616:server_fill_super()) Unable to mount device /dev/sdc: -19
Jun 26 17:53:39 puppy7 kernel: LustreError:
3430:0:(obd_mount.c:2045:lustre_fill_super()) Unable to mount (-19)
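For reference, the -19 in these messages is a negated kernel errno; on Linux, 19 is ENODEV ("No such device"), which is consistent with the ldiskfs filesystem type not being registered. The following is only a rough sketch of how such wrapped log excerpts could be rejoined and the code decoded. The function names `join_wrapped` and `decode_failures` are made up for illustration and are not part of any Lustre tool.

```python
import errno
import os
import re

# A record starts with a syslog timestamp such as "Jun 26 17:43:58".
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")
# The return code is printed as a negated errno, e.g. "failed: -19".
FAILED_RC = re.compile(r"failed: -(\d+)")


def join_wrapped(lines):
    """Rejoin records that were wrapped across several lines (hypothetical helper)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)        # a new syslog record begins here
        else:
            records[-1] += " " + line   # continuation of the previous record
    return records


def decode_failures(records):
    """Return (errno name, description) for each distinct 'failed: -N' code."""
    codes = sorted({int(n) for rec in records for n in FAILED_RC.findall(rec)})
    return [(errno.errorcode.get(c, "unknown"), os.strerror(c)) for c in codes]


sample = """\
Jun 26 17:43:58 puppy7 kernel: LustreError:
3358:0:(obd_mount.c:1290:server_kernel_mount()) premount /dev/sdc:0x0 ldiskfs
failed: -19, ldiskfs2 failed: -19. Is the ldiskfs module available?
""".splitlines()

records = join_wrapped(sample)
print(decode_failures(records))
```

The description string comes from the local C library, so its exact wording may differ between systems.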
I then installed the ldiskfs RPM on all the Lustre nodes (and fixed my kickstart
config), modprobe'd lustre and attempted again:
Jun 26 17:54:30 puppy7 kernel: init dynlocks cache
Jun 26 17:54:30 puppy7 kernel: ldiskfs created from ext4-2.6-rhel5
Jun 26 17:54:30 puppy7 kernel: LDISKFS-fs: barriers enabled
Jun 26 17:54:33 puppy7 kernel: kjournald2 starting: pid 3457, dev sdc:8, commit
interval 5 seconds
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs warning: checktime reached, running
e2fsck is recommended
Jun 26 17:54:33 puppy7 kernel: LDISKFS FS on sdc, internal journal on sdc:8
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: delayed allocation enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: file extents enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: recovery complete.
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mounted filesystem sdc with ordered
data mode
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal
hits, 0 2^N hits, 0 breaks, 0 lost
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: barriers enabled
Jun 26 17:54:33 puppy7 kernel: kjournald2 starting: pid 3460, dev sdc:8, commit
interval 5 seconds
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs warning: checktime reached, running
e2fsck is recommended
Jun 26 17:54:33 puppy7 kernel: LDISKFS FS on sdc, internal journal on sdc:8
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: delayed allocation enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: file extents enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mballoc enabled
Jun 26 17:54:33 puppy7 kernel: LDISKFS-fs: mounted filesystem sdc with ordered
data mode
Jun 26 17:54:38 puppy7 kernel: Lustre:
2725:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1339651978690561
sent from MGC172.17.2.5@o2ib to NID 172.17.2.5@o2ib 5s ago has timed out (5s
prior to deadline).
Jun 26 17:54:38 puppy7 kernel: req@ffff810067706400 x1339651978690561/t0
o250->MGS@MGC172.17.2.5@o2ib_0:26/25 lens 368/584 e 0 to 1 dl 1277592878 ref 1
fl Rpc:N/0/0 rc 0/0
Jun 26 17:54:38 puppy7 kernel: LustreError:
3445:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@ffff81013d553c00 x1339651978690563/t0 o101->MGS@MGC172.17.2.5@o2ib_0:26/25
lens 296/544 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Jun 26 17:54:38 puppy7 kernel: Lustre: Filtering OBD driver; http://www.lustre.org/
Jun 26 17:54:38 puppy7 kernel: Lustre: lustre1-OST0001: Now serving
lustre1-OST0001 on /dev/sdc with recovery enabled
Jun 26 17:55:03 puppy7 kernel: Lustre:
2725:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1339651978690564
sent from MGC172.17.2.5@o2ib to NID 172.17.2.5@o2ib 5s ago has timed out (5s
prior to deadline).
----------- a few timeout messages later ....
Jun 26 17:55:43 puppy7 kernel: LustreError:
3649:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID
req@ffff810060569800 x1339651978690572/t0 o101->MGS@MGC172.17.2.5@o2ib_0:26/25
lens 296/544 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Jun 26 17:55:52 puppy7 kernel: Lustre: MGC172.17.2.5@o2ib: Reactivating import
Jun 26 17:55:52 puppy7 kernel: Lustre: lustre1-OST0001: received MDS connection
from 172.17.2.5@o2ib
Jun 26 17:59:11 puppy7 ntpd[3082]: kernel time sync enabled 0001
Jun 26 18:03:51 puppy7 kernel: Lustre:
2724:0:(client.c:1463:ptlrpc_expire_one_request()) @@@ Request x1339651978690598
sent from MGC172.17.2.5@o2ib to NID 172.17.2.5@o2ib 17s ago has timed out (17s
prior to deadline).
Jun 26 18:03:51 puppy7 kernel: req@ffff810060bcd800 x1339651978690598/t0
o400->MGS@MGC172.17.2.5@o2ib_0:26/25 lens 192/384 e 0 to 1 dl 1277593431 ref 1
fl Rpc:N/0/0 rc 0/0
Jun 26 18:03:51 puppy7 kernel: Lustre:
2724:0:(client.c:1463:ptlrpc_expire_one_request()) Skipped 2 previous similar
messages
Jun 26 18:03:51 puppy7 kernel: LustreError: 166-1: MGC172.17.2.5@o2ib:
Connection to service MGS via nid 172.17.2.5@o2ib was lost; in progress
operations using this service will fail.
--------------------------------------------
According to the log above, it looked like everything worked. But after waiting a
while, I still couldn't mount Lustre on a client. I found a similar problem on
the list; in that case, the fix was to mount the device as type ldiskfs and
remove CONFIGS/<targetname>. I hope that didn't permanently corrupt Lustre?
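
As a rough illustration of that workaround, the sketch below only builds and prints the commands; it does not run anything. The mount point `/mnt/ost_tmp`, the helper name `config_reset_plan`, and the assumption that the target is stopped and the mount point already exists are all hypothetical. The device and target name are taken from the log above.

```python
import shlex
from pathlib import PurePosixPath


def config_reset_plan(device, target, mountpoint="/mnt/ost_tmp"):
    """Build (but do not run) the commands for removing CONFIGS/<target>.

    Hypothetical helper: the mount point is an assumed scratch directory.
    """
    config_file = PurePosixPath(mountpoint) / "CONFIGS" / target
    return [
        ["mount", "-t", "ldiskfs", device, mountpoint],  # mount the backing filesystem directly
        ["rm", str(config_file)],                        # remove the per-target config file
        ["umount", mountpoint],                          # release the device again
    ]


# Dry run: print the plan so it can be reviewed before anything is typed by hand.
for cmd in config_reset_plan("/dev/sdc", "lustre1-OST0001"):
    print(shlex.join(cmd))
```

Printing the plan first makes it easy to check the device and target name before touching the disk; copying the file aside instead of removing it would be a more cautious variant.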
Thanks,
Roger S.
Wojciech Turek wrote:
> Hi,
>
> Could you please post system logs that were generated during first mount
> after the upgrade?
> Did you run writeconf on MDT and all OSTs?
>
>
>
>
>
>
> On 12 July 2010 16:51, Roger Sersted <rs1 at aps.anl.gov
> <mailto:rs1 at aps.anl.gov>> wrote:
>
>
>
> This is a small development system with a combined MDS/MGS on a
> single node with
> a SCSI interface to a disk array. There are two OSSes, each with a
> single OST
> of 1.4TB comprised of a SATA array. In all cases, the entire device
> (/dev/sdc)
> is used with no partitioning.
>
> I upgraded my Lustre MDS and OSS servers from 1.6.6 to 1.8.3. I did
> this via a
> complete OS install and then performing a writeconf on each of the
> nodes.
>
> Unfortunately, each of the OSSes thinks its Lustre "Target" is
> "lustre1-OST0000". I've mounted the partitions via ldiskfs and the
> underlying
> data is still there. I know which OSS is supposed to be
> "lustre1-OST0001", but
> I can't find any docs that explain how to set that.
>
> Thanks,
>
> Roger S.
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org <mailto:Lustre-discuss at lists.lustre.org>
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
>
>
>
> --
> --
> Wojciech Turek
>
> Assistant System Manager
>
> High Performance Computing Service
> University of Cambridge
> Email: wjt27 at cam.ac.uk <mailto:wjt27 at cam.ac.uk>
> Tel: (+)44 1223 763517