[Lustre-discuss] lustre-discuss-list - 10 new messages in 5 topics - digest

Ettore Enrico Delfino Ligorio ettoredelfinoligorio at gmail.com
Fri Jul 17 11:01:36 PDT 2009


Hi all,

Regarding the comparison of Lustre with GlusterFS, I have the
following, together with the attached file.

I used this in a presentation at
http://www.beliefproject.org/events/4th-belief-international-symposium

It can be a starting point for a comparison. GlusterFS has lower
performance, but we can build a RAID 10 over the network in the case
where the GlusterFS equivalent of the OSTs is DAS (i.e. internal
disks, not SAN LUNs). Also, apparently, there is no deadlock when a
GlusterFS node is configured to be both client and server at the same
time.
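
For example, a rough sketch of such a layout in a GlusterFS 2.x style
client volfile, putting a cluster/replicate pair under
cluster/distribute (hostnames and brick names are placeholders, and
only one mirrored pair is shown to keep it short):

volume node1
  type protocol/client
  option transport-type tcp
  option remote-host server1        # placeholder hostname
  option remote-subvolume brick
end-volume

volume node2
  type protocol/client
  option transport-type tcp
  option remote-host server2        # placeholder hostname
  option remote-subvolume brick
end-volume

volume mirror0
  type cluster/replicate            # the "RAID 1" half
  subvolumes node1 node2
end-volume

volume dist0
  type cluster/distribute           # the "RAID 0" half; add mirror1, mirror2, ... for more pairs
  subvolumes mirror0
end-volume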

In my experience, integrating the most recent kernels with GlusterFS
and the Xen hypervisor patches works well. The same with Lustre is
harder to do.

However, there are issues when trying to boot up a virtual machine
image stored on a GlusterFS mount point.

GlusterFS also has lower performance when overwriting existing files
than when writing new ones.

Apparently GlusterFS does not stripe files across nodes, and
apparently the maximum size of a single file is limited by the space
available on the GlusterFS equivalent of a single OST.

I am waiting for version 2 of Lustre.

Best Regards.
--
Ettore Enrico Delfino Ligorio
ettoredelfinoligorio at gmail.com
55-11-9145-6151



On Fri, Jul 17, 2009 at 4:45 AM, lustre-discuss-list
group<noreply at googlegroups.com> wrote:
>
> lustre-discuss-list
> http://groups.google.com/group/lustre-discuss-list?hl=en
>
> lustre-discuss-list at googlegroups.com
>
> Today's topics:
>
> * MDT move aka backup w rsync - 3 messages, 2 authors
>  http://groups.google.com/group/lustre-discuss-list/t/cbab59a18743f8aa?hl=en
> * Lustre compared to Gluster - 1 messages, 1 author
>  http://groups.google.com/group/lustre-discuss-list/t/18b4c2c838289bed?hl=en
> * One Lustre Client lost One Lustre Disk - 2 messages, 2 authors
>  http://groups.google.com/group/lustre-discuss-list/t/4c313176573ce691?hl=en
> * One Lustre Client lost One Lustre Disk--solved - 1 messages, 1 author
>  http://groups.google.com/group/lustre-discuss-list/t/ac24b4ae3b9b8b53?hl=en
> * Hastening lustrefs recovery - 3 messages, 3 authors
>  http://groups.google.com/group/lustre-discuss-list/t/5ba606574f699a0d?hl=en
>
> ==============================================================================
> TOPIC: MDT move aka backup w rsync
> http://groups.google.com/group/lustre-discuss-list/t/cbab59a18743f8aa?hl=en
> ==============================================================================
>
> == 1 of 3 ==
> Date: Wed, Jul 15 2009 10:43 am
> From: Andreas Dilger
>
>
> On Jul 15, 2009  18:35 +0200, Thomas Roth wrote:
>> I want to move a MDT from one server to another. After studying some
>> mails concerning MDT backup, I've just tried (successfully, it seems) to
>> do that on a small test system  with rsync:
>>
>> - Stop Lustre, umount all servers.
>> - Format a suitable disk partition on the new hardware, using the same
>> mkfs-options as for the original MDT.
>> - Mount the original MDT:    mount   -t ldiskfs      /dev/sdb1    /mnt
>> - Mount the target partition: mount   -t ldiskfs   -O ext_attr
>> /dev/sdb1    /mnt
>> - Copy the data:  rsync   -Xav   oldserver:/mnt/    newserver:/mnt
>> - Umount partitions, restart MGS
>> - Mount new MDT
>>
>> This procedure was described by Jim Garlick on this list. You might note
>> that I used the mount option "-O ext_attr" only on the target machine:
>> my mistake perhaps, but no visible problems. In fact, I haven't found
>> this option mentioned in any man page or on the net. Nevertheless, my
>> mount command did not complain about it. So I wonder whether it is
>> necessary at all - I seem to have extracted the attributes from the old
>> MDT all right, without this mount option - ?
>
> If you have verified that the file data is actually present, this should
> work correctly.  In particular, the critical Lustre information is in the
> "trusted.lov" xattr, so you need to ensure that is present.  The MDS will
> "work" without this xattr, but it will assume all of the files have no
> data.
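>
> As a quick sanity check (the path here is only an example), the xattr can
> be dumped straight from the ldiskfs-mounted copy with getfattr:
>
>   getfattr -d -m trusted.lov -e hex /mnt/ROOT/some/dir/some_file
>
> Any file that has objects on the OSTs should show a trusted.lov value in
> the output.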
>
>> I'm investigating this because our production MDT seems to have a number
>> of problems. In particular the underlying file system is in bad shape,
>> fsck correcting a large number of ext3-errors, incorrect inodes and so
>> forth. We want to verify that it is not a hardware issue - bit-flipping
>> RAID controller, silent "memory corruption", whatever. We have a
>> DRBD-mirror of this MDT running, but of course DRBD just reproduces all
>> errors on the mirror.  Copying from one ldiskfs to another should avoid
>> that?
>>
>> The traditional backup method of getting the EAs and tar-ing the MDT
>> doesn't finish in finite time. It did before, and the filesystem has
>> since grown by a mere 40GB of data, so it shouldn't take that much
>> longer - certainly another indication that there is something wrong.
>> Of course I have yet to see whether "rsync -Xav" does much better on the
>> full system ;-)
>>
>> The system runs Debian Etch, kernel 2.6.22, Lustre 1.6.7.1
>
> Direct MDT backup has a problem in 1.6.7.1 due to the addition of the
> file size on the MDT inodes.  If you do "--sparse" backups this should
> avoid the slowdown.  You could also try the "dump" program, which can
> avoid reading the data from sparse files entirely.
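>
> For reference, a sparse backup of the ldiskfs-mounted MDT along the lines
> of the manual would look roughly like this (paths are placeholders):
>
>   cd /mnt/mdt
>   getfattr -R -d -m '.*' -e hex -P . > /backup/ea.bak
>   tar czf /backup/mdt_backup.tgz --sparse .
>
> keeping the EA dump next to the tarball so the xattrs can be put back with
> "setfattr --restore=ea.bak" after restoring the tarball.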
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>
>
>
>
> == 2 of 3 ==
> Date: Thurs, Jul 16 2009 6:43 am
> From: Thomas Roth
>
>
>
> Andreas Dilger wrote:
>> On Jul 15, 2009  18:35 +0200, Thomas Roth wrote:
>>>...
>>> The traditional backup method of getting the EAs and tar-ing the MDT
>>> doesn't finish in finite time. It did before, and the filesystem has
>>> since grown by a mere 40GB of data, so it shouldn't take that much
>>> longer - certainly another indication that there is something wrong.
>>> Of course I have yet to see whether "rsync -Xav" does much better on the
>>> full system ;-)
>>>
>>> The system runs Debian Etch, kernel 2.6.22, Lustre 1.6.7.1
>>
>> Direct MDT backup has a problem in 1.6.7.1 due to the addition of the
>> file size on the MDT inodes.  If you do "--sparse" backups this should
>> avoid the slowdown.  You could also try the "dump" program, which can
>> avoid reading the data from sparse files entirely.
>
> Is this behavior changed in 1.6.7.2?  But the size information is still
> there in the inodes, isn't it?
> Anyhow our backup script already runs tar with the "--sparse" option.
>
> Regards, Thomas
>
>
>
>
>
> == 3 of 3 ==
> Date: Thurs, Jul 16 2009 6:51 am
> From: Thomas Roth
>
>
> For the record, I should add that I had forgotten one step which proves
> to be important, also mentioned before on this list:
>
> After copying with rsync, I had to
> cd   /srv/mdt;
> rm  CATALOGS  OBJECTS/*
> on the new MDT partition.
>
> Otherwise the OSTs are kicked out on remount with  "error looking up
> logfile ...: rc -2" and "OST0000_UUID sync failed -2, deactivating"
>
> Regards,
> Thomas
>
> Thomas Roth wrote:
>> Hi all,
>>
>> I want to move a MDT from one server to another. After studying some
>> mails concerning MDT backup, I've just tried (successfully, it seems) to
>> do that on a small test system  with rsync:
>>
>> - Stop Lustre, umount all servers.
>> - Format a suitable disk partition on the new hardware, using the same
>> mkfs-options as for the original MDT.
>> - Mount the original MDT:    mount   -t ldiskfs      /dev/sdb1    /mnt
>> - Mount the target partition: mount   -t ldiskfs   -O ext_attr
>> /dev/sdb1    /mnt
>> - Copy the data:  rsync   -Xav   oldserver:/mnt/    newserver:/mnt
>> - Umount partitions, restart MGS
>> - Mount new MDT
>>
>> This procedure was described by Jim Garlick on this list. You might note
>> that I used the mount option "-O ext_attr" only on the target machine:
>> my mistake perhaps, but no visible problems. In fact, I haven't found
>> this option mentioned in any man page or on the net. Nevertheless, my
>> mount command did not complain about it. So I wonder whether it is
>> necessary at all - I seem to have extracted the attributes from the old
>> MDT all right, without this mount option - ?
>>
>> My main question is whether this is a correct procedure for MDT backups,
>> or rather copies.
>>
>> I'm investigating this because our production MDT seems to have a number
>> of problems. In particular the underlying file system is in bad shape,
>> fsck correcting a large number of ext3-errors, incorrect inodes and so
>> forth. We want to verify that it is not a hardware issue - bit-flipping
>> RAID controller, silent "memory corruption", whatever. We have a
>> DRBD-mirror of this MDT running, but of course DRBD just reproduces all
>> errors on the mirror.  Copying from one ldiskfs to another should avoid
>> that?
>>
>> The traditional backup method of getting the EAs and tar-ing the MDT
>> doesn't finish in finite time. It did before, and the filesystem has
>> since grown by a mere 40GB of data, so it shouldn't take that much
>> longer - certainly another indication that there is something wrong.
>> Of course I have yet to see whether "rsync -Xav" does much better on the
>> full system ;-)
>>
>> Hm, not sure whether this all makes sense.
>>
>> The system runs Debian Etch, kernel 2.6.22, Lustre 1.6.7.1
>>
>> Regards,
>> Thomas
>>
>
> --
> --------------------------------------------------------------------
> Thomas Roth
> Department: Informationstechnologie
> Location: SB3 1.262
> Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
>
> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> Planckstraße 1
> D-64291 Darmstadt
> www.gsi.de
>
> Gesellschaft mit beschränkter Haftung
> Sitz der Gesellschaft: Darmstadt
> Handelsregister: Amtsgericht Darmstadt, HRB 1528
>
> Geschäftsführer: Professor Dr. Horst Stöcker
>
> Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph,
> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
>
>
>
>
>
> ==============================================================================
> TOPIC: Lustre compared to Gluster
> http://groups.google.com/group/lustre-discuss-list/t/18b4c2c838289bed?hl=en
> ==============================================================================
>
> == 1 of 1 ==
> Date: Thurs, Jul 16 2009 5:25 am
> From: Mag Gam
>
>
> We have been hearing a lot of news recently about "Gluster". Does
> anyone know how it compares to Lustre? Can it do the same things as
> Lustre? It seems it has built in SNS. Anyone know?
>
>
>
>
>
> ==============================================================================
> TOPIC: One Lustre Client lost One Lustre Disk
> http://groups.google.com/group/lustre-discuss-list/t/4c313176573ce691?hl=en
> ==============================================================================
>
> == 1 of 2 ==
> Date: Thurs, Jul 16 2009 7:40 am
> From: "Ms. Megan Larko"
>
>
> Good Day!
>
> Yesterday evening around 5:30 p.m. local time, one of my lustre client
> systems lost one of its two lustre disks.  I was not able to remount
> it, even after a reboot of the client.   The mount command returns the
> following message:
> [root at crew01 ~]# mount /crew2
> mount.lustre: mount ic-mds1 at o2ib:/crew2 at /crew2 failed: Invalid argument
> This may have multiple causes.
> Is 'crew2' the correct filesystem name?
> Are the mount options correct?
> Check the syslog for more info.
>
>
> And the /var/log/messages file (CentOS 5.1 system using  2.6.18-53.1.13.el5):
> Jul 16 10:30:53 crew01 kernel: LustreError: 156-2: The client profile
> 'crew2-client' could not be read from the MGS.  Does that filesystem
> exist?
> Jul 16 10:30:53 crew01 kernel: Lustre: client ffff810188fcfc00 umount complete
> Jul 16 10:30:53 crew01 kernel: LustreError:
> 26240:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount  (-22)
>
> The entry in the client /etc/fstab file is unchanged from before:
> ic-mds1 at o2ib:/crew2     /crew2          lustre  nouser_xattr,_netdev    0 0
>
> This same client uses the /etc/fstab entry
> "ic-mds1 at o2ib:/crew8    /crewdat        lustre  nouser_xattr,_netdev    0 0"
> This lustre disk is still mounted and usable:
> ic-mds1 at o2ib:/crew8    76T   30T   42T  42% /crewdat
>
> What is also interesting is that other clients still have access to
> the /crew2 disk, even though this one client does not.
>  There are no crew2 errors in the MGS/MDS system which serves both of
> the lustre disks.
>
> What has this one particular client lost that prevents it from
> mounting the /crew2 disk to which the other clients still have access?
>
> Any and all suggestions are appreciated.
> megan
>
>
>
>
> == 2 of 2 ==
> Date: Thurs, Jul 16 2009 10:46 pm
> From: Andreas Dilger
>
>
> On Jul 16, 2009  10:40 -0400, Ms. Megan Larko wrote:
>> [root at crew01 ~]# mount /crew2
>> mount.lustre: mount ic-mds1 at o2ib:/crew2 at /crew2 failed: Invalid argument
>> This may have multiple causes.
>> Is 'crew2' the correct filesystem name?
>> Are the mount options correct?
>> Check the syslog for more info.
>>
>>
>> And the /var/log/messages file (CentOS 5.1 system using  2.6.18-53.1.13.el5):
>> Jul 16 10:30:53 crew01 kernel: LustreError: 156-2: The client profile
>> 'crew2-client' could not be read from the MGS.  Does that filesystem
>> exist?
>> Jul 16 10:30:53 crew01 kernel: Lustre: client ffff810188fcfc00 umount complete
>> Jul 16 10:30:53 crew01 kernel: LustreError:
>> 26240:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount  (-22)
>>
>> The entry in the client /etc/fstab file is unchanged from before:
>> ic-mds1 at o2ib:/crew2   /crew2          lustre  nouser_xattr,_netdev    0 0
>>
>> This same client uses the /etc/fstab entry
>> ic-mds1 at o2ib:/crew8   /crewdat        lustre  nouser_xattr,_netdev    0 0
>> This lustre disk is still mounted and usable:
>> ic-mds1 at o2ib:/crew8    76T   30T   42T  42% /crewdat
>>
>> What is also interesting is that other clients still have access to
>> the /crew2 disk, even though this one client does not.
>>  There are no crew2 errors in the MGS/MDS system which serves both of
>> the lustre disks.
>>
>> What has this one particular client lost that prevents it from
>> mounting the /crew2 disk to which the other clients still have access?
>
> You should also check for messages on the MDS.  You can check for
> the config file existing via "debugfs -c -R 'ls -l CONFIGS' /dev/{mdsdev}".
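>
> For example, on the MGS/MDS node:
>
>   debugfs -c -R 'ls -l CONFIGS' /dev/{mdsdev}
>
> should list entries such as crew2-client, crew2-MDT0000 and the
> crew2-OST* logs; if crew2-client is missing from that listing, it would
> explain the "client profile could not be read" error above.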
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>
>
>
>
>
> ==============================================================================
> TOPIC: One Lustre Client lost One Lustre Disk--solved
> http://groups.google.com/group/lustre-discuss-list/t/ac24b4ae3b9b8b53?hl=en
> ==============================================================================
>
> == 1 of 1 ==
> Date: Thurs, Jul 16 2009 3:31 pm
> From: "Ms. Megan Larko"
>
>
> Hi,
>
> I fixed the problem of the one Lustre client not mounting one Lustre disk.
>
> Truthfully, the problem expanded slightly.  When I rebooted another
> client, it also lost contact with this one particular Lustre disk.
> The error messages were exactly the same:
>
> [root at crew01 ~]# mount /crew2
> mount.lustre: mount ic-mds1 at o2ib:/crew2 at /crew2 failed: Invalid argument
> This may have multiple causes.
> Is 'crew2' the correct filesystem name?
> Are the mount options correct?
> Check the syslog for more info.
>
> So, I thought something may have become a bit off with the disk
> set-up.   I had recently upgraded the other MDT disk to a larger
> physical volume.  This was successfully done following instructions in
> the Lustre Manual.   So I thought perhaps the MDT that I did not
> change merely needed to be "re-set".
>
> On the MGS, I unmounted the MDT of the problem disk and ran the
> following command:
>>> tunefs.lustre --writeconf --mgs --mdt  --fsname=crew2 /dev/{sd-whatever}
> I then remounted the MDT (which is also the MGS) successfully.
>
> On the OSS, I first unmounted the OST disks and then I issued the command:
>>> tunefs.lustre --writeconf --ost /dev/{sd-whatever}
> This was issued for each and every OST.   I mounted my OSTs again successfully.
>
> On my clients, I issued the mount command for the /crew2 lustre disk
> and it was now successful.  No more "invalid argument" message.
> One client did give me a "Transport endpoint not connected" message,
> so that client will require a re-boot to remount this lustre disk
> (unless anyone can tell me how to do the re-mount without a reboot of
> this client).
>
> So--- I am guessing that when I did the upgrade in hardware disk size
> on the non-mgs lustre disk a few weeks ago, the other lustre disk,
> which functions as the mgs, was left in a state such that I could not
> pick-up that disk from the clients if I rebooted a client.  Is this an
> accurate guess?  If it is, then one may wish to add to the Lustre
> Manual (Ch. 15 in 1.6.x versions on restoring metadata to an mdt disk)
> that the mgs disk may require an update using tunefs.lustre
> --writeconf even if it was not the disk which was restored.
>
> I may be wrong in my guess, but the above procedure did get my lustre
> disk back onto my clients successfully.
>
> Cheers!
> megan
>
>
>
>
>
> ==============================================================================
> TOPIC: Hastening lustrefs recovery
> http://groups.google.com/group/lustre-discuss-list/t/5ba606574f699a0d?hl=en
> ==============================================================================
>
> == 1 of 3 ==
> Date: Thurs, Jul 16 2009 3:59 pm
> From: Josephine Palencia
>
>
>
> OS: Centos5.2 x86_64
> Lustre: 1.6.5.1
> OpenIB: 1.3.1
>
>
> What determines the speed at which a lustre fs will recover? (ex. after a
> crash)  Can (should) one hasten the recovery by tweaking some parameters?
>
> For 4 OSTS each with 7TB, ~40 connected clients , recovery time
> is 48min. Is that reasonable or is that too long?
>
>
> Thanks,
> josephine
>
>
>
>
>
>
>
> == 2 of 3 ==
> Date: Thurs, Jul 16 2009 5:15 pm
> From: "Brian J. Murrell"
>
>
>
> On Thu, 2009-07-16 at 18:59 -0400, Josephine Palencia wrote:
>>
>> What determines the speed at which a lustre fs will recover? (ex. after a
>> crash)
>
> How fast all of the clients can reconnect and replay their pending
> transactions.
>
>> Can (should) one hasten the recovery by tweaking some parameters?
>
> There's not much to tweak.  Recovery waits for a) all clients to
> reconnect and replay or b) the recovery timer to run out.  The recovery
> timer is a factor of obd_timeout.  As you probably know obd_timeout has
> a value below which you will start to see timeouts and evictions --
> which you don't want of course.  So you don't really want to set it
> below that value.
>
> The first question people tend to ask when they discover they need to tune
> their obd_timeout upwards to avoid lock callback timeouts and so forth
> is "why don't I just set that really high then?".  The answer is always
> "because the higher you set it, the longer your recovery process will
> take in the event that not all clients are available to replay."
>
> Of course, the bigger your client count, the higher the odds that the
> recovery timer, rather than all clients reconnecting, ends up being the
> deciding factor.
>
> Of interest to all of this is that in 1.8, adaptive timeouts (AT) are
> enabled by default, so obd_timeout should generally always be high
> enough without being too high -- i.e. optimal.  So if your OSSes and MDS
> are tuned such that they are not overwhelming their disk backend,
> obd_timeout should be reasonable and therefore recovery should be
> reasonable.
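>
> For what it's worth, on 1.6 the current timeout and the recovery progress
> can be checked roughly like this (the fsname is a placeholder):
>
>   cat /proc/sys/lustre/timeout                      # obd_timeout on this node
>   cat /proc/fs/lustre/obdfilter/*/recovery_status   # per-OST recovery state on an OSS
>   cat /proc/fs/lustre/mds/*/recovery_status         # recovery state on the MDS
>
> and the timeout can be set filesystem-wide from the MGS with something
> like "lctl conf_param <fsname>.sys.timeout=300".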
>
>> For 4 OSTS each with 7TB, ~40 connected clients , recovery time
>> is 48min. Is that reasonable or is that too long?
>
> Wow.  That seems long.  That is recovery of what?  A single OST or
> single OSS, or something other?
>
> b.
>
>
>
>
>
>
>
> == 3 of 3 ==
> Date: Thurs, Jul 16 2009 5:26 pm
> From: Jeffrey Bennett
>
>
>> > For 4 OSTS each with 7TB, ~40 connected clients , recovery time is
>> > 48min. Is that reasonable or is that too long?
>>
>> Wow.  That seems long.  That is recovery of what?  A single
>> OST or single OSS, or something other?
>>
> Recovery times I have been seeing on similar systems are around 2-3 minutes. That's how long it takes the clients to replay their transactions or time out. This is in the event of failover to a new MDS.
>
> jab
>
>
>
>
> ==============================================================================
>
> You received this message because you are subscribed to the Google Groups "lustre-discuss-list"
> group.
>
> To post to this group, send email to lustre-discuss at lists.lustre.org or visit http://groups.google.com/group/lustre-discuss-list?hl=en
>
> To unsubscribe from this group, send email to lustre-discuss-list+unsubscribe at googlegroups.com
>
> To change the way you get mail from this group, visit:
> http://groups.google.com/group/lustre-discuss-list/subscribe?hl=en
>
> To report abuse, send email explaining the problem to abuse at googlegroups.com
>
> ==============================================================================
> Google Groups: http://groups.google.com/?hl=en
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fs_benck_CCE.xls
Type: application/vnd.ms-excel
Size: 17408 bytes
Desc: not available
URL: <http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/attachments/20090717/20759328/attachment.xls>

