[Lustre-discuss] lustre quota problems

McHale, Therese therese.mchale at hp.com
Wed Jan 2 05:39:06 PST 2008


The fix Roland mentions is included in Lustre 1.4.10; you can also find it here: https://bugzilla.lustre.org/attachment.cgi?id=8709
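
To check whether a node is already running a version that includes the fix, the installed Lustre version can be read on any node with the modules loaded (a quick generic check, not HP SFS specific):

cat /proc/fs/lustre/version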

-therese

(HP SFS Support)

Postal Address: Hewlett Packard Galway Ltd., Ballybrit Business Park, Galway, Ireland
Registered Office: 63-74 Sir John Rogerson's Quay, Dublin 2, Ireland.
Registered Number: 361933

-----Original Message-----
From: lustre-discuss-bounces at clusterfs.com [mailto:lustre-discuss-bounces at clusterfs.com] On Behalf Of lustre-discuss-request at clusterfs.com
Sent: 02 January 2008 13:23
To: lustre-discuss at clusterfs.com
Subject: Lustre-discuss Digest, Vol 24, Issue 2


Send Lustre-discuss mailing list submissions to
        lustre-discuss at clusterfs.com

To subscribe or unsubscribe via the World Wide Web, visit
        https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
or, via email, send a message with subject or body 'help' to
        lustre-discuss-request at clusterfs.com

You can reach the person managing the list at
        lustre-discuss-owner at clusterfs.com

When replying, please edit your Subject line so it is more specific than "Re: Contents of Lustre-discuss digest..."


Today's Topics:

   1. lustre quota problems (Patrick Winnertz)
   2. Re: lustre quota problems (Roland Laifer)
   3. Re: help needed. (Aaron Knister)


----------------------------------------------------------------------

Message: 1
Date: Wed, 2 Jan 2008 11:27:56 +0100
From: Patrick Winnertz <patrick.winnertz at credativ.de>
Subject: [Lustre-discuss] lustre quota problems
To: Lustre-discuss <lustre-discuss at clusterfs.com>
Message-ID: <200801021127.58965.patrick.winnertz at credativ.de>
Content-Type: text/plain;  charset="iso-8859-1"

Hello,

I have several problems with quotas on our test cluster:

When I set the quota for a user to a given value (e.g. the values provided in the operations manual), I'm able to write exactly the amount set with setquota. But when I delete the file(s), I'm not able to use this space again.

Here is what I've done in detail:
lfs quotacheck -ug /mnt/testfs
lfs setquota -u winnie 307200 309200 10000 11000 /mnt/testfs

Then I wrote a single big file with dd:
dd if=/dev/zero of=/mnt/testfs/test

As expected, dd stops writing once the file is ~300 MB. But after removing the file and restarting dd, I get a zero-sized file, because the disk quota is still reported as exceeded.
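
For completeness, the accounting can be watched with lfs quota at each step (a minimal sketch of the same sequence; bs=1M is only there to speed dd up):

lfs quota -u winnie /mnt/testfs   # before: usage should be 0
dd if=/dev/zero of=/mnt/testfs/test bs=1M
lfs quota -u winnie /mnt/testfs   # at the block limit, as expected
rm /mnt/testfs/test
lfs quota -u winnie /mnt/testfs   # usage should drop back, but stays at the limit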

Has anybody seen this behaviour, and does anyone know what is wrong here? (I guess some values are cached.)

Thanks in advance!
Patrick Winnertz

--
Patrick Winnertz
Tel.: +49 (0) 2161 / 4643 - 0

credativ GmbH, HRB Mönchengladbach 12080
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz



------------------------------

Message: 2
Date: Wed, 2 Jan 2008 11:51:28 +0100
From: Roland Laifer <Laifer at RZ.Uni-Karlsruhe.DE>
Subject: Re: [Lustre-discuss] lustre quota problems
To: Patrick Winnertz <patrick.winnertz at credativ.de>
Cc: Lustre-discuss <lustre-discuss at clusterfs.com>
Message-ID: <20080102105128.GC12028 at rz.uni-karlsruhe.de>
Content-Type: text/plain; charset=iso-8859-1

Hello,

we had the same problem with our Lustre software from HP (HP SFS). HP opened CFS bug 12431 (which is visible neither to the public nor to us), so I'm not sure which Lustre version includes the corresponding fix. HP provided a patch on top of their newest SFS version which fixed the problem.

Here is a part of the explanation for the problem:
Files which did not decrease the quota usage when they were deleted had
inode->i_dquota set to NULL, which should not happen. The root cause
was in filter_destroy() and filter_commitrw_commit().
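
In the meantime, a possible workaround (a sketch only; we have not verified it here): since quotacheck recomputes the accounting from the files that actually exist, re-running it should clear the leaked usage, at least until the next delete leaks again:

lfs quotacheck -ug /mnt/testfs    # re-scan user and group usage
lfs quota -u winnie /mnt/testfs   # then verify the accounted usage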

Regards,
  Roland
--
 --------------------------------------------------------------------------
  Roland Laifer
  Rechenzentrum, Universitaet Karlsruhe (TH), D-76128 Karlsruhe, Germany
  Email: Roland.Laifer at rz.uni-karlsruhe.de, Phone: +49 721 608 4861,
  Fax: +49 721 32550, Web: www.rz.uni-karlsruhe.de/personen/roland.laifer
 --------------------------------------------------------------------------

On Wed, Jan 02, 2008 at 11:27:56AM +0100, Patrick Winnertz wrote:
> Hello,
>
> I have several problems with quotas on our test cluster:
>
> When I set the quota for a user to a given value (e.g. the values
> provided in the operations manual), I'm able to write exactly the
> amount set with setquota. But when I delete the file(s), I'm not able
> to use this space again.
>
> Here is what I've done in detail:
> lfs quotacheck -ug /mnt/testfs
> lfs setquota -u winnie 307200 309200 10000 11000 /mnt/testfs
>
> Then I wrote a single big file with dd:
> dd if=/dev/zero of=/mnt/testfs/test
>
> As expected, dd stops writing once the file is ~300 MB. But after
> removing the file and restarting dd, I get a zero-sized file, because
> the disk quota is still reported as exceeded.
>
> Has anybody seen this behaviour, and does anyone know what is wrong
> here? (I guess some values are cached.)
>
> Thanks in advance!
> Patrick Winnertz
>
> --
> Patrick Winnertz
> Tel.: +49 (0) 2161 / 4643 - 0
>
> credativ GmbH, HRB Mönchengladbach 12080
> Hohenzollernstr. 133, 41061 Mönchengladbach
> Geschäftsführung: Dr. Michael Meskes, Jörg Folz
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss



------------------------------

Message: 3
Date: Wed, 2 Jan 2008 08:22:38 -0500
From: Aaron Knister <aaron at iges.org>
Subject: Re: [Lustre-discuss] help needed.
To: Avi Gershon <gershonavi at gmail.com>
Cc: Yan Benhammou <Yan.Benhammou at cern.ch>,
        lustre-discuss at clusterfs.com,   Meny Ben moshe <meny at lep1.tau.ac.il>
Message-ID: <E9B98183-FC9C-43CB-9ACF-AD7FC1CBE42A at iges.org>
Content-Type: text/plain; charset="us-ascii"

On the host x-math20, could you run an "lctl list_nids" and also an "ifconfig -a"? I want to see if LNET is listening on the correct interface. Could you also post the contents of your /etc/modprobe.conf?
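
For comparison, a typical single-interface LNET entry in /etc/modprobe.conf looks like the line below (eth0 is only a placeholder; it should be whichever interface carries your 132.66.176.x addresses):

options lnet networks=tcp0(eth0)

If networks= names the wrong interface, lctl list_nids will report a NID on an address the other nodes cannot reach.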

Thanks!

-Aaron

On Jan 2, 2008, at 4:42 AM, Avi Gershon wrote:

> Hello to everyone, and happy new year.
> I think I have reduced my problem to this: "lctl ping 132.66.176.211@tcp0"
> doesn't work for me, for some strange reason, as you can see:
> *****************************************************************************************
> [root@x-math20 ~]# lctl ping 132.66.176.211@tcp0
> failed to ping 132.66.176.211@tcp: Input/output error
> [root@x-math20 ~]# ping 132.66.176.211
> PING 132.66.176.211 (132.66.176.211) 56(84) bytes of data.
> 64 bytes from 132.66.176.211: icmp_seq=0 ttl=64 time=0.152 ms
> 64 bytes from 132.66.176.211: icmp_seq=1 ttl=64 time=0.130 ms
> 64 bytes from 132.66.176.211: icmp_seq=2 ttl=64 time=0.131 ms
> --- 132.66.176.211 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2018ms
> rtt min/avg/max/mdev = 0.130/0.137/0.152/0.016 ms, pipe 2
> [root@x-math20 ~]#
> *****************************************************************************************
>
>
> On 12/24/07, Avi Gershon <gershonavi at gmail.com> wrote:
> Hi,
> here are the "iptables -L" results:
>
>  NODE 1 132.66.176.212
> Scientific Linux CERN SLC release 4.6 (Beryllium)
> root@132.66.176.212's password:
> Last login: Sun Dec 23 22:01:18 2007 from x-fishelov.tau.ac.il
> [root@localhost ~]#
> [root@localhost ~]#
> [root@localhost ~]# iptables -L
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> ************************************************************************************************
>  MDT 132.66.176.211
>
> Last login: Mon Dec 24 11:51:57 2007 from dynamic136-91.tau.ac.il
> [root@x-math20 ~]# iptables -L
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
> *************************************************************************
>
> NODE 2 132.66.176.215
> Last login: Mon Dec 24 11:01:22 2007 from erezlab.tau.ac.il
> [root@x-mathr11 ~]# iptables -L
>
> Chain INPUT (policy ACCEPT)
> target     prot opt source               destination
> RH-Firewall-1-INPUT  all  --  anywhere             anywhere
>
> Chain FORWARD (policy ACCEPT)
> target     prot opt source               destination
> RH-Firewall-1-INPUT  all  --  anywhere             anywhere
>
> Chain OUTPUT (policy ACCEPT)
> target     prot opt source               destination
>
> Chain RH-Firewall-1-INPUT (2 references)
> target     prot opt source               destination
> ACCEPT     all  --  anywhere             anywhere
> ACCEPT     icmp --  anywhere             anywhere            icmp any
> ACCEPT     ipv6-crypt--  anywhere             anywhere
> ACCEPT     ipv6-auth--  anywhere             anywhere
> ACCEPT     udp  --  anywhere             224.0.0.251         udp dpt:5353
> ACCEPT     udp  --  anywhere             anywhere            udp dpt:ipp
> ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpts:30000:30101
> ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
> ACCEPT     udp  --  anywhere             anywhere            state NEW udp dpt:afs3-callback
> REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited
> [root@x-mathr11 ~]#
>
> ************************************************************
> One more thing: do you use the TCP protocol, or UDP?
>
> Regards, Avi
> P.S. I think this is the beginning of a beautiful friendship. :-)
>
>
>
> On Dec 24, 2007 5:29 PM, Aaron Knister <aaron at iges.org> wrote:
> That sounds like quite a task! Could you show me the contents of your
> firewall rules on each of the systems mentioned below (iptables -L)?
> That would help to diagnose the problem further.
>
> -Aaron
>
> On Dec 24, 2007, at 1:21 AM, Yan Benhammou wrote:
>
> > Hi Aaron, and thank you for your fast answers.
> > We (Avi, Meny and I) are working on the Israeli GRID, and we need to
> > create a single huge file system for it.
> >     cheers
> >          Yan
> >
> > ________________________________
> >
> > From: Aaron Knister [mailto:aaron at iges.org]
> > Sent: Sun 12/23/2007 8:27 PM
> > To: Avi Gershon
> > Cc: lustre-discuss at clusterfs.com; Yan Benhammou; Meny Ben moshe
> > Subject: Re: [Lustre-discuss] help needed.
> >
> >
> > Can you check the firewall on each of those machines (iptables -L)
> > and paste that here? Also, is this network dedicated to Lustre?
> > Lustre can easily saturate a network interface under load, to the
> > point that it becomes difficult to log in to a node if it only has
> > one interface. I'd recommend using a different interface if you can.
> >
> > On Dec 23, 2007, at 11:03 AM, Avi Gershon wrote:
> >
> >
> >       node 1 132.66.176.212
> >       node 2 132.66.176.215
> >
> >       [root@x-math20 ~]# ssh 132.66.176.215
> >       root@132.66.176.215's password:
> >       ssh(21957) Permission denied, please try again.
> >       root@132.66.176.215's password:
> >       Last login: Sun Dec 23 14:32:51 2007 from x-math20.tau.ac.il
> >       [root@x-mathr11 ~]# lctl ping 132.66.176.211@tcp0
> >       failed to ping 132.66.176.211@tcp: Input/output error
> >       [root@x-mathr11 ~]# lctl list_nids
> >       132.66.176.215@tcp
> >       [root@x-mathr11 ~]# ssh 132.66.176.212
> >       The authenticity of host '132.66.176.212 (132.66.176.212)' can't be established.
> >       RSA1 key fingerprint is 85:2a:c1:47:84:b7:b5:a6:cd:c4:57:86:af:ce:7e:74.
> >       Are you sure you want to continue connecting (yes/no)? yes
> >       ssh(11526) Warning: Permanently added '132.66.176.212' (RSA1) to the list of known hosts.
> >       root@132.66.176.212's password:
> >       Last login: Sun Dec 23 15:24:41 2007 from x-math20.tau.ac.il
> >       [root@localhost ~]# lctl ping 132.66.176.211@tcp0
> >       failed to ping 132.66.176.211@tcp: Input/output error
> >       [root@localhost ~]# lctl list_nids
> >       132.66.176.212@tcp
> >       [root@localhost ~]#
> >
> >
> >       thanks for helping!!
> >       Avi
> >
> >
> >               On Dec 23, 2007 5:32 PM, Aaron Knister <aaron at iges.org> wrote:
> >
> >
> >               On the OSS, can you ping the MDS/MGS using this command:
> >
> >               lctl ping 132.66.176.211@tcp0
> >
> >               If it doesn't ping, list the NIDs on each node by running
> >
> >               lctl list_nids
> >
> >               and tell me what comes back.
> >
> >               -Aaron
> >
> >
> >               On Dec 23, 2007, at 9:22 AM, Avi Gershon wrote:
> >
> >
> >                       Hi, I could use some help.
> >                       I installed Lustre on 3 computers.
> >                       mdt/mgs:
> >
> > ************************************************************************
> >                       [root@x-math20 ~]# mkfs.lustre --reformat --fsname spfs --mdt --mgs /dev/hdb
> >
> >                          Permanent disk data:
> >                       Target:     spfs-MDTffff
> >                       Index:      unassigned
> >                       Lustre FS:  spfs
> >                       Mount type: ldiskfs
> >                       Flags:      0x75
> >                                     (MDT MGS needs_index first_time update )
> >                       Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
> >                       Parameters:
> >
> >                       device size = 19092MB
> >                       formatting backing filesystem ldiskfs on /dev/hdb
> >                               target name  spfs-MDTffff
> >                               4k blocks     0
> >                               options        -J size=400 -i 4096 -I 512 -q -O dir_index -F
> >                       mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-MDTffff  -J size=400 -i 4096 -I 512 -q -O dir_index -F /dev/hdb
> >                       Writing CONFIGS/mountdata
> >                       [root@x-math20 ~]# df
> >                       Filesystem           1K-blocks      Used Available Use% Mounted on
> >                       /dev/hda1             19228276   4855244  13396284  27% /
> >                       none                    127432         0    127432   0% /dev/shm
> >                       /dev/hdb              17105436    455152  15672728   3% /mnt/test/mdt
> >                       [root@x-math20 ~]# cat /proc/fs/lustre/devices
> >                         0 UP mgs MGS MGS 5
> >                         1 UP mgc MGC132.66.176.211@tcp 5f5ba729-6412-3843-2229-1310a0b48f71 5
> >                         2 UP mdt MDS MDS_uuid 3
> >                         3 UP lov spfs-mdtlov spfs-mdtlov_UUID 4
> >                         4 UP mds spfs-MDT0000 spfs-MDT0000_UUID 3
> >                       [root@x-math20 ~]#
> > ************************************************************* end mdt ******************************
> >                       So you can see that the MGS is up,
> >                       but on the OSTs I get an error. Please help...
> >
> >                       ost:
> >
> > **********************************************************************
> >                       [root@x-mathr11 ~]# mkfs.lustre --reformat --fsname spfs --ost --mgsnode=132.66.176.211@tcp0 /dev/hdb1
> >
> >                          Permanent disk data:
> >                       Target:     spfs-OSTffff
> >                       Index:      unassigned
> >                       Lustre FS:  spfs
> >                       Mount type: ldiskfs
> >                       Flags:      0x72
> >                                     (OST needs_index first_time update )
> >                       Persistent mount opts: errors=remount-ro,extents,mballoc
> >                       Parameters: mgsnode=132.66.176.211@tcp
> >
> >                       device size = 19594MB
> >                       formatting backing filesystem ldiskfs on /dev/hdb1
> >                               target name  spfs-OSTffff
> >                               4k blocks     0
> >                               options        -J size=400 -i 16384 -I 256 -q -O dir_index -F
> >                       mkfs_cmd = mkfs.ext2 -j -b 4096 -L spfs-OSTffff  -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/hdb1
> >                       Writing CONFIGS/mountdata
> >                       [root@x-mathr11 ~]# /CONFIGS/mountdata
> >                       -bash: /CONFIGS/mountdata: No such file or directory
> >                       [root@x-mathr11 ~]# mount -t lustre /dev/hdb1 /mnt/test/ost1
> >                       mount.lustre: mount /dev/hdb1 at /mnt/test/ost1 failed: Input/output error
> >                       Is the MGS running?
> > *********************************************** end ost ********************************
> >
> >                       Can anyone point out the problem?
> >                       Thanks, Avi.
> >
> >
> >
> >
> >                       _______________________________________________
> >                       Lustre-discuss mailing list
> >                       Lustre-discuss at clusterfs.com
> >                       https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
> >
> >
> >
> >
> >
> >               Aaron Knister
> >               Associate Systems Administrator/Web Designer
> >               Center for Research on Environment and Water
> >
> >               (301) 595-7001
> >               aaron at iges.org
> >
> >
> >
> >
> >
> >
> > Aaron Knister
> > Associate Systems Administrator/Web Designer
> > Center for Research on Environment and Water
> >
> > (301) 595-7001
> > aaron at iges.org
> >
> >
> >
>
> Aaron Knister
> Associate Systems Administrator/Web Designer
> Center for Research on Environment and Water
>
> (301) 595-7001
> aaron at iges.org
>
>
>
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
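
One note on the iptables output quoted above: LNET's socket LND accepts connections on TCP port 988 by default, and the RH-Firewall-1-INPUT chain on x-mathr11 accepts only ssh, ports 30000:30101, and already-established traffic before rejecting everything else. So even once the ping problem is solved, incoming Lustre connections to that node will be refused. A rule along these lines (a sketch, assuming the default acceptor port 988), inserted ahead of the REJECT rule, should let LNET traffic in:

iptables -I RH-Firewall-1-INPUT -p tcp --dport 988 -j ACCEPT

The pings toward 132.66.176.211 failing even from the MGS node itself point at the LNET configuration on x-math20, which is why the interface it is bound to is worth checking first.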

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron at iges.org





------------------------------

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss


End of Lustre-discuss Digest, Vol 24, Issue 2
*********************************************



