[Lustre-discuss] possible quota problem

rishi pathak mailmaverick666 at gmail.com
Sun Feb 14 22:36:31 PST 2010


Dear Dusty Marks,
Is the user you are running these commands as present on the MDS with the same UID/GID?
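
For example (using "dusty" here only as a placeholder for the actual account name), you could compare the numeric IDs on the client and on the MDS; they should match exactly:

  # run on both the client and the MDS
  id dusty
  getent passwd dusty
  getent group dusty

If the UID or GID differs between the two nodes, the MDS cannot map the user consistently and quota operations (among others) can misbehave.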

On Sun, Feb 14, 2010 at 12:03 PM, Dusty Marks <dustynmarks at gmail.com> wrote:

> I'm using Lustre 1.8.2.
>
> Everything seemed to be working quite nicely until I enabled user quotas.
>
> I am able to mount the file system on the client, but whenever I cd
> into it, ls it, or try anything else on it, it hangs. Then when I
> type "lfs df -h", the MDS no longer appears in the list.
>
> 192.168.0.2 is the MDS/MGS server
> 192.168.0.3 is the OSS (/dev/hdc is the OST device)
> 192.168.0.6 is the patchless client
>
>
> Thanks for the help all
> -Dusty
>
>
> This shows up in /var/log/messages on the client (sorry, the time is
> wrong on this machine)
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
> Feb 13 18:15:49 mainframe2 kernel: Lustre: MGC192.168.0.2@tcp:
> Reactivating import
> Feb 13 18:15:49 mainframe2 kernel: Lustre: Client cluster-client has
> started
> Feb 13 18:16:21 mainframe2 kernel: Lustre:
> 6386:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request
> x1327583245569516 sent from cluster-MDT0000-mdc-ffff810009154c00 to
> NID 192.168.0.2@tcp 7s ago has timed out (7s prior to deadline).
> Feb 13 18:16:21 mainframe2 kernel:   req@ffff810036502c00
> x1327583245569516/t0 o101->cluster-MDT0000_UUID@192.168.0.2@tcp:12/10
> lens 544/1064 e 0 to 1 dl 1266106581 ref 1 fl Rpc:/0/0 rc 0/0
> Feb 13 18:16:21 mainframe2 kernel: Lustre:
> cluster-MDT0000-mdc-ffff810009154c00: Connection to service
> cluster-MDT0000 via nid 192.168.0.2@tcp was lost; in progress
> operations using this service will wait for recovery to complete.
> Feb 13 18:16:27 mainframe2 kernel: LustreError:
> 6386:0:(mdc_locks.c:625:mdc_enqueue()) ldlm_cli_enqueue: -4
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> This shows up in /var/log/messages on the MDS server
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
> Feb 13 23:43:07 MDS kernel: Lustre: MGS: haven't heard from client
> d9029b94-c905-383b-b046-df9c7d7be59d (at 0@lo) in 248 seconds. I think
> it's dead, and I am evicting it.
> Feb 13 23:53:08 MDS kernel: LustreError:
> 4121:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error
> (-43)  req@f6553600 x1327583245569136/t0
> o36->8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@NET_0x20000c0a80006_UUID:0/0
> lens 424/360 e 0 to 0 dl 1266126794 ref 1 fl Interpret:/0/0 rc 0/0
> Feb 14 00:03:15 MDS kernel: LustreError:
> 2581:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error
> (-43)  req@f5fb4800 x1327583245569252/t0
> o36->8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@NET_0x20000c0a80006_UUID:0/0
> lens 424/360 e 0 to 0 dl 1266127401 ref 1 fl Interpret:/0/0 rc 0/0
> Feb 14 00:04:49 MDS kernel: LustreError: 11-0: an error occurred while
> communicating with 192.168.0.3@tcp. The ost_statfs operation failed
> with -107
> Feb 14 00:04:49 MDS kernel: Lustre: cluster-OST0000-osc: Connection to
> service cluster-OST0000 via nid 192.168.0.3@tcp was lost; in progress
> operations using this service will wait for recovery to complete.
> Feb 14 00:04:49 MDS kernel: LustreError: 167-0: This client was
> evicted by cluster-OST0000; in progress operations using this service
> will fail.
> Feb 14 00:04:49 MDS kernel: Lustre:
> 4352:0:(quota_master.c:1711:mds_quota_recovery()) Only 0/1 OSTs are
> active, abort quota recovery
> Feb 14 00:04:49 MDS kernel: Lustre: cluster-OST0000-osc: Connection
> restored to service cluster-OST0000 using nid 192.168.0.3@tcp.
> Feb 14 00:04:49 MDS kernel: Lustre: MDS cluster-MDT0000:
> cluster-OST0000_UUID now active, resetting orphans
> Feb 14 00:04:56 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:540:target_handle_reconnect()) cluster-MDT0000:
> 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf reconnecting
> Feb 14 00:04:56 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:837:target_handle_connect()) cluster-MDT0000:
> refuse reconnection from
> 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@192.168.0.6@tcp to 0xc9b5f600;
> still busy with 1 active RPCs
> Feb 14 00:05:10 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:540:target_handle_reconnect()) cluster-MDT0000:
> 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf reconnecting
> Feb 14 00:05:10 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:540:target_handle_reconnect()) Skipped 1 previous
> similar message
> Feb 14 00:05:10 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:837:target_handle_connect()) cluster-MDT0000:
> refuse reconnection from
> 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@192.168.0.6@tcp to 0xc9b5f600;
> still busy with 1 active RPCs
> Feb 14 00:05:10 MDS kernel: Lustre:
> 4121:0:(ldlm_lib.c:837:target_handle_connect()) Skipped 1 previous
> similar message
> Feb 14 00:12:11 MDS kernel: Lustre: cluster-MDT0000: haven't heard
> from client 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf (at 192.168.0.6@tcp)
> in 258 seconds. I think it's dead, and I am evicting it.
> Feb 14 00:15:37 MDS kernel: LustreError: 11-0: an error occurred while
> communicating with 192.168.0.3@tcp. The ost_quotactl operation failed
> with -107
> Feb 14 00:15:37 MDS kernel: Lustre: cluster-OST0000-osc: Connection to
> service cluster-OST0000 via nid 192.168.0.3@tcp was lost; in progress
> operations using this service will wait for recovery to complete.
> Feb 14 00:15:37 MDS kernel: LustreError:
> 4357:0:(quota_ctl.c:379:client_quota_ctl()) ptlrpc_queue_wait failed,
> rc: -107
> Feb 14 00:15:37 MDS kernel: LustreError: 167-0: This client was
> evicted by cluster-OST0000; in progress operations using this service
> will fail.
> Feb 14 00:15:37 MDS kernel: Lustre:
> 4358:0:(quota_master.c:1711:mds_quota_recovery()) Only 0/1 OSTs are
> active, abort quota recovery
> Feb 14 00:15:37 MDS kernel: Lustre: cluster-OST0000-osc: Connection
> restored to service cluster-OST0000 using nid 192.168.0.3@tcp.
> Feb 14 00:15:37 MDS kernel: Lustre: MDS cluster-MDT0000:
> cluster-OST0000_UUID now active, resetting orphans
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> --
> The graduate with a Science degree asks, "Why does it work?" The
> graduate with an Engineering degree asks, "How does it work?" The
> graduate with an Accounting degree asks, "How much will it cost?" The
> graduate with an Arts degree asks, "Do you want fries with that?"
>
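
Also, the ost_quotactl and ost_statfs failures with -107 (ENOTCONN) in your MDS log mean quota RPCs were being issued while the connection to the OST was down, so it may be worth re-checking connectivity and then re-initializing the quota files once everything is connected again. A rough sketch, assuming a client mount point of /mnt/cluster (adjust the path and username to your setup):

  # from the client: verify LNET connectivity to the MDS and OSS first
  lctl ping 192.168.0.2@tcp
  lctl ping 192.168.0.3@tcp

  # then rebuild and re-enable quotas, and query one user as a test
  lfs quotacheck -ug /mnt/cluster
  lfs quotaon -ug /mnt/cluster
  lfs quota -u <username> /mnt/cluster

If lfs quotacheck itself hangs, the problem is more likely the eviction/reconnection cycle shown in your logs than the quota setup itself.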



-- 
Regards,
Rishi Pathak
National PARAM Supercomputing Facility
Centre for Development of Advanced Computing (C-DAC)
Pune University Campus, Ganesh Khind Road
Pune, Maharashtra