Dear Dusty Marks,<br>Is the user with which you are trying these commands present on the MDS with the same UID/GID?<br><br><div class="gmail_quote">On Sun, Feb 14, 2010 at 12:03 PM, Dusty Marks <span dir="ltr"><<a href="mailto:dustynmarks@gmail.com">dustynmarks@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">I'm using Lustre 1.8.2<br>

<br>

Everything seemed to be working quite nicely until I enabled user quotas.<br>

<br>

I am able to mount the file system on the client, but whenever I cd into it, ls it, or try anything else on it, it hangs. Then when I run "lfs df -h", the MDS server no longer appears in the list.<br>

<br>

192.168.0.2 is the MDS/MGS server<br>

192.168.0.3 is the OSS (/dev/hdc is the OST)<br>

192.168.0.6 is the patchless client<br>

<br>

<br>

Thanks for the help, all<br>

-Dusty<br>

<br>

<br>

This shows up in /var/log/messages on the client (sorry, the time is wrong on this machine):<br>

----------------------------------------------------------------------------------------------------------------------------------------------------------------<br>

Feb 13 18:15:49 mainframe2 kernel: Lustre: MGC192.168.0.2@tcp: Reactivating import<br>
Feb 13 18:15:49 mainframe2 kernel: Lustre: Client cluster-client has started<br>
Feb 13 18:16:21 mainframe2 kernel: Lustre: 6386:0:(client.c:1434:ptlrpc_expire_one_request()) @@@ Request x1327583245569516 sent from cluster-MDT0000-mdc-ffff810009154c00 to NID 192.168.0.2@tcp 7s ago has timed out (7s prior to deadline).<br>
Feb 13 18:16:21 mainframe2 kernel:   req@ffff810036502c00 x1327583245569516/t0 o101->cluster-MDT0000_UUID@192.168.0.2@tcp:12/10 lens 544/1064 e 0 to 1 dl 1266106581 ref 1 fl Rpc:/0/0 rc 0/0<br>
Feb 13 18:16:21 mainframe2 kernel: Lustre: cluster-MDT0000-mdc-ffff810009154c00: Connection to service cluster-MDT0000 via nid 192.168.0.2@tcp was lost; in progress operations using this service will wait for recovery to complete.<br>
Feb 13 18:16:27 mainframe2 kernel: LustreError: 6386:0:(mdc_locks.c:625:mdc_enqueue()) ldlm_cli_enqueue: -4<br>

----------------------------------------------------------------------------------------------------------------------------------------------------------------<br>
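Lustre reports failures as negated kernel errno values: ldlm_cli_enqueue: -4 in the client log above, and processing error (-43) and failed with -107 in the MDS log. A minimal Python sketch that translates such codes, assuming the nodes run Linux (errno numbering is platform-specific, so the names would differ elsewhere):

```python
import errno
import os

# Negative return codes that appear in the client and MDS logs of this thread.
LOGGED_RCS = [-4, -43, -107]


def describe_rc(rc: int) -> str:
    """Map a negative Lustre/kernel return code to its errno name and message.

    Assumption: the code is a negated Linux errno value, so it is looked up
    in the errno table of the host this script runs on.
    """
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    for rc in LOGGED_RCS:
        print(describe_rc(rc))
```

On a Linux host this maps -4 to EINTR, -43 to EIDRM and -107 to ENOTCONN.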

<br>

<br>

This shows up in /var/log/messages on the MDS server:<br>

----------------------------------------------------------------------------------------------------------------------------------------------------------------<br>

Feb 13 23:43:07 MDS kernel: Lustre: MGS: haven't heard from client d9029b94-c905-383b-b046-df9c7d7be59d (at 0@lo) in 248 seconds. I think it's dead, and I am evicting it.<br>
Feb 13 23:53:08 MDS kernel: LustreError: 4121:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-43)  req@f6553600 x1327583245569136/t0 o36->8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@NET_0x20000c0a80006_UUID:0/0 lens 424/360 e 0 to 0 dl 1266126794 ref 1 fl Interpret:/0/0 rc 0/0<br>
Feb 14 00:03:15 MDS kernel: LustreError: 2581:0:(ldlm_lib.c:1848:target_send_reply_msg()) @@@ processing error (-43)  req@f5fb4800 x1327583245569252/t0 o36->8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@NET_0x20000c0a80006_UUID:0/0 lens 424/360 e 0 to 0 dl 1266127401 ref 1 fl Interpret:/0/0 rc 0/0<br>
Feb 14 00:04:49 MDS kernel: LustreError: 11-0: an error occurred while communicating with 192.168.0.3@tcp. The ost_statfs operation failed with -107<br>
Feb 14 00:04:49 MDS kernel: Lustre: cluster-OST0000-osc: Connection to service cluster-OST0000 via nid 192.168.0.3@tcp was lost; in progress operations using this service will wait for recovery to complete.<br>
Feb 14 00:04:49 MDS kernel: LustreError: 167-0: This client was evicted by cluster-OST0000; in progress operations using this service will fail.<br>
Feb 14 00:04:49 MDS kernel: Lustre: 4352:0:(quota_master.c:1711:mds_quota_recovery()) Only 0/1 OSTs are active, abort quota recovery<br>
Feb 14 00:04:49 MDS kernel: Lustre: cluster-OST0000-osc: Connection restored to service cluster-OST0000 using nid 192.168.0.3@tcp.<br>
Feb 14 00:04:49 MDS kernel: Lustre: MDS cluster-MDT0000: cluster-OST0000_UUID now active, resetting orphans<br>
Feb 14 00:04:56 MDS kernel: Lustre: 4121:0:(ldlm_lib.c:540:target_handle_reconnect()) cluster-MDT0000: 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf reconnecting<br>
Feb 14 00:04:56 MDS kernel: Lustre: 4121:0:(ldlm_lib.c:837:target_handle_connect()) cluster-MDT0000: refuse reconnection from 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@192.168.0.6@tcp to 0xc9b5f600; still busy with 1 active RPCs<br>
Feb 14 00:05:10 MDS kernel: Lustre: 4121:0:(ldlm_lib.c:540:target_handle_reconnect()) cluster-MDT0000: 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf reconnecting<br>
Feb 14 00:05:10 MDS kernel: Lustre: 4121:0:(ldlm_lib.c:540:target_handle_reconnect()) Skipped 1 previous similar message<br>
Feb 14 00:05:10 MDS kernel: Lustre: 4121:0:(ldlm_lib.c:837:target_handle_connect()) cluster-MDT0000: refuse reconnection from 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf@192.168.0.6@tcp to 0xc9b5f600; still busy with 1 active RPCs<br>
Feb 14 00:05:10 MDS kernel: Lustre: 4121:0:(ldlm_lib.c:837:target_handle_connect()) Skipped 1 previous similar message<br>
Feb 14 00:12:11 MDS kernel: Lustre: cluster-MDT0000: haven't heard from client 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf (at 192.168.0.6@tcp) in 258 seconds. I think it's dead, and I am evicting it.<br>
Feb 14 00:15:37 MDS kernel: LustreError: 11-0: an error occurred while communicating with 192.168.0.3@tcp. The ost_quotactl operation failed with -107<br>
Feb 14 00:15:37 MDS kernel: Lustre: cluster-OST0000-osc: Connection to service cluster-OST0000 via nid 192.168.0.3@tcp was lost; in progress operations using this service will wait for recovery to complete.<br>
Feb 14 00:15:37 MDS kernel: LustreError: 4357:0:(quota_ctl.c:379:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -107<br>
Feb 14 00:15:37 MDS kernel: LustreError: 167-0: This client was evicted by cluster-OST0000; in progress operations using this service will fail.<br>
Feb 14 00:15:37 MDS kernel: Lustre: 4358:0:(quota_master.c:1711:mds_quota_recovery()) Only 0/1 OSTs are active, abort quota recovery<br>
Feb 14 00:15:37 MDS kernel: Lustre: cluster-OST0000-osc: Connection restored to service cluster-OST0000 using nid 192.168.0.3@tcp.<br>
Feb 14 00:15:37 MDS kernel: Lustre: MDS cluster-MDT0000: cluster-OST0000_UUID now active, resetting orphans<br>

----------------------------------------------------------------------------------------------------------------------------------------------------------------<br>
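The two -43 errors on the MDS identify the requesting peer only as NET_0x20000c0a80006_UUID. A minimal Python sketch that decodes such a UUID, assuming the usual 64-bit LNet NID layout (upper 32 bits are the network, whose top 16 bits are the LND type; lower 32 bits are the IPv4 address) and assuming LND type 2 is the socket/TCP driver. The helper names decode_nid and nid_from_uuid are hypothetical, written for this illustration:

```python
import ipaddress

# Assumed LNet network-type numbering; only the TCP socket LND is needed here.
LND_NAMES = {2: "tcp"}


def decode_nid(nid: int) -> str:
    """Render a 64-bit LNet NID as 'address@net' (hypothetical helper)."""
    net = nid >> 32                # upper 32 bits: network
    lnd_type = net >> 16           # top 16 bits of the network: LND type
    net_num = net & 0xFFFF         # low 16 bits of the network: network number
    addr = nid & 0xFFFFFFFF        # lower 32 bits: IPv4 address for tcp
    lnd = LND_NAMES.get(lnd_type, f"lnd{lnd_type}")
    suffix = "" if net_num == 0 else str(net_num)  # tcp0 is printed as "tcp"
    return f"{ipaddress.IPv4Address(addr)}@{lnd}{suffix}"


def nid_from_uuid(uuid: str) -> str:
    """Extract the hex NID from a 'NET_0x<hex>_UUID' string and decode it."""
    hex_part = uuid.split("NET_0x", 1)[1].split("_UUID", 1)[0]
    return decode_nid(int(hex_part, 16))


if __name__ == "__main__":
    print(nid_from_uuid("NET_0x20000c0a80006_UUID"))
```

Under these assumptions the UUID decodes to 192.168.0.6@tcp, the patchless client, which matches the client UUID 8b82793a-0c0a-06d5-220b-4e2bc0e85cdf that the MDS later refuses to reconnect from 192.168.0.6@tcp.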

<br>

--<br>

The graduate with a Science degree asks, "Why does it work?" The<br>

graduate with an Engineering degree asks, "How does it work?" The<br>

graduate with an Accounting degree asks, "How much will it cost?" The<br>

graduate with an Arts degree asks, "Do you want fries with that?"<br>

_______________________________________________<br>

Lustre-discuss mailing list<br>

<a href="mailto:Lustre-discuss@lists.lustre.org">Lustre-discuss@lists.lustre.org</a><br>

<a href="http://lists.lustre.org/mailman/listinfo/lustre-discuss" target="_blank">http://lists.lustre.org/mailman/listinfo/lustre-discuss</a><br>

</blockquote></div><br><br clear="all"><br>-- <br>Regards--<br>Rishi Pathak<br>National PARAM Supercomputing Facility<br>Center for Development of Advanced Computing (C-DAC)<br>Pune University Campus, Ganesh Khind Road<br>
Pune, Maharashtra<br>