[Lustre-discuss] qmaster dead when an ost is umounted

Patricia Santos Marco psantos at bifi.es
Fri Aug 21 05:32:15 PDT 2009


Hi, we have upgraded to Lustre 1.8.1 successfully, but we have found that
when an OST is unmounted, quotas fail, and when the OST is mounted again,
the qmaster recovery fails and quotas are left off.

Aug 21 11:56:20 lxsrv4 kernel: LustreError: 11-0: an error occurred while communicating with 192.168.1.249@tcp. The obd_ping operation failed with -107
Aug 21 11:56:20 lxsrv4 kernel: Lustre: luster-OST0000-osc: Connection to service luster-OST0000 via nid 192.168.1.249@tcp was lost; in progress operations using this service will wait for recovery to complete.
Aug 21 11:56:45 lxsrv4 kernel: Lustre: 4752:0:(import.c:508:import_select_connection()) luster-OST0000-osc: tried all connections, increasing latency to 6s
Aug 21 11:56:45 lxsrv4 kernel: Lustre: 4752:0:(import.c:508:import_select_connection()) Skipped 11 previous similar messages
Aug 21 11:57:10 lxsrv4 kernel: Lustre: 4752:0:(import.c:508:import_select_connection()) luster-OST0000-osc: tried all connections, increasing latency to 11s
Aug 21 11:58:00 lxsrv4 kernel: Lustre: 4752:0:(import.c:508:import_select_connection()) luster-OST0000-osc: tried all connections, increasing latency to 21s
Aug 21 11:58:00 lxsrv4 kernel: Lustre: 4752:0:(import.c:508:import_select_connection()) Skipped 1 previous similar message
Aug 21 11:58:06 lxsrv4 kernel: Lustre: luster-OST0000-osc: Connection restored to service luster-OST0000 using nid 192.168.1.249@tcp.
Aug 21 11:58:06 lxsrv4 kernel: Lustre: Skipped 5 previous similar messages
Aug 21 11:58:06 lxsrv4 kernel: LustreError: 9741:0:(quota_ctl.c:373:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3
Aug 21 11:58:06 lxsrv4 kernel: LustreError: 9741:0:(quota_ctl.c:373:client_quota_ctl()) Skipped 5 previous similar messages
Aug 21 11:58:06 lxsrv4 kernel: LustreError: 9741:0:(quota_master.c:1686:qmaster_recovery_main()) qmaster recovery failed! (id:1047 type:0 rc:-3)
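The numeric codes in these messages appear to be negative Linux errno values, which is how Lustre reports errors. A minimal Python sketch for decoding them, assuming a Linux host (errno numbers differ on other platforms); the regex and the helper name decode_return_codes are illustrative only, not part of any Lustre tool:

```python
import errno
import os
import re

# Three of the log lines above, joined onto single lines.
LOG = """\
LustreError: 11-0: an error occurred while communicating with 192.168.1.249@tcp. The obd_ping operation failed with -107
LustreError: 9741:0:(quota_ctl.c:373:client_quota_ctl()) ptlrpc_queue_wait failed, rc: -3
LustreError: 9741:0:(quota_master.c:1686:qmaster_recovery_main()) qmaster recovery failed! (id:1047 type:0 rc:-3)
"""

# Hypothetical pattern: matches "failed with -107", "rc: -3" and "rc:-3".
RC_PATTERN = re.compile(r"(?:failed with|rc:)\s*(-\d+)")


def decode_return_codes(text):
    """Return (rc, errno name, message) for each negative rc found in text."""
    results = []
    for match in RC_PATTERN.finditer(text):
        rc = int(match.group(1))
        code = -rc  # kernel convention: errors are returned as -errno
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((rc, name, os.strerror(code)))
    return results


for rc, name, message in decode_return_codes(LOG):
    print(f"{rc}: {name} ({message})")
```

On Linux this maps -107 to ENOTCONN (the OST was not connected while unmounted) and -3 to ESRCH ("No such process") for the quota control request that made qmaster recovery fail.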


The command "lfs quotaon" fails:

terminus:~ # lfs quotaon -ug /lustre
quotaon failed: Device or resource busy

We then have to run "lfs quotacheck", which takes a long time, and it fails too:

terminus:~ # lfs quotacheck -ug /lustre
quotacheck failed: Device or resource busy


Is there another command to reactivate quotas without disconnecting the
clients? What's the reason for this failure?