[Lustre-discuss] ll_ost_creat_* goes berserk (100% cpu used - OST disabled)

Alexey Lyashkov alexey.lyashkov at clusterstor.com
Fri Aug 13 11:15:51 PDT 2010


On Aug 13, 2010, at 20:49, Adrian Ulrich wrote:

> Hi,
> 
> Since a few hours we have a problem with one of our OSTs:
> 
> One (and only one) ll_ost_create_ process on one of the OSTs
> seems to go crazy and uses 100% CPU.
> 
> Rebooting the OST + MDS didn't help and there isn't much
> going on on the filesystem itself:
> 
> - /proc/fs/lustre/ost/OSS/ost_create/stats is almost 'static'
> - iostat shows almost no usage
> - ib traffic is < 100 kb/s
> 
> 
> The MDS logs this each ~3 minutes:
> Aug 13 19:11:14 mds1 kernel: LustreError: 11-0: an error occurred while communicating with 10.201.62.23 at o2ib. The ost_connect operation failed with -16
> ..and later:
> Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(osc_create.c:390:osc_create()) lustre1-OST0005-osc: oscc recovery failed: -110
> Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(lov_obd.c:1129:lov_clear_orphans()) error in orphan recovery on OST idx 5/32: rc = -110
> Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(mds_lov.c:1022:__mds_lov_synchronize()) lustre1-OST0005_UUID failed at mds_lov_clear_orphans: -110
> Aug 13 19:17:16 mds1 kernel: LustreError: 10253:0:(mds_lov.c:1031:__mds_lov_synchronize()) lustre1-OST0005_UUID sync failed -110, deactivating
> Aug 13 19:17:54 mds1 kernel: Lustre: 6544:0:(import.c:508:import_select_connection()) lustre1-OST0005-osc: tried all connections, increasing latency to 51s
> 
-110 = -ETIMEDOUT: the operation did not finish before its deadline, or there is a network problem.
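
To make the return codes in the MDS log above easier to read, here is a minimal sketch that pulls negative return codes out of a log line and names them. The table holds only the two Linux errno values that appear in this thread (16 = EBUSY, 110 = ETIMEDOUT); the function name and the regular expression are my own illustration, not part of any Lustre tool.

```python
import re

# Linux errno values for the two codes seen in this thread.
LINUX_ERRNO = {16: "EBUSY", 110: "ETIMEDOUT"}

# A minus sign followed by digits, not glued to a preceding word character
# or dot (so "11-0" and IP addresses are ignored).
RC_PATTERN = re.compile(r"(?<![\w.])-(\d+)\b")


def decode_return_codes(line):
    """Return a list of (code, name) pairs for negative codes in a log line."""
    decoded = []
    for digits in RC_PATTERN.findall(line):
        code = int(digits)
        decoded.append((-code, LINUX_ERRNO.get(code, "unknown")))
    return decoded


if __name__ == "__main__":
    sample = ("LustreError: 11-0: an error occurred while communicating "
              "with 10.201.62.23 at o2ib. The ost_connect operation "
              "failed with -16")
    print(decode_return_codes(sample))
```

So the ost_connect failure (-16) means the OST reported itself busy, while the later orphan recovery failures (-110) are timeouts.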

> oops! (lustre1-OST0005 is hosted on the OSS with the crazy ll_ost_create process)
I think ll_ost_create is busy destroying previously created objects.



> 
> On the affected OSS we get
> Lustre: 11764:0:(ldlm_lib.c:835:target_handle_connect()) lustre1-OST0005: refuse reconnection from lustre1-mdtlov_UUID at 10.201.62.11@o2ib to 0xffff8102164d0200; still busy with 2 active RPCs
> 
> 
> $ llog_reader lustre-log.1281718692.11833 shows:
llog_reader is a tool for reading configuration llogs. To decode a debug log, use lctl df $file > $output instead.

> 
> And we get tons of soft-cpu lockups :-/
> 
> Any ideas?
Please post the soft-lockup report. One possibility is that the MDS is asking that OST to create too many objects, or that the OST is handling too many reconnects.
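
To get a rough feel for whether the OST is seeing many reconnects, something like the sketch below could count the "refuse reconnection" messages in the OSS syslog per target and client NID. It assumes the message format shown in the OSS log line quoted above; the function name and regular expression are hypothetical helpers, not an existing Lustre utility.

```python
import re
from collections import Counter

# Matches e.g. "lustre1-OST0005: refuse reconnection from <uuid>@<nid> to ..."
# The archive renders the first "@" as " at ", so both forms are accepted.
REFUSED = re.compile(
    r"(\S+): refuse reconnection from (\S+?)(?:@| at )(\S+) to "
)


def count_refused_reconnects(lines):
    """Count refused reconnections, keyed by (target, client NID)."""
    counts = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            target, _client_uuid, nid = match.groups()
            counts[(target, nid)] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage (assumed log location): python count_refused.py < /var/log/messages
    for (target, nid), n in count_refused_reconnects(sys.stdin).most_common():
        print(f"{target} refused {nid}: {n} times")
```

A count that keeps growing for the same target would be consistent with the MDS retrying its connection while the create thread is still busy.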

> 
> 
> Regards,
> Adrian
> 
> 



