[lustre-discuss] hanging threads

vaibhav pol vaibhav4947 at gmail.com
Mon Dec 18 02:36:27 PST 2023


iotop can be used to debug I/O performance, and lfs health_check and lctl
get_param can be used to get the Lustre health status.
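Below is a minimal Python sketch of scripting that health check on each server. It assumes that `lctl get_param health_check` prints `health_check=healthy` on a healthy node. The exact wording of the unhealthy output is also an assumption, so anything other than `healthy` is treated as a problem. The function names are hypothetical.

```python
import subprocess


def parse_health_check(output: str) -> bool:
    """Return True only if the health_check parameter reads 'healthy'.

    Assumption: healthy output is 'health_check=healthy'; any other value,
    or a missing health_check line, is reported as unhealthy.
    """
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "health_check":
            return value.strip() == "healthy"
    return False


def node_is_healthy() -> bool:
    """Run lctl locally (needs Lustre installed and root privileges)."""
    result = subprocess.run(
        ["lctl", "get_param", "health_check"],
        capture_output=True, text=True, check=False,
    )
    return parse_health_check(result.stdout)
```

Calling `node_is_healthy()` on each MDS and OSS (for example over ssh or from a cron job) gives a quick yes/no per server. The parsing is kept separate from the command so it can be checked without a Lustre installation.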

*The message "scratch-OST0084_UUID: not available for connect from
172.23.15.246@tcp30 (no target)" may indicate a network issue, so check
the network as well.*
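To see which targets and client NIDs the 137-5 messages involve, here is a rough Python sketch that tallies them from a kernel log. It assumes the message format quoted below, with NIDs written as `172.23.15.246@tcp30`. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches the LustreError 137-5 "no target" message, e.g.
# "137-5: scratch-OST0084_UUID: not available for connect from
#  172.23.15.246@tcp30 (no target)" (assumed to be on one log line).
NO_TARGET_RE = re.compile(
    r"137-5: (?P<target>\S+): not available for connect from "
    r"(?P<nid>\S+) \(no target\)"
)


def count_no_target(lines):
    """Count 137-5 'no target' messages per (target, client NID) pair."""
    counts = Counter()
    for line in lines:
        m = NO_TARGET_RE.search(line)
        if m:
            counts[(m.group("target"), m.group("nid"))] += 1
    return counts
```

For example, `count_no_target(open("/var/log/messages")).most_common(10)` lists the most frequent pairs. The log path is an assumption. Because the kernel rate-limits these messages ("Skipped 330 previous similar messages"), the counts are only a lower bound.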
Also verify the health of the storage devices on the OSS where the
ll_ost00_036 thread hung (hb-oss01); smartctl can be used for this.
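Here is a sketch of checking drive health with `smartctl -H`. It assumes the verdict line reads `SMART overall-health self-assessment test result: PASSED` on ATA drives or `SMART Health Status: OK` on SAS/SCSI drives. Device paths such as `/dev/sdX` are placeholders, and drives behind a RAID controller may need smartctl's `-d` option. The function names are hypothetical.

```python
import subprocess

# Substrings that precede the overall verdict in `smartctl -H` output
# (assumed: ATA drives print PASSED/FAILED, SAS/SCSI drives print OK/...).
HEALTH_MARKERS = ("self-assessment test result:", "SMART Health Status:")


def parse_smart_health(output: str):
    """Return the verdict text from `smartctl -H` output, or None if absent."""
    for line in output.splitlines():
        for marker in HEALTH_MARKERS:
            if marker in line:
                return line.split(marker, 1)[1].strip()
    return None


def is_smart_ok(output: str) -> bool:
    """True only for the assumed 'good' verdicts."""
    return parse_smart_health(output) in ("PASSED", "OK")


def check_device(dev: str) -> bool:
    """Run smartctl on one device, e.g. '/dev/sdX' (placeholder; needs root)."""
    result = subprocess.run(
        ["smartctl", "-H", dev], capture_output=True, text=True, check=False
    )
    return is_smart_ok(result.stdout)
```

A missing verdict line, for example when smartctl cannot reach the drive, is treated as not OK rather than silently passing.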



On Mon, 18 Dec 2023 at 15:28, Strikwerda, Ger via lustre-discuss
<lustre-discuss at lists.lustre.org> wrote:

>
> Dear all,
>
> Since last week we have been facing 'hanging kernel threads' that cause our
> Lustre environment (Rocky 8.7/Lustre 2.15.2) to hang.
>
> errors:
>
> Dec 18 10:36:04 hb-oss01 kernel: LustreError: 137-5: scratch-OST0084_UUID: not available for connect from 172.23.15.246@tcp30 (no target). If you are running an HA pair check that the target is mounted on the other server.
> Dec 18 10:36:04 hb-oss01 kernel: LustreError: Skipped 330 previous similar messages
> Dec 18 10:36:04 hb-oss01 kernel: ptlrpc_watchdog_fire: 1 callbacks suppressed
> Dec 18 10:36:04 hb-oss01 kernel: Lustre: ll_ost00_036: service thread pid 85609 was inactive for 1062.652 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
>
> At that moment 231 jobs were running, with not particularly high I/O.
> Normally we run far more jobs and far more I/O.
>
> The environment is:
>
> 2 MDS
> 4 OSS
> 160 OSTs
> 250 clients
>
> The network is TCP.
>
> According to the internet, this could be caused by 'bad I/O'. Are there
> any useful things to check to isolate where this bad I/O is coming from?
> How do others pinpoint these issues?
>
> Any feedback is very welcome,
>
> --
>
> Vriendelijke groet,
>
> Ger Strikwerda
> senior expert multidisciplinary enabler
> simple solution architect
> Rijksuniversiteit Groningen
> CIT/RDMS/HPC
>
> Smitsborg
> Nettelbosje 1
> 9747 AJ Groningen
> Tel. 050 363 9276
> "God is hard, God is fair
>  some men he gave brains, others he gave hair"
>
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss at lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
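On the question of pinpointing hung threads, one rough approach is to pull the watchdog messages out of the OSS kernel logs and rank them by how long each thread was inactive. This Python sketch assumes the message format quoted above, with each message on a single log line. The function name is hypothetical.

```python
import re

# Matches the ptlrpc watchdog message, e.g.
# "Lustre: ll_ost00_036: service thread pid 85609 was inactive for
#  1062.652 seconds." (assumed to be on one log line)
WATCHDOG_RE = re.compile(
    r"Lustre: (?P<thread>\S+): service thread pid (?P<pid>\d+) "
    r"was inactive for (?P<secs>[\d.]+) seconds"
)


def hung_threads(lines, min_seconds=0.0):
    """Return (seconds, thread, pid) tuples, longest-inactive first."""
    found = []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m and float(m.group("secs")) >= min_seconds:
            found.append(
                (float(m.group("secs")), m.group("thread"), int(m.group("pid")))
            )
    return sorted(found, reverse=True)
```

The stack trace that the watchdog dumps right after each message shows where the thread was blocked. If a thread is still stuck, `/proc/<pid>/stack` on the OSS (as root) shows its current kernel stack.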