[lustre-discuss] MDT restart: WAITING non-ready MDTs

Thomas Roth t.roth at gsi.de
Mon Jan 20 05:33:58 PST 2020


As is to be expected, MDT no. 2 did not like the situation either:

:~# cat /proc/fs/lustre/mdt/hebe-MDT0002/recovery_status
status: WAITING
non-ready MDTs:  0001
recovery_start: 1579525859
time_waited: 23
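
For watching this across several MDTs, the two fields of interest (status and non-ready MDTs) can be pulled out of the output above. This is a minimal sketch: the helper name recovery_field is made up for illustration, and it only assumes the "key: value" layout shown above.

```shell
# Hypothetical helper: print the value of one field from recovery_status
# text read on stdin. $1 is the field name, e.g. "status" or "non-ready MDTs".
recovery_field() {
    # Split each line on the colon plus any following spaces,
    # then print the value of the line whose key matches exactly.
    awk -F': *' -v key="$1" '$1 == key { print $2 }'
}
```

Used as, for example, "cat /proc/fs/lustre/mdt/hebe-MDT0002/recovery_status | recovery_field status".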


I was already reading LU-9748 and chewing my nails about an ad-hoc upgrade (this is a Lustre 2.10.6 
system), when MDT 1 finally relented, evidently getting the necessary logs now that MDT 2 was 
back and had finished its recovery.
Then, of course, MDT 2 also recovered.


In such a situation, would 'lctl abort_recovery' help?
Or shutting down all three servers and then restarting 0 - 1 - 2 ?

Regards,
Thomas


On 20/01/2020 14.00, Thomas Roth wrote:
> Hi all,
> 
> I had to restart our MDTs 1 and 2.
> No.2 is still doing a file system check, no. 1 is mounted again and should be in recovery, however:
> 
> :~# cat recovery_status
> status: WAITING
> non-ready MDTs:  0002
> recovery_start: 1579524336
> time_waited: 538
> 
> 
> Seems I have misunderstood the organisation of multiple MDTs: I thought they were independent of each 
> other - except that MDT 0 has the root of the filesystem, of course.
> 
> But the others, waiting for everybody to be online?
> 
> 
> Regards,
> Thomas
> 
> 
> 

-- 
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz


