[lustre-discuss] Directory striping not working in Lustre 2.8?

Juan PC piernas at ditec.um.es
Tue May 17 04:36:22 PDT 2016


Hi Patrick,

On 16/05/16 at 18:16, Patrick Farrell wrote:
> Juan,
> 
> Well, first step is I'll volunteer to take a quick look at that file. :)
>
Perfect! :-)

> If that thread is representative (and your soft lockup messages suggest
> it is), you're stuck in the ldiskfs file system and journaling code. 
> It's possible there's a Lustre bug there, but (in my opinion) it's more
> likely that means something is up with your underlying storage.  That
> stack trace shows the Lustre thread waiting for a pretty fundamental low
> level journal operation, which suggests that the disks are not
> responsive.  They might just be too slow to service the requests, or
> they might be misconfigured somehow.
> 
When I first saw the soft lockups, I searched for information and found
many reports attributing them to hardware problems. However, I do not
think that is the case here: I am testing other file systems with the
same workloads, and the problem does not appear when I use the
corresponding vanilla CentOS kernel. Moreover, I use the same SSD devices
for Lustre and the other file systems, and I have not seen any problems
with them so far.

> Are all the MDTs sharing the same underlying storage?
> 
No, each MDT runs on a separate computer with a dedicated SSD device.


> - Patrick
> 
Regards,

	Juan

PS: I will send you the log files in a different e-mail.

> On 05/16/2016 10:28 AM, Juan PC wrote:
>> Hi Patrick,
>>
>> If you mean something like the attached file, yes, I can collect that
>> information from the different servers. After that, what should I do?
>>
>> Thanks,
>>
>>     Juan
>>
>> On 16/05/16 at 15:10, Patrick Farrell wrote:
>>> Juan,
>>>
>>> A first thought is that you've probably hit a bug, but also that more
>>> information would be good - For starters, do you have stack traces
>>> for any of these stuck threads?
>>>
>>> - Patrick
>>> ________________________________________
>>> From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on
>>> behalf of Juan PC <piernas at ditec.um.es>
>>> Sent: Monday, May 16, 2016 3:33:23 AM
>>> To: lustre-discuss at lists.lustre.org
>>> Subject: [lustre-discuss] Directory striping not working in Lustre 2.8?
>>>
>>> Hi,
>>>
>>> I have put MDTs and OSTs on different servers (either one MDT or one OST
>>> per server), and Lustre becomes unresponsive when there is more than one
>>> MDT, directories are striped, and the workload is metadata-intensive.
>>> The error messages on one of the MDTs (which is also the MGS) are:
>>>
>>>   BUG: soft lockup - CPU#1 stuck for 22s! [ptlrpc_hr01_000:2382]
>>>   BUG: soft lockup - CPU#1 stuck for 23s! [ptlrpc_hr01_000:2382]
>>>   BUG: soft lockup - CPU#3 stuck for 22s! [ptlrpc_hr01_002:2384]
>>>   BUG: soft lockup - CPU#3 stuck for 23s! [ptlrpc_hr01_002:2384]
>>>   BUG: soft lockup - CPU#4 stuck for 22s! [jbd2/sdb1-8:4498]
>>>   BUG: soft lockup - CPU#4 stuck for 23s! [jbd2/sdb1-8:4498]
>>>   BUG: soft lockup - CPU#5 stuck for 22s! [ptlrpc_hr01_001:2383]
>>>   BUG: soft lockup - CPU#5 stuck for 23s! [ptlrpc_hr01_001:2383]
>>>   BUG: soft lockup - CPU#6 stuck for 22s! [jbd2/sdb1-8:4225]
>>>   BUG: soft lockup - CPU#6 stuck for 22s! [jbd2/sdb1-8:4498]
>>>   BUG: soft lockup - CPU#6 stuck for 23s! [jbd2/sdb1-8:4498]
>>>   BUG: soft lockup - CPU#7 stuck for 23s! [ptlrpc_hr01_003:2385]
>>>
>>> On the other MDT, the error messages are similar:
>>>
>>>   BUG: soft lockup - CPU#6 stuck for 22s! [jbd2/sdb1-8:4421]
>>>   BUG: soft lockup - CPU#6 stuck for 23s! [jbd2/sdb1-8:4421]
>>>   BUG: soft lockup - CPU#7 stuck for 22s! [jbd2/sdb1-8:4421]
>>>   BUG: soft lockup - CPU#7 stuck for 23s! [jbd2/sdb1-8:4421]
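
The soft lockup messages quoted above can be tallied per thread, which makes it easier to see whether the jbd2 journal threads or the ptlrpc_hr threads are reported most often. The following is a minimal Python sketch and not part of the original thread; the function name `tally_lockups` and the regular expression are illustrative assumptions derived only from the message format shown here.

```python
import re
from collections import Counter

# Matches kernel messages of the form seen in this thread, e.g.
#   BUG: soft lockup - CPU#1 stuck for 22s! [ptlrpc_hr01_000:2382]
# Assumption: other kernel versions may word this message differently.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def tally_lockups(lines):
    """Count soft lockup reports per (thread name, pid).

    `lines` is any iterable of strings; lines that do not contain a
    soft lockup message (including quote markers or timestamps around
    one) are ignored or tolerated.
    """
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[(match.group("comm"), int(match.group("pid")))] += 1
    return counts
```

For example, passing the lines of a saved kernel log to `tally_lockups` returns a `Counter`, and its `most_common()` method lists the most frequently reported threads first.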
>>>
>>> The workload is generated by scenarios 9, 10, 11 and 12 of the HPCS-IO
>>> suite, which stat thousands of empty files.
>>>
>>> With a single MDT, and one or more OSTs, no problem appears.
>>>
>>> Any ideas or things I can try?
>>>
>>> Regards,
>>>
>>>          Juan
>>>
>>> On 11/05/16 at 12:19, Juan PC wrote:
>>>> Hello Olaf:
>>>>
>>>> Thanks for replying. In my setup, each MDS also has a single MDT, and
>>>> each OSS has a single OST too. The difference, perhaps, is that I have an
>>>> MDT and an OST running on the same server, and this seems to cause some
>>>> problems. I will try other configurations.
>>>>
>>>> Regards,
>>>>
>>>>        Juan
>>>>
>>>> On 10/05/16 at 20:41, Faaland, Olaf P. wrote:
>>>>> Hello Juan,
>>>>>
>>>>> No, I haven't seen the problem you describe.  Our testing
>>>>> configuration has only a single target per server - each MDS has a
>>>>> single MDT, and each OSS has a single OST.
>>>>>
>>>>> We've encountered some issues, but so far stability has been good. 
>>>>> Our testing has been on relatively small scale, though.
>>>>>
>>>>> Olaf P. Faaland
>>>>> Livermore Computing
>>>>>
>>>>> ________________________________________
>>>>> From: Juan PC [piernas at ditec.um.es]
>>>>> Sent: Monday, May 09, 2016 4:25 AM
>>>>> To: Faaland, Olaf P.; lustre-discuss at lists.lustre.org
>>>>> Subject: Re: [lustre-discuss] default directory striping with
>>>>> Lustre 2.8
>>>>>
>>>>> Dear Olaf,
>>>>>
>>>>> I would like to hear about your experience with striped directories
>>>>> in Lustre 2.8. Mine is that this feature is still unstable: a lot of
>>>>> "BUG: soft lockup..." errors start appearing. Maybe the problem is the
>>>>> setup that I use: as many OSTs as MDTs, with one OST-MDT pair per
>>>>> server.
>>>>>
>>>>> Have you faced the same problem?
>>>>>
>>>>> Regards,
>>>>>
>>>>>          Juan
>>>>>
>>>>>
>>>>> On 05/05/16 at 01:04, Faaland, Olaf P. wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Suppose you have m MDTs in your filesystem, and create a new
>>>>>> directory and set default directory striping using
>>>>>>
>>>>>> lfs mkdir --count=c --index=k <path> && lfs setdirstripe --default
>>>>>> --count=c <path>
>>>>>>
>>>>>> Suppose that c < m and m > 2.
>>>>>>
>>>>>> Then you make subdirectories, like
>>>>>>
>>>>>> mkdir <path>/child.{1,2,3,...}
>>>>>>
>>>>>> a) By design, do the child directories have the same starting index
>>>>>>    as <path>?
>>>>>> b) By design, are the child directories all striped across the same
>>>>>>    set of MDTs as <path>?
>>>>>>
>>>>>> I didn't see that specified one way or the other in the DNE phase 2
>>>>>> high level design document at
>>>>>> http://wiki.opensfs.org/DNE_StripedDirectories_HighLevelDesign_wiki_version.
>>>>>>
>>>>>> If I should look elsewhere, let me know.
>>>>>>
>>>>>> In a test I was doing today, I noticed that neither (a) nor (b) was
>>>>>> true in practice.  I'm wondering whether that's a bug or a feature.
>>>>>> Here's partial output from my test.
>>>>>>
>>>>>> $ lfs mkdir --count=6 --index=2 /p/lustre/faaland1/count6_index2
>>>>>> $ lfs setdirstripe -D --count=6 /p/lustre/faaland1/count6_index2
>>>>>> $ mkdir /p/lustre/faaland1/count6_index2/subdir.{1,2,3,4,5,6,7,8,9,10,11,12,13,14}
>>>>>>
>>>>>> $ lfs getdirstripe /p/lustre/faaland1/count6_index2
>>>>>> /p/lustre/faaland1/count6_index2
>>>>>> lmv_stripe_count: 6 lmv_stripe_offset: 2
>>>>>> mdtidx           FID[seq:oid:ver]
>>>>>>       2           [0x280000400:0x33f3:0x0]
>>>>>>       3           [0x2c0000404:0x33f3:0x0]
>>>>>>       4           [0x300000402:0x33f2:0x0]
>>>>>>       5           [0x340000407:0x33f1:0x0]
>>>>>>       6           [0x380000406:0x33f0:0x0]
>>>>>>       7           [0x3c0000404:0x33ef:0x0]
>>>>>> /p/lustre/faaland1/count6_index2/subdir.4
>>>>>> lmv_stripe_count: 6 lmv_stripe_offset: 2
>>>>>> mdtidx           FID[seq:oid:ver]
>>>>>>       2           [0x280000400:0x33f5:0x0]
>>>>>>       3           [0x2c0000404:0x33f5:0x0]
>>>>>>       4           [0x300000402:0x33f4:0x0]
>>>>>>       5           [0x340000407:0x33f3:0x0]
>>>>>>       6           [0x380000406:0x33f2:0x0]
>>>>>>       7           [0x3c0000404:0x33f1:0x0]
>>>>>> /p/lustre/faaland1/count6_index2/subdir.9
>>>>>> lmv_stripe_count: 6 lmv_stripe_offset: 5
>>>>>> mdtidx           FID[seq:oid:ver]
>>>>>>       5           [0x340000400:0x37a1:0x0]
>>>>>>       6           [0x380000405:0x37a1:0x0]
>>>>>>       7           [0x3c0000402:0x37a0:0x0]
>>>>>>       8           [0x40000040e:0x379f:0x0]
>>>>>>       9           [0x440000403:0x379e:0x0]
>>>>>>       0           [0x200000405:0x379d:0x0]
>>>>>> /p/lustre/faaland1/count6_index2/subdir.3
>>>>>> lmv_stripe_count: 6 lmv_stripe_offset: 5
>>>>>> mdtidx           FID[seq:oid:ver]
>>>>>>       5           [0x340000400:0x37a0:0x0]
>>>>>>       6           [0x380000405:0x37a0:0x0]
>>>>>>       7           [0x3c0000402:0x379f:0x0]
>>>>>>       8           [0x40000040e:0x379e:0x0]
>>>>>>       9           [0x440000403:0x379d:0x0]
>>>>>>       0           [0x200000405:0x379c:0x0]
>>>>>> /p/lustre/faaland1/count6_index2/subdir.14
>>>>>> lmv_stripe_count: 6 lmv_stripe_offset: 7
>>>>>> mdtidx           FID[seq:oid:ver]
>>>>>>       7           [0x3c0000400:0x30d4:0x0]
>>>>>>       8           [0x400000403:0x30d4:0x0]
>>>>>>       9           [0x440000405:0x30d3:0x0]
>>>>>>       0           [0x200000407:0x30d2:0x0]
>>>>>>       1           [0x240000407:0x30d1:0x0]
>>>>>>       2           [0x280000407:0x30d0:0x0]
>>>>>> ...
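
Questions (a) and (b) can be checked mechanically from output like the above. The following is a minimal Python sketch and not part of the original thread: it assumes the `lfs getdirstripe` output format exactly as shown in this message (a path line, a `lmv_stripe_count`/`lmv_stripe_offset` line, then one `mdtidx FID` row per stripe), which may differ in other Lustre versions. The function names `parse_getdirstripe` and `compare_children` are hypothetical.

```python
import re

# Header line, e.g. "lmv_stripe_count: 6 lmv_stripe_offset: 2"
HEADER_RE = re.compile(
    r"lmv_stripe_count:\s*(\d+)\s+lmv_stripe_offset:\s*(-?\d+)"
)
# Stripe row, e.g. "2           [0x280000400:0x33f3:0x0]"
STRIPE_RE = re.compile(
    r"^(\d+)\s+\[0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+\]$"
)


def parse_getdirstripe(text):
    """Return {path: {"count": int, "offset": int, "mdts": [int, ...]}}."""
    layouts = {}
    current = None
    for raw in text.splitlines():
        # Drop e-mail quote markers and indentation before matching.
        line = raw.lstrip("> \t").rstrip()
        if line.startswith("/"):
            current = layouts.setdefault(
                line, {"count": None, "offset": None, "mdts": []}
            )
            continue
        if current is None:
            continue
        header = HEADER_RE.search(line)
        if header:
            current["count"] = int(header.group(1))
            current["offset"] = int(header.group(2))
            continue
        stripe = STRIPE_RE.match(line)
        if stripe:
            current["mdts"].append(int(stripe.group(1)))
    return layouts


def compare_children(layouts, parent):
    """Map each child path to (same_offset, same_mdt_set) versus parent."""
    ref = layouts[parent]
    prefix = parent.rstrip("/") + "/"
    return {
        path: (
            lay["offset"] == ref["offset"],
            set(lay["mdts"]) == set(ref["mdts"]),
        )
        for path, lay in layouts.items()
        if path.startswith(prefix)
    }
```

Fed the output quoted above, `compare_children` marks subdir.4 as matching the parent in both starting index and MDT set, while subdir.9, subdir.3 and subdir.14 match in neither, which is the behaviour the message describes.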
>>>>>>
>>>>>>
>>>>>> Olaf P. Faaland
>>>>>> Livermore Computing
>>>>>> phone : 925-422-2263
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> lustre-discuss mailing list
>>>>>> lustre-discuss at lists.lustre.org
>>>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>>>>
>>>>>>
>>>>>
>>>>> -- 
>>>>> D. Juan Piernas Cánovas
>>>>> Departamento de Ingeniería y Tecnología de Computadores
>>>>> Facultad de Informática. Universidad de Murcia
>>>>> Campus de Espinardo - 30080 Murcia (SPAIN)
>>>>> Tel.: +34868887657    Fax: +34868884151
>>>>> email: piernas at ditec.um.es
>>>>> PGP public key:
>>>>> http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index
>>>>>
>>>>>
>>>>> *** Please send me your documents in text, HTML, PDF or
>>>>> PostScript format :-) ***
>>>>>
>>>>
>>>
>>>
>>


-- 
D. Juan Piernas Cánovas
Departamento de Ingeniería y Tecnología de Computadores
Facultad de Informática. Universidad de Murcia
Campus de Espinardo - 30080 Murcia (SPAIN)
Tel.: +34868887657    Fax: +34868884151
email: piernas at ditec.um.es
PGP public key:
http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index

*** Please send me your documents in text, HTML, PDF or
PostScript format :-) ***

