[Lustre-discuss] OST acting up

Ron Croonenberg ronc at lanl.gov
Mon Nov 24 07:34:45 PST 2014


Actually I found out that the cause of the problem was different from what I
first expected. Instead of ldiskfs I am using ZFS. The
targets/LUNs/pools have multiple paths to them and ZFS doesn't seem to
particularly like that. Basically I avoided the problem by creating the
pools in the mapper directory, and I also import them (the OSSs are
diskless) from that same directory every time. That way the pools are
not corrupted and Lustre seems a lot happier (so far).
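
A minimal sketch of that workaround, assuming the "mapper directory" is /dev/mapper and that the multipath alias is called mpatha (both are assumptions, not taken from the actual setup). It only prints the zpool commands so it can be read or dry-run safely; the pool name OST03 matches the one used later in this thread.

```shell
#!/bin/sh
# Sketch: always address multipath LUNs through one directory so ZFS sees a
# single, stable path per device. Names below are placeholders.

MAPPER_DIR=/dev/mapper   # assumed location of the multipath device nodes

# Print the command that would create a pool on a multipath device.
# $1 = pool name, $2 = multipath alias (hypothetical, e.g. mpatha)
create_pool_cmd() {
    echo "zpool create $1 $MAPPER_DIR/$2"
}

# Print the command that would import a pool, searching only MAPPER_DIR.
# $1 = pool name
import_pool_cmd() {
    echo "zpool import -d $MAPPER_DIR $1"
}

create_pool_cmd OST03 mpatha
import_pool_cmd OST03
```

The point of "-d" on import is that zpool only scans the given directory for devices, so it should not pick up one of the raw per-path device nodes instead of the multipath one.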

Ron

On 11/13/2014 03:40 PM, Fernando Pérez wrote:
> Hi Ron.
>
> I think that will be the solution to your problem.
>
> I have always had this kind of problem when reformatting OSTs, and sometimes when adding a new one. Restarting Lustre and regenerating the logs always solves it.
>
> Good luck.
>
> =============================================
> Fernando Pérez
> Institut de Ciències del Mar (CMIMA-CSIC)
> Departament Oceanografía Física i Tecnològica
> Passeig Marítim de la Barceloneta,37-49
> 08003 Barcelona
> Phone:  (+34) 93 230 96 35
> =============================================
>
>> On 13/11/2014, at 18:34, Ron Croonenberg <rocr at lanl.gov> wrote:
>>
>> Hi Fernando,
>>
>> It looks like I had to clear the logs on the MDT, and apparently on all OSTs as well. There was nothing wrong with OST03; it just wasn't being seen, it looked like.
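
For reference, "clearing the logs" here presumably means regenerating the configuration logs with --writeconf. A rough sketch follows, assuming ZFS targets named pool/dataset; the dataset names are placeholders, not the real ones from this cluster. It prints the tunefs.lustre commands rather than running them. All clients and targets have to be unmounted before doing this for real, and afterwards the MGS/MDT is mounted first, then the OSTs, then the clients.

```shell
#!/bin/sh
# Sketch: regenerate the Lustre configuration logs with --writeconf.
# Prints the commands instead of running them; every target must be
# unmounted first. Dataset names below are placeholders.

MDT="MDT00/mdt00"                                        # hypothetical MDT dataset
OSTS="OST00/ost00 OST01/ost01 OST02/ost02 OST03/ost03"   # hypothetical OST datasets

writeconf_plan() {
    # MDT first, then every OST.
    for target in $MDT $OSTS; do
        echo "tunefs.lustre --writeconf $target"
    done
}

writeconf_plan
```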
>>
>>
>> Ron
>>
>> On 11/13/2014 10:29 AM, Fernando Pérez wrote:
>>> I had the same problem in the past with Lustre 2.4.2 when I changed one OST because of a hardware problem.
>>>
>>> Stopping Lustre after reformatting the new OST (unmount clients, stop MGS/MDS, unmount OSTs) and starting everything again was the only thing that solved this problem for me.
>>>
>>> Regards.
>>>
>>>
>>>> On 13/11/2014, at 17:33, Ron Croonenberg <ronc at lanl.gov> wrote:
>>>>
>>>> Actually that OSS has 4 OSTs  (I'll check the logs again for some obvious stuff)
>>>>
>>>> I tried a few things:
>>>>
>>>> Since the MDS doesn't seem to know about that OST, I tried this:
>>>>
>>>> mkfs.lustre --ost --reformat --backfstype=zfs --fsname=l2 --mgsnode=10.1.17.1@o2ib42 --failover=10.1.17.22@o2ib42 --index=3 OST03/ost0
>>>>
>>>> when I try to mount that OST I get:
>>>> mount.lustre: mount OST03/ost03 at /lustre/l2/ost03 failed: Address already in use
>>>> The target service's index is already in use. (OST03/ost03)
>>>>
>>>> (also if I try to do an mkfs.lustre with a --writeconf)
>>>>
>>>>
>>>> When I do an mkfs.lustre with the --replace option (saying this OST is going to replace the one with index 3), it 'seems' to work.
>>>>
>>>> With the --replace option I can mount the OST on the OSS and an lctl dl shows it's up, but it doesn't show up on the MDS.
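
For reference, a sketch of what the --replace variant of the format command might look like, built from the options already shown above. The helper name replace_ost_cmd is made up for illustration, and it only prints the command. --replace is the mkfs.lustre option for reusing the index of a target that was previously registered; whether it is the right fix here is not settled by this thread.

```shell
#!/bin/sh
# Sketch: print a mkfs.lustre command that reformats an OST and reuses an
# index the MGS already knows about. Nothing is executed.
# $1 = fsname, $2 = MGS NID, $3 = OST index, $4 = ZFS pool/dataset
replace_ost_cmd() {
    echo "mkfs.lustre --ost --reformat --replace --backfstype=zfs --fsname=$1 --mgsnode=$2 --index=$3 $4"
}

replace_ost_cmd l2 10.1.17.1@o2ib42 3 OST03/ost03
```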
>>>>
>>>> Ron
>>>>
>>>>
>>>> On 11/13/2014 08:54 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>>>>>
>>>>> On Nov 13, 2014, at 10:49 AM, Ron Croonenberg <ronc at lanl.gov>
>>>>>   wrote:
>>>>>
>>>>>> I am using Lustre 2.4.2 and have an OST that doesn't seem to be written to.
>>>>>>
>>>>>> When I check the MDS with 'lctl dl' I do not see that OST in the list.
>>>>>> However when I check the OSS that OST belongs to I can see it is mounted and up;
>>>>>>
>>>>>>   0 UP osd-zfs l2-OST0003-osd l2-OST0003-osd_UUID 5
>>>>>>   3 UP obdfilter l2-OST0003 l2-OST0003_UUID 5
>>>>>>   4 UP lwp l2-MDT0000-lwp-OST0003 l2-MDT0000-lwp-OST0003_UUID 5
>>>>>>
>>>>>>
>>>>>> Since it isn't written to (the MDS doesn't seem to know about it), I created a directory. The index of that OST is 3, so I did a "lfs setstripe -i 3 -c 1 /mnt/l2-lustre/test-37" to force anything written in that directory to be written to OST03.
>>>>>>
>>>>>> However when I issue that command I get:
>>>>>>
>>>>>> -bash-4.1# lfs setstripe -i 3 -c 1 /mnt/l2-lustre/test-37
>>>>>> error on ioctl 0x4008669a for '/mnt/l2-lustre/test-37' (3): Invalid argument
>>>>>> error: setstripe: create stripe file '/mnt/l2-lustre/test-37' failed
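
A small sketch of the "is this OST in the device list" check described above, so the same test can be run on both the MDS and the OSS. The helper names ost_name and lists_ost are hypothetical. It relies on Lustre naming targets as the fsname followed by -OST and a four-digit hex index, and it uses the OSS listing quoted above as sample input; on a real node the input would come from "lctl dl".

```shell
#!/bin/sh
# Build the target name Lustre uses for an OST: <fsname>-OST<4-digit hex index>.
# $1 = fsname, $2 = OST index (decimal)
ost_name() {
    printf '%s-OST%04x\n' "$1" "$2"
}

# Read an 'lctl dl' listing on stdin; succeed if any device mentions the OST.
# $1 = fsname, $2 = OST index (decimal)
lists_ost() {
    grep -q "$(ost_name "$1" "$2")"
}

# Sample input: the OSS listing from this thread.
# On a real node: lctl dl | lists_ost l2 3
sample='  0 UP osd-zfs l2-OST0003-osd l2-OST0003-osd_UUID 5
  3 UP obdfilter l2-OST0003 l2-OST0003_UUID 5
  4 UP lwp l2-MDT0000-lwp-OST0003 l2-MDT0000-lwp-OST0003_UUID 5'

if printf '%s\n' "$sample" | lists_ost l2 3; then
    echo "l2-OST0003 is listed"
else
    echo "l2-OST0003 is missing"
fi
```

If the check succeeds on the OSS but fails on the MDS, the MDS has no device for that OST, which would be consistent with the setstripe failure above.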
>>>>>
>>>>>
>>>>>
>>>>> Does that OSS server only have one OST?  If so, could there be a communication problem between the MDS server and the OSS server?  Is there anything in the log files that indicates the OSS server is trying to connect to the MDS server but fails for some reason?
>>>>>
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> Lustre-discuss at lists.lustre.org
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>
>


