[lustre-discuss] Lustre/ZFS space accounting
Dilger, Andreas
andreas.dilger at intel.com
Fri Jun 9 09:06:28 PDT 2017
The error 28 on close may also indicate running out of space (28 == ENOSPC).
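One quick way to confirm that mapping on a Linux client (assuming python3 is installed; any errno table would do):

```shell
# Decode errno 28 on Linux; python3 is assumed to be available on the client.
python3 -c 'import errno, os; print(errno.errorcode[28], "-", os.strerror(28))'
# ENOSPC - No space left on device
```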
How many clients on your system?
I would recommend using 'lfs find' to locate some of the larger files on OST0002 and 'lfs_migrate' them to other OSTs.
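As a sketch of what that could look like (the mount point /mnt/lustre and the 10 GiB size threshold are placeholders to adapt to your site; check the file list before migrating anything):

```shell
# Placeholder mount point /mnt/lustre; OST index 2 corresponds to OST0002.
# First, just list files larger than 10 GiB that have objects on OST0002:
lfs find /mnt/lustre --ost 2 --size +10G

# Then pipe the same list into lfs_migrate to restripe those files onto
# other OSTs (-y skips the confirmation prompt; run during a quiet period):
lfs find /mnt/lustre --ost 2 --size +10G | lfs_migrate -y
```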
Cheers, Andreas
> On Jun 9, 2017, at 01:27, Hans Henrik Happe <happe at nbi.ku.dk> wrote:
>
> Hi,
>
> We have ruled that out by monitoring usage. It is happening during
> checkpointing, a continuing process where old checkpoints get deleted
> after new ones are made. There are many checkpoints before
>
> I messed things up in my first mail, so it wasn't clear why I talked
> about space. Sometimes they just get this (the first number is the MPI rank):
>
> 222: forrtl: Input/output error
> 222: forrtl: severe (28): CLOSE error, unit 10, file "Unknown"
>
> Sometimes they get:
>
> 33: forrtl: No space left on device
> 14: forrtl: No space left on device
> 08: forrtl: Input/output error
> 08: forrtl: severe (28): CLOSE error, unit 10, file "Unknown"
>
> Info: We have a ZFS snapshot of the OSTs and the MDT. It's ZFS 0.6.5.7.
>
> Cheers,
> Hans Henrik
>
>> On 09-06-2017 08:41, Thomas Roth wrote:
>> Hi,
>>
>> I don't know about the error messages. But are you sure that the
>> imbalance of the OST filling isn't due to some extremely large files
>> written overnight or so? (With default striping, one file goes to one OST.)
>> Our users manage to do that without realizing it.
>>
>> Regards,
>> Thomas
>>
>>> On 08.06.2017 10:11, Hans Henrik Happe wrote:
>>> Hi,
>>>
>>> We are on Lustre 2.8 with ZFS.
>>>
>>> Our users have seen some unexplainable errors:
>>>
>>> 062: forrtl: Input/output error
>>>
>>> Or
>>>
>>> 062: forrtl: severe (28): CLOSE error, unit 10, file "Unknown"
>>>
>>>
>>> From the attached 'lfs df -h' output you can see that the OSTs are
>>> unbalanced, but OST0001 is far from being full. We are using the default
>>> allocation settings, so we should be in weighted mode.
>>>
>>> I've tried to find an LU ticket matching this, but no luck. Also, the
>>> logs on the affected nodes and on the servers are empty.
>>>
>>> Any suggestions about how to debug this?
>>>
>>> Cheers,
>>> Hans Henrik
>>>
>>>
>>>
>>> _______________________________________________
>>> lustre-discuss mailing list
>>> lustre-discuss at lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>