[lustre-discuss] Lustre/ZFS space accounting

Fri Jun 9 00:27:27 PDT 2017

Hi,

We have ruled that out by monitoring usage. It is happening during
checkpointing, a continuous process where old checkpoints get deleted
after new ones are made. There are many checkpoints before

I messed things up in my first mail, so it wasn't clear why I talked
about space. Sometimes they just get this (the first number is the MPI rank):

222: forrtl: Input/output error
222: forrtl: severe (28): CLOSE error, unit 10, file "Unknown"

Sometimes they get:

33: forrtl: No space left on device
14: forrtl: No space left on device
08: forrtl: Input/output error
08: forrtl: severe (28): CLOSE error, unit 10, file "Unknown"
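
A minimal sketch for tallying which ranks report which message, assuming
the job output lines have the form shown above (rank, then "forrtl:", then
the message). The function name and the sample list are hypothetical, not
part of any Lustre or Intel tooling:

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the messages above: "<rank>: forrtl: <message>"
LINE_RE = re.compile(r"^\s*(\d+):\s*forrtl:\s*(.+?)\s*$")


def tally_forrtl_errors(lines):
    """Map each forrtl message to the sorted list of MPI ranks that reported it."""
    ranks_by_message = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            rank, message = int(match.group(1)), match.group(2)
            ranks_by_message[message].add(rank)
    return {msg: sorted(ranks) for msg, ranks in ranks_by_message.items()}


# Sample input copied from the second error listing above.
sample = [
    "33: forrtl: No space left on device",
    "14: forrtl: No space left on device",
    "08: forrtl: Input/output error",
    '08: forrtl: severe (28): CLOSE error, unit 10, file "Unknown"',
]

for message, ranks in tally_forrtl_errors(sample).items():
    print(f"{message}: ranks {ranks}")
```

Grouping by message makes it easier to see whether the "No space left on
device" ranks and the "Input/output error" ranks overlap between runs.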

Info: We have a ZFS snapshot of the OSTs and the MDT. The ZFS version is 0.6.5.7.

Cheers,
Hans Henrik

On 09-06-2017 08:41, Thomas Roth wrote:
> Hi,
> 
> I don't know about the error messages. But are you sure that the
> imbalance of the OST filling isn't due to some extremely large files
> written overnight or so (with default striping, one file -> one OST)?
> Our users are able to do that, without realizing.
> 
> Regards,
> Thomas
> 
> On 08.06.2017 10:11, Hans Henrik Happe wrote:
>> Hi,
>>
>> We are on Lustre 2.8 with ZFS.
>>
>> Our users have seen some unexplainable errors:
>>
>> 062: forrtl: Input/output error
>>
>> Or
>>
>> 062: forrtl: severe (28): CLOSE error, unit 10, file "Unknown"
>>
>>
>> From the attached 'lfs df -h' you can see that the OSTs are unbalanced,
>> OST0001 in particular, but far from being full. We are using the default
>> allocation settings, so we should be in weighted mode.
>>
>> I've tried to find an LU ticket matching this, but no luck. Also, the logs
>> on the affected nodes and on the servers are empty.
>>
>> Any suggestions about how to debug this?
>>
>> Cheers,
>> Hans Henrik