[lustre-discuss] Joining files

Patrick Farrell pfarrell at ddn.com
Wed Mar 29 06:41:29 PDT 2023


Sven,

The "combining layouts without any data movement" part isn't currently possible.  It's probably possible in theory, but it's never been implemented.  (What is your use case? I'm curious.)

Even allowing for data movement, there's no tool to do this for you.  Depending on what you mean by combining, it's possible to do this with Linux tools (see the end of my note), but there will be data copying.

It's a bit of an odd requirement, and it raises some inherent questions.  For example, file layouts generally go to infinity: if they don't, you will get IO errors when you 'run off the end', i.e., go past the defined layout.  So the last component is usually defined to go to infinity.

That poses obvious questions when combining files.

If you're looking to combine files with layouts that do not go to infinity, then it's at least straightforward to see how you'd concatenate them.  But presumably the data in each file doesn't go to the very end of the layout?  So do you want the empty parts of the layout included?

Say file 1 is 10 MiB in size but the layout goes to 20 MiB (again, layouts normally should go to infinity) and file 2 is also 10 MiB in size but the layout goes to, say, 15 MiB.  Should the result look like this?

Layout: 1 1 1 1 1 1 1 ... (to 20 MiB) 2 2 2 2 2 2 ... (to 35 MiB)

with data at 0-10 MiB and 20-30 MiB.
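The example above is just offset arithmetic: each file is placed where the previous file's (finite) layout ends, so its data lands at that offset and the rest of its layout stays empty. A minimal sketch, assuming every input layout has a finite end; `plan_join` is a hypothetical helper, not part of `lfs`:

```python
MiB = 1024 * 1024


def plan_join(files: list[tuple[int, int]]) -> tuple[list[tuple[int, int]], int]:
    """Hypothetical helper: place each file at the end of the previous layout.

    files: (data_size, layout_end) pairs in bytes, where layout_end is the
    finite end of that file's last layout component.
    Returns the [start, end) data range of each file in the joined file,
    plus the end of the combined layout.
    """
    ranges = []
    offset = 0
    for data_size, layout_end in files:
        if data_size > layout_end:
            raise ValueError("file data extends past the end of its layout")
        ranges.append((offset, offset + data_size))
        offset += layout_end  # the next file starts where this layout ends
    return ranges, offset


# Patrick's example: file 1 is 10 MiB with a 20 MiB layout,
# file 2 is 10 MiB with a 15 MiB layout.
ranges, combined_end = plan_join([(10 * MiB, 20 * MiB), (10 * MiB, 15 * MiB)])
print([(s // MiB, e // MiB) for s, e in ranges], combined_end // MiB)
# prints: [(0, 10), (20, 30)] 35
```

The empty regions (10-20 MiB and 30-35 MiB here) are exactly the "empty parts of the layout" whose inclusion has to be decided up front.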

That's something you'd have to write a tool for, so that it could write each file's data at the offset you specify for the second file (and third, etc.).  You could also do something like:

    lfs setstripe [your layout] combined_file
    cat file1 > combined_file
    truncate -s 20M combined_file    # extend to the end of the file 1 layout
    cat file2 >> combined_file       # append, so file 2 starts at 20 MiB
    # ...and so on for further files

So, you definitely can't avoid data copying here.  But that's how you could do it with simple Linux tools (which you could probably have drawn up yourself :)).
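The kind of tool mentioned above might look roughly like this. It is a minimal sketch, assuming the combined file has already been created empty with the desired layout (e.g. with `lfs setstripe`); `join_files` is a hypothetical name. It still copies every byte of data; it only skips writing the gaps, which remain holes:

```python
import os


def join_files(dest: str, parts: list[tuple[str, int]], chunk: int = 4 * 1024 * 1024) -> None:
    """Hypothetical helper: copy each (source_path, offset) pair into dest.

    dest is opened without O_TRUNC, so a file created beforehand with the
    desired layout keeps it. Ranges between parts are never written, so they
    stay holes and read back as zeros. dest is assumed to be empty to begin
    with; any existing data outside the written ranges is left untouched.
    """
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        for src, offset in parts:
            pos = offset
            with open(src, "rb") as f:
                while True:
                    buf = f.read(chunk)
                    if not buf:
                        break  # end of this source file
                    view = memoryview(buf)
                    # os.pwrite may write fewer bytes than requested; loop until done.
                    while view:
                        written = os.pwrite(fd, view, pos)
                        pos += written
                        view = view[written:]
    finally:
        os.close(fd)
```

For the example above this would be called as `join_files("combined_file", [("file1", 0), ("file2", 20 * 1024 * 1024)])`. Writing at explicit offsets with `pwrite`, rather than truncating and appending, means the parts can be written in any order.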

-Patrick

________________________________
From: lustre-discuss <lustre-discuss-bounces at lists.lustre.org> on behalf of Sven Willner <sven.willner at mpimet.mpg.de>
Sent: Wednesday, March 29, 2023 7:58 AM
To: lustre-discuss at lists.lustre.org <lustre-discuss at lists.lustre.org>
Subject: [lustre-discuss] Joining files

Dear all,

I am looking for a way to join/merge/concatenate several files into one, whose layout is just the concatenation of the layouts of the respective files - ideally without any copying/moving on the data side (even if this would result in "holes" in the joined file).

I would very much appreciate any hints to tools or ideas on how to achieve such a join. As I understand it, there used to be a `join` command for `lfs`, which is now deprecated (however, I am not sure whether a use case like mine was its purpose, or why it was deprecated).

Thanks a lot!
Best regards,
Sven

--
Dr. Sven Willner
Scientific Computing Lab (SCLab)
Max Planck Institute for Meteorology
Bundesstraße 53, D-20146 Hamburg, Germany