Home > Back-end >  How to merge N sorted files in to one sorted file without storing in memory?
How to merge N sorted files in to one sorted file without storing in memory?

Time:06-02

I have N files that is delineated by a new-line break. Each file has it's rows/lines sorted in lexicographic order (the rows themselves don't have to be sorted). For example:

Include any error messages \n
Include details about your goal \n
Describe expected and actual results \n

How do I merge all multiple files so that the output file is sorted without loading all files in memory?

While this is not an algorithm problem per-se, it does remind me of the leetcode problem of Merging K Sorted Linked Lists. In this case, a node would be one line in a file.

CodePudding user response:

Try heapq.merge:

If you have two files:

file1.txt:

aaa
aab
bbb
ooo

file2.txt:

ccc
ddd
zzz

Then:

from heapq import merge

files = ["file1.txt", "file2.txt"]

for m in merge(*map(open, files)):
    print(m.strip())

Prints:

aaa
aab
bbb
ccc
ddd
ooo
zzz
  • Related