I'm looking for a way to merge multiple JSONL files from one folder using a Python script. Something like the script below, which works for JSON files.
import glob
import json

result = []
for f in glob.glob("*.json"):
    with open(f) as infile:  # json.load expects a regular file object
        result.append(json.load(infile))
with open("merged_file.json", "w") as outfile:  # text mode: json.dump writes str
    json.dump(result, outfile)
Does anyone know how I can handle loading JSONL files instead? Thank you.
Best,
CodePudding user response:
You can update a main dict with every JSON object you load, like this:
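import glob
import json

import jsonlines  # third-party package: pip install jsonlines

result = {}
for f in glob.glob("*.json"):
    with jsonlines.open(f) as reader:
        for obj in reader:  # one parsed JSON object per line
            result.update(obj)  # merge the dicts
with open("merged_file.json", "w") as outfile:  # text mode: json.dump writes str
    json.dump(result, outfile)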
But this will overwrite the values of any keys that the objects have in common!
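To illustrate the caveat: dict.update keeps only the last value seen for a repeated key. If every record has to survive, one option is to collect the parsed lines in a list instead of a dict. This is only a minimal sketch using the standard library; the function name merge_jsonl_records and the sample values are made up for illustration.

```python
import json


def merge_jsonl_records(paths):
    """Return every JSON object from the given JSONL files as one list.

    Hypothetical helper: nothing is overwritten because records are appended.
    """
    records = []
    for path in paths:
        with open(path, encoding="utf-8") as infile:
            for line in infile:
                line = line.strip()
                if line:  # skip blank lines
                    records.append(json.loads(line))
    return records


# dict.update keeps only the last value for a repeated key
merged = {}
merged.update({"id": 1, "name": "a"})
merged.update({"id": 2})
print(merged)  # {'id': 2, 'name': 'a'}
```

The returned list can then be written back out one json.dumps(record) per line if the result should stay in JSONL form.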
CodePudding user response:
Since each line in a JSONL file is a complete JSON object, you don't actually need to parse the JSONL files at all in order to merge them into another JSONL file. Instead, merge them by simply concatenating them:
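import glob

# Build the input list first so the output file is never read back as input.
filenames = [f for f in glob.glob("*.json") if f != "merged_file.json"]
with open("merged_file.json", "w") as outfile:
    for filename in filenames:
        with open(filename) as infile:
            outfile.write(infile.read())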
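Plain concatenation assumes that every input file ends with a newline; if one does not, its last record and the first record of the next file end up on the same line. Below is a sketch that guards against that and also skips the output file when the glob pattern happens to match it. The function name concat_jsonl and its parameters are hypothetical, not part of any library.

```python
import glob
import os


def concat_jsonl(pattern, out_path):
    """Concatenate files matching pattern into out_path, one record per line."""
    out_abs = os.path.abspath(out_path)
    # Resolve the inputs before opening the output so it is never read back.
    inputs = sorted(
        p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs
    )
    with open(out_path, "w", encoding="utf-8") as outfile:
        for path in inputs:
            with open(path, encoding="utf-8") as infile:
                for line in infile:
                    if not line.strip():
                        continue  # drop blank lines
                    # Add the newline that a file's last line may be missing.
                    outfile.write(line if line.endswith("\n") else line + "\n")
    return inputs
```

For the folder in the question this could be called as concat_jsonl("*.json", "merged_file.json"). The inputs are sorted only to make the output order predictable; the lines themselves are still copied without being parsed.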