I have two JSON files, both containing the same number of objects, and each object has an ID key "DOCN".
The problem is that some objects have different keys. For example, in file1 the object "DOCN": "000093019"
has 4 keys, while in file2 the same object has 5.
I'm trying to create a new file that contains the merged objects from both files (find the keys missing in file1 and file2 and add them to each object).
Example:
file1:
[
{
"DOCN": "000093019",
"A": "blabla",
"B": "blabla",
"C": "blabla"
},
{
"DOCN": "000093085",
"B": "blabla",
"C": "blabla",
"D": "blabla"
}
]
file2:
[
{
"DOCN": "000093019",
"A": "blabla",
"C": "blabla",
"D": "blabla",
"E": "blabla"
},
{
"DOCN": "000093085",
"A": "blabla",
"B": "blabla",
"C": "blabla"
}
]
What I would like to achieve (file3):
[
{
"DOCN": "000093019",
"A": "blabla",
"B": "blabla",
"C": "blabla",
"D": "blabla",
"E": "blabla"
},
{
"DOCN": "000093085",
"A": "blabla",
"B": "blabla",
"C": "blabla",
"D": "blabla"
}
]
CodePudding user response:
I would do it the following way: load the two files using pandas, concat the dataframes, group by DOCN
and take the first record (this keeps the first non-null value per column), then turn the result into a list of dicts and drop the null entries -
import pandas as pd

# dtype={"DOCN": str} keeps the leading zeros that read_json would otherwise strip
df1 = pd.read_json("my_file1.json", dtype={"DOCN": str})
df2 = pd.read_json("my_file2.json", dtype={"DOCN": str})
df = pd.concat([df1, df2])
grp = df.groupby("DOCN").first().reset_index()
merged = [{k: v for k, v in record.items() if pd.notna(v)}
          for record in grp.to_dict(orient="records")]
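As a self-contained check of the approach (toy record lists stand in for the JSON files, with shortened values):

```python
import pandas as pd

# Same object in both "files", with different key sets
f1 = [{"DOCN": "000093019", "A": "a", "B": "b", "C": "c"}]
f2 = [{"DOCN": "000093019", "A": "a", "C": "c", "D": "d", "E": "e"}]

df = pd.concat([pd.DataFrame(f1), pd.DataFrame(f2)])
# first() returns the first non-NaN value in each column per group
grp = df.groupby("DOCN").first().reset_index()
# drop the NaN placeholders pandas inserted for missing keys
merged = [{k: v for k, v in rec.items() if pd.notna(v)}
          for rec in grp.to_dict(orient="records")]
print(merged)
```

This prints one dict per DOCN carrying the union of keys from both inputs.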
CodePudding user response:
I would read both files into two arrays and map over them to build the new one.
// read file1 instead using `fs`
const arr1 = [
{
"DOCN": "000093019",
"A": "blabla",
"B": "blabla",
"C": "blabla"
},
{
"DOCN": "000093085",
"B": "blabla",
"C": "blabla",
"D": "blabla"
}
]
// read file2 instead
const arr2 = [
{
"DOCN": "000093019",
"A": "blabla",
"C": "blabla",
"D": "blabla",
"E": "blabla"
},
{
"DOCN": "000093085",
"A": "blabla",
"B": "blabla",
"C": "blabla"
}
]
const arr3 = arr1.map(x => {
  // merge the matching object from arr2 (if any) into x
  const match = arr2.find(y => y.DOCN === x.DOCN)
  return { ...x, ...match }
})
//write arr3 to new file
CodePudding user response:
Well, this is a simple manipulation of the dictionaries. I cannot say this will be the best-performing option for a large dataset, but you can merge the dicts based on the key "DOCN". (Maybe there are better approaches! ;-))
f1 = [
{
"DOCN": "000093019",
"A": "blabla",
"B": "blabla",
"C": "blabla"
},
{
"DOCN": "000093085",
"B": "blabla",
"C": "blabla",
"D": "blabla"
}
]
f2 = [
{
"DOCN": "000093019",
"A": "blabla",
"C": "blabla",
"D": "blabla",
"E": "blabla"
},
{
"DOCN": "000093085",
"A": "blabla",
"B": "blabla",
"C": "blabla"
}
]
f1 = {item.get("DOCN"): item for item in f1}
f2 = {item.get("DOCN"): item for item in f2}
keys = set(list(f1.keys()) + list(f2.keys()))
output = []
for key in keys:
    output.append({**f1.get(key, {}), **f2.get(key, {})})
print(output)
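To actually produce file3, the merged list can then be written out with the standard json module (the file name here is just an assumption; `merged` stands in for the `output` list built above):

```python
import json

# Hypothetical merged result (values shortened for brevity)
merged = [{"DOCN": "000093019", "A": "blabla", "B": "blabla"}]

# indent=2 gives the pretty-printed layout shown in the question
with open("my_file3.json", "w") as fh:
    json.dump(merged, fh, indent=2)
```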
The output was:
[
{
"DOCN": "000093019",
"A": "blabla",
"B": "blabla",
"C": "blabla",
"D": "blabla",
"E": "blabla"
},
{
"DOCN": "000093085",
"B": "blabla",
"C": "blabla",
"D": "blabla",
"A": "blabla"
}
]