Merge/update json objects from two different json files


I have two JSON files, both with the same number of objects, and each object has an ID key "DOCN". The problem is that some objects have different keys: for example, in file1 the object with "DOCN": "000093019" has 4 keys, while in file2 the same object has 5.

I'm trying to create a new file that contains the merged objects from both files (find the keys missing from file1 or file2 and add them to the corresponding object).

Example:

file1:

[
    {
        "DOCN": "000093019",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla"
    },
    {
        "DOCN": "000093085",
        "B": "blabla",
        "C": "blabla",
        "D": "blabla"
    }
]

file2:

[
    {
        "DOCN": "000093019",
        "A": "blabla",
        "C": "blabla",
        "D": "blabla",
        "E": "blabla"
    },
    {
        "DOCN": "000093085",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla"
    }
]

What I would like to achieve (file3):

[
    {
        "DOCN": "000093019",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla",
        "D": "blabla",
        "E": "blabla"
    },
    {
        "DOCN": "000093085",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla",
        "D": "blabla"
    }
]

CodePudding user response:

I would do it the following way: load the two files using pandas, concatenate the dataframes, group by DOCN and take the first record (`first()` keeps the first non-null value per column, so the partial records collapse into complete ones), then turn the result into a list of dicts and drop the entries that are still null:

import pandas as pd

# dtype=False stops pandas from parsing "000093019" as an integer
# and silently dropping the leading zeros
df1 = pd.read_json("my_file1.json", dtype=False)
df2 = pd.read_json("my_file2.json", dtype=False)
df = pd.concat([df1, df2])
grp = df.groupby("DOCN").first().reset_index()
merged = [{k: v for k, v in record.items() if pd.notna(v)}
          for record in grp.to_dict(orient="records")]
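Run against the example data from the question (the lists are inlined here in place of reading the files, so the sketch is self-contained and makes no assumption about file paths):

```python
import json

import pandas as pd

file1 = [
    {"DOCN": "000093019", "A": "blabla", "B": "blabla", "C": "blabla"},
    {"DOCN": "000093085", "B": "blabla", "C": "blabla", "D": "blabla"},
]
file2 = [
    {"DOCN": "000093019", "A": "blabla", "C": "blabla", "D": "blabla", "E": "blabla"},
    {"DOCN": "000093085", "A": "blabla", "B": "blabla", "C": "blabla"},
]

# Stack both frames; keys missing from a row become NaN in its columns.
df = pd.concat([pd.DataFrame(file1), pd.DataFrame(file2)])

# first() keeps the first non-null value per column within each DOCN
# group, so the two partial records merge into one complete record.
grp = df.groupby("DOCN").first().reset_index()

# Drop any NaN left over (keys missing from *both* files).
merged = [
    {k: v for k, v in record.items() if pd.notna(v)}
    for record in grp.to_dict(orient="records")
]
print(json.dumps(merged, indent=4))
```

With real files, replacing the inlined lists with `pd.read_json(..., dtype=False)` keeps IDs such as `"000093019"` as strings rather than integers.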

CodePudding user response:

I would read both files into two arrays and map over them to create the new one.

// read file1 instead using `fs`
const arr1 = [
    {
        "DOCN": "000093019",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla"
    },
    {
        "DOCN": "000093085",
        "B": "blabla",
        "C": "blabla",
        "D": "blabla"
    }
]
// read file2 instead
const arr2 = [
    {
        "DOCN": "000093019",
        "A": "blabla",
        "C": "blabla",
        "D": "blabla",
        "E": "blabla"
    },
    {
        "DOCN": "000093085",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla"
    }
]

const arr3 = arr1.map(x => ({
    // spread the matching object from arr2 on top of x;
    // keys present in both take arr2's value
    ...x,
    ...arr2.find(y => y.DOCN === x.DOCN)
}))

// write arr3 to a new file

CodePudding user response:

Well, this is a simple manipulation of dictionaries. I cannot say it will be the best-performing option for a large dataset, but you can merge the dicts based on the key "DOCN". (Maybe there are better approaches! ;-))

f1 = [
    {
        "DOCN": "000093019",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla"
    },
    {
        "DOCN": "000093085",
        "B": "blabla",
        "C": "blabla",
        "D": "blabla"
    }
]

f2 = [
    {
        "DOCN": "000093019",
        "A": "blabla",
        "C": "blabla",
        "D": "blabla",
        "E": "blabla"
    },
    {
        "DOCN": "000093085",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla"
    }
]

f1 = {item.get("DOCN"): item for item in f1}
f2 = {item.get("DOCN"): item for item in f2}

keys = set(list(f1.keys()) + list(f2.keys()))

output = []
for key in keys:
    # the empty-dict default keeps this from crashing when an ID
    # appears in only one of the files
    output.append({**f1.get(key, {}), **f2.get(key, {})})

print(output)

The output was (the order of the objects may vary between runs, since a set has no defined iteration order):

[
    {
        "DOCN": "000093019",
        "A": "blabla",
        "B": "blabla",
        "C": "blabla",
        "D": "blabla",
        "E": "blabla"
    },
    {
        "DOCN": "000093085",
        "B": "blabla",
        "C": "blabla",
        "D": "blabla",
        "A": "blabla"
    }
]
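A minimal, self-contained variant of the same idea, wrapped in a function (the name `merge_by_docn` is made up for this sketch); sorting the union of IDs makes the output order deterministic, which a bare `set` does not guarantee:

```python
import json

def merge_by_docn(list1, list2):
    """Merge two lists of dicts on their "DOCN" value.

    Keys present in both records take the value from list2,
    exactly as with {**a, **b}.
    """
    by_id1 = {item["DOCN"]: item for item in list1}
    by_id2 = {item["DOCN"]: item for item in list2}
    return [
        {**by_id1.get(key, {}), **by_id2.get(key, {})}
        for key in sorted(by_id1.keys() | by_id2.keys())
    ]

# Small inline demo; with real files, pass json.load(...) of each file
# instead and write the result out with json.dump.
f1 = [{"DOCN": "000093019", "A": "blabla"}, {"DOCN": "000093085", "B": "blabla"}]
f2 = [{"DOCN": "000093019", "E": "blabla"}]
print(json.dumps(merge_by_docn(f1, f2), indent=4))
```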