How to merge 2 fields and find unique records from a JSON file using Python

Time:05-05

I have a JSON file that contains duplicate records, and I need to send only the unique records to another application. To find them, I first need to merge the JRNAL_NO and JRNAL_LINE fields, separated by a hyphen (e.g. 655-1), and then use that combined key to identify the unique records. Is it possible to do this with a Python script?

Source

Target

Thank you, John
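A minimal sketch of the composite key described in the question follows. It assumes each record is a dict whose JRNAL_NO and JRNAL_LINE values are strings or integers (the original sample data is not available), and `make_key` is a hypothetical helper name. The hyphen matters: without a separator, different field pairs can collapse into the same key.

```python
def make_key(record):
    """Join JRNAL_NO and JRNAL_LINE with a hyphen, e.g. "655-1"."""
    # Hypothetical helper; the f-string also works if the JSON stores numbers.
    return f"{record['JRNAL_NO']}-{record['JRNAL_LINE']}"

a = {"JRNAL_NO": "655", "JRNAL_LINE": "1"}
b = {"JRNAL_NO": "65", "JRNAL_LINE": "51"}

print(make_key(a))  # 655-1
print(make_key(b))  # 65-51

# Plain concatenation without a separator makes these two records collide ("6551"):
print(a["JRNAL_NO"] + a["JRNAL_LINE"] == b["JRNAL_NO"] + b["JRNAL_LINE"])  # True
```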

User response:

I managed to get the expected results using the script below (thanks to some old posts on Stack Overflow). I am new to Python, so if there are better ways of doing this, please suggest them. Thanks.

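import json

# input_var is assumed to hold the JSON text (a list of records)
# supplied by the calling application.
source = json.loads(input_var)
target = []
seen = set()

for record in source:
    # Build the hyphenated key, e.g. "655-1".
    name = f"{record['JRNAL_NO']}-{record['JRNAL_LINE']}"
    if name not in seen:
        seen.add(name)
        target.append(record)  # keep only the first record for each key
del seen  # optional: free the helper set

output_var = json.dumps(target)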
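One possible alternative, sketched under the assumption that the first occurrence of each key should win (as in the loop above): a dictionary keyed on the hyphenated value replaces the separate `seen` set and `target` list. Since Python 3.7, dictionaries preserve insertion order, so the output keeps the original record order. The sample records in `input_var` are made up for illustration; in the real script that variable comes from the calling application.

```python
import json

# Hypothetical sample input standing in for the real input_var.
input_var = json.dumps([
    {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "1", "D_C": "C"},
    {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "C"},
    {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "C"},
])

source = json.loads(input_var)

unique = {}
for record in source:
    key = f"{record['JRNAL_NO']}-{record['JRNAL_LINE']}"
    # setdefault stores the record only if the key is new, so the first one wins.
    unique.setdefault(key, record)

output_var = json.dumps(list(unique.values()))
print(list(unique))  # ['655-1', '655-3']
```

If the last occurrence should win instead, `unique[key] = record` would overwrite earlier records with the same key.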

User response:

To solve this problem, we can gather the dictionaries in a list and then use a list comprehension to eliminate exact duplicates:

dictA = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "1", "D_C": "C"}
dictB = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "C"}
dictC = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "C"}
dictD = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "C"}

list_of_dicts = [dictA, dictB, dictC, dictD]

result = []
# The comprehension is evaluated only for its side effect: each dictionary is
# appended to result unless an equal dictionary is already there.
[result.append(x) for x in list_of_dicts if x not in result]

print(result)

This prints a list of the non-repeated dictionaries:

[{'PERIOD': '2022007', 'JRNAL_NO': '655', 'JRNAL_LINE': '1', 'D_C': 'C'}, {'PERIOD': '2022007', 'JRNAL_NO': '655', 'JRNAL_LINE': '3', 'D_C': 'C'}]
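Two caveats about this approach: `x not in result` rescans the whole result list for every record, which is quadratic in the number of records, and it compares entire dictionaries rather than the JRNAL_NO-JRNAL_LINE key the question asks for, so two records that share that key but differ in, say, D_C are both kept. If exact-duplicate removal is what you want, the sketch below is a linear-time version that hashes each record as a frozenset of its items. It assumes all field values are hashable (strings and numbers, as in this sample), which would not hold for nested JSON objects or arrays; `dictE` is a hypothetical record added to show the whole-record behaviour.

```python
dictA = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "1", "D_C": "C"}
dictB = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "C"}
dictC = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "C"}
dictD = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "C"}
# Hypothetical record: same key 655-3 as dictB, but a different D_C value.
dictE = {"PERIOD": "2022007", "JRNAL_NO": "655", "JRNAL_LINE": "3", "D_C": "D"}

list_of_dicts = [dictA, dictB, dictC, dictD, dictE]

result = []
seen = set()
for x in list_of_dicts:
    # A frozenset of (field, value) pairs is hashable and ignores field order.
    fingerprint = frozenset(x.items())
    if fingerprint not in seen:
        seen.add(fingerprint)
        result.append(x)

print(len(result))  # 3: dictA, dictB and dictE
```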

Extra relevant links:

List Comprehension
