jq compare 2 json files and delete duplicate objects by key value

Time:10-12

I have two JSON files that I want to compare, ending up with a file that contains only the objects from the second file that are unique. Uniqueness is not determined by the whole object but by the value of a single key within it, Car ID. Below is a sample of each file to explain this better:

file1.json:

{
"User Data": [
{"First Name": "Tony", "Last Name": "Evans", "DOB": "1987-02-01", "Car ID": "UJ928JD9"},
{"First Name": "John", "Last Name": "Smith", "DOB": "1972-11-27", "Car ID": "UJ235UW8"},
{"First Name": "Kirsty", "Last Name": "Morgan", "DOB": "1991-06-08", "Car ID": "UJ424KL2"},
...
]
}

file2.json:

{
"User Data": [
{"First Name": "Harry", "Last Name": "Jones", "DOB": "1983-03-09", "Car ID": "UJ928JD9"},
{"First Name": "Jeremy", "Last Name": "Blake", "DOB": "1965-09-21", "Car ID": "UJ345IE2"},
{"First Name": "Jason", "Last Name": "Roberts", "DOB": "1972-10-18", "Car ID": "UJ424KL2"},
...
]
}

In the sample above, two of the Car IDs in file2.json (UJ928JD9 and UJ424KL2) also appear in file1.json, so only the second object from the second file should remain, like so:

{
"User Data": [
{"First Name": "Jeremy", "Last Name": "Blake", "DOB": "1965-09-21", "Car ID": "UJ345IE2"}
]
}
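A quick way to confirm which entries count as duplicates is to list the Car IDs that occur in both files. The sketch below assumes jq 1.6 or later (for the IN builtin), recreates the sample files from the question with the "..." placeholders dropped, and uses shared as a purely illustrative variable name.

```shell
#!/bin/sh
# Recreate the sample files from the question (the "..." placeholders are omitted)
cat > file1.json <<'EOF'
{
"User Data": [
{"First Name": "Tony", "Last Name": "Evans", "DOB": "1987-02-01", "Car ID": "UJ928JD9"},
{"First Name": "John", "Last Name": "Smith", "DOB": "1972-11-27", "Car ID": "UJ235UW8"},
{"First Name": "Kirsty", "Last Name": "Morgan", "DOB": "1991-06-08", "Car ID": "UJ424KL2"}
]
}
EOF
cat > file2.json <<'EOF'
{
"User Data": [
{"First Name": "Harry", "Last Name": "Jones", "DOB": "1983-03-09", "Car ID": "UJ928JD9"},
{"First Name": "Jeremy", "Last Name": "Blake", "DOB": "1965-09-21", "Car ID": "UJ345IE2"},
{"First Name": "Jason", "Last Name": "Roberts", "DOB": "1972-10-18", "Car ID": "UJ424KL2"}
]
}
EOF

# Collect the Car IDs from file2.json that also appear in file1.json
# (--slurpfile wraps each file's content in an array, hence the [0])
shared=$(jq -n -c \
  --slurpfile file1 file1.json \
  --slurpfile file2 file2.json \
  '($file1[0]["User Data"] | map(.["Car ID"])) as $old
   | [ $file2[0]["User Data"][] | .["Car ID"] | select(IN($old[])) ]')

echo "$shared"   # ["UJ928JD9","UJ424KL2"]
```

These are exactly the two entries that should be dropped from file2.json.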

Answer:

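Read both files into variables with --slurpfile, map each one to just its Car IDs, then subtract the first array from the second. The result is an array of the Car IDs you want to keep; finally, use those IDs to select the matching entries from the second file. The jq manual marks --argfile as deprecated in favour of --slurpfile. Note that --slurpfile binds each variable to an array of all JSON values in the file, which is why the code indexes $file1[0] and $file2[0]. Membership is tested with IN rather than comparing against $ids[] with ==, because the latter emits an entry once per matching ID and would duplicate entries whose Car ID occurs more than once in file2.json.

jq -n \
  --slurpfile file1 file1.json \
  --slurpfile file2 file2.json \
  '
    # Take both input files ([0] unwraps the --slurpfile array) and create
    # an array $ids with all Car IDs from the second file that are not
    # part of the first one
    [ $file1[0], $file2[0] ]
    | map(.["User Data"] | map(.["Car ID"]))
    | (.[1] - .[0]) as $ids

    # Take the second input file and keep only those entries whose
    # Car ID is in $ids (IN emits each matching entry exactly once)
    | $file2[0]
    | .["User Data"] |= map(select(.["Car ID"] | IN($ids[])))
  '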
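For larger files, the IN test compares every entry against every kept ID. A variant that may scale better builds a lookup object keyed by Car ID from file1.json and checks membership with a single object lookup; it reads file2.json as the main input and writes the result to a file, as the question asks. This is a sketch assuming jq 1.5 or later and string-valued Car IDs; the output name unique.json is hypothetical.

```shell
#!/bin/sh
# Recreate the sample files from the question (the "..." placeholders are omitted)
cat > file1.json <<'EOF'
{
"User Data": [
{"First Name": "Tony", "Last Name": "Evans", "DOB": "1987-02-01", "Car ID": "UJ928JD9"},
{"First Name": "John", "Last Name": "Smith", "DOB": "1972-11-27", "Car ID": "UJ235UW8"},
{"First Name": "Kirsty", "Last Name": "Morgan", "DOB": "1991-06-08", "Car ID": "UJ424KL2"}
]
}
EOF
cat > file2.json <<'EOF'
{
"User Data": [
{"First Name": "Harry", "Last Name": "Jones", "DOB": "1983-03-09", "Car ID": "UJ928JD9"},
{"First Name": "Jeremy", "Last Name": "Blake", "DOB": "1965-09-21", "Car ID": "UJ345IE2"},
{"First Name": "Jason", "Last Name": "Roberts", "DOB": "1972-10-18", "Car ID": "UJ424KL2"}
]
}
EOF

# Build {"<Car ID>": true, ...} from file1.json ("// {}" covers an empty list),
# then keep only the file2.json entries whose Car ID is not a key of that object.
# Assumes every Car ID is a string, since JSON object keys must be strings.
jq --slurpfile file1 file1.json '
  ($file1[0]["User Data"] | map({ (.["Car ID"]): true }) | add // {}) as $seen
  | .["User Data"] |= map(select($seen[.["Car ID"]] | not))
' file2.json > unique.json

cat unique.json
```

Because the lookup object already encodes which IDs to reject, the array subtraction step is not needed here, and duplicate Car IDs within file2.json are kept once each, just as they appear in the input.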