Compare values in a Json File using Python-CodePudding

I want a python script that take a json file (file.json), and compare the values of the keys

"From",

"To",

"Source",

"Destination",

"Service"

If all the values are the same then it will display the value of their "ID".

Example:

[
{

    "ID": "1",

    "Name": "D",

    "From": "DDD",

    "To": "A",

    "Source": "rex",

    "Destination": "proxy gr amz calc",

    "Schedule": "always",

    "Service": "SSH",

    "Action": "ACCEPT"

  },

  {

    "ID": "4",

    "Name": "A",

    "From": "AAA",

    "To": "BBB",

    "Source": "amazon",

    "Destination": "Reseau LAB Reseau noeud Reseau Optique Nokia",

    "Schedule": "always",

    "Service": "FTP HTTP HTTPS PING SNMP SSH TELNET ALL_ICMP NOKIA-TCP-9213",

    "Action": "ACCEPT"

  },

  {

    "ID": "6",

    "Name": "B",

   "From": "DDD",

    "To": "A",

    "Source": "rex",

    "Destination": "proxy gr amz calc",

    "Schedule": "no",

    "Service": "SSH",

    "Action": "ACCEPT"

  }

]

For this situation, the script will show "ID": 1 and 6

Also put them in a csv file showing the values of the "ID" and alle the rest of the keys and values.

 import pandas as pd
 from pprint import pprint as prt

 with open('file.json') as f:
    data = pd.read_json(f)
    ids = data["From"]
    datas = data[ids.isin(ids[ids.duplicated()])].sort_values("ID")
    prt(datas)

CodePudding user response：

You can use pandas groupby which groups rows according to given columns and then get list of the IDs of grouped rows:

with open('file.json') as f:
    data = pd.read_json(f)
    output = list(
        data.groupby(["From", "To", "Source", "Destination", "Service"])["ID"].agg(list))

Output:

[[4], [1, 6]]

You can further filter the list to items of at least two match:

output = [ids for ids in output if len(ids)>1]

Output:

[[1, 6]]

CodePudding user response：

Here's an example of how you can accomplish this using Python:

import json
import pandas as pd

# Load the JSON file
with open('file.json', 'r') as f:
    data = json.load(f)

# Create an empty list to store the IDs of matching records
matching_ids = []

# Iterate through each record in the JSON data
for record in data:
    match = True
    # Compare the values of the specified keys
    for key in ["From", "To", "Source", "Destination", "Service"]:
        if record[key] != data[0][key]:
            match = False
            break
    # If all the values match, add the ID to the list
    if match:
        matching_ids.append(record["ID"])

# Print the IDs of the matching records
print(matching_ids)

# Create a new DataFrame from the JSON data
df = pd.DataFrame(data)

# Filter the DataFrame to only include the matching records
df = df[df['ID'].isin(matching_ids)]

# Write the DataFrame to a CSV file
df.to_csv('output.csv', index=False)

This script will load the JSON file, iterate through each record and compare the values of the specified keys. If all the values match, the ID of the record will be added to a list. Then it will print the list of matching IDs, create a new DataFrame from the JSON data, filter the DataFrame to only include the matching records and then write the DataFrame to a CSV file.