I'm trying to delete entire objects in a JSON file if they do not include ALL of these keys: "transaction_date", "asset_description", "asset_type", "type" and "amount".
Below is my JSON file (it's been cut for this example):
{
"first_name": {
"0": "Thomas",
"1": "John"
},
"transactions": {
"0": [
{
"transaction_date": "11/29/2022",
"asset_description": "FireEye, Inc.",
"asset_type": "Stock",
"type": "Sale (Partial)",
"amount": "$1,001 - $15,000"
}
],
"1": [
{
"scanned_pdf": true,
"ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
"date_recieved": "01/30/2013"
}
]
}
}
I need to delete the entire "1" entry from both "transactions" and "first_name". The original file has many more entries than these two, so the code needs to work for any number of them rather than indexing with [0], [1], etc. My code below takes the inverse approach: instead of deleting objects that lack the required keys, it strips the "scanned_pdf", "ptr_link" and "date_recieved" keys from every item in "transactions" and then saves the updated JSON:
import json
with open("xxxtester.json", "r") as f_in:
    data = json.load(f_in)
to_delete = {"scanned_pdf", "ptr_link", "date_recieved"}
for k in data["transactions"]:
    data["transactions"][k] = [
        {kk: vv for kk, vv in d.items() if kk not in to_delete}
        for d in data["transactions"][k]]
open("xxxtester.json", "w").write(
    json.dumps(data, indent=4))
However, my output still contains "1", just with empty data ("{}"). Should I use different logic for this, or can code be added to the existing script to make it work?
Below is my desired output:
{
"first_name": {
"0": "Thomas"
},
"transactions": {
"0": [
{
"transaction_date": "11/29/2022",
"asset_description": "FireEye, Inc.",
"asset_type": "Stock",
"type": "Sale (Partial)",
"amount": "$1,001 - $15,000"
}
]
}
}
CodePudding user response:
If we reverse your logic (so we're selecting items we want to keep, rather than the other way around) and add a second comprehension to filter out empty values, we end up with this:
import json
with open("xxxtester.json", "r") as f_in:
    data = json.load(f_in)
# Keys every transaction must contain to be kept.
required = {"transaction_date", "asset_description", "asset_type", "type", "amount"}
# Keep only the transactions that contain all required keys.
data["transactions"] = {
    k: [transaction for transaction in v if all(key in transaction for key in required)]
    for k, v in data["transactions"].items()
}
# Drop entries whose transaction list is now empty.
data["transactions"] = {
    k: v for k, v in data["transactions"].items() if v
}
print(json.dumps(data, indent=4))
Given input like this:
{
"first_name": {
"0": "Thomas",
"1": "John"
},
"transactions": {
"0": [
{
"transaction_date": "11/29/2022",
"asset_description": "FireEye, Inc.",
"asset_type": "Stock",
"type": "Sale (Partial)",
"amount": "$1,001 - $15,000"
},
{
"scanned_pdf": true,
"ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
"date_recieved": "01/30/2013"
}
],
"1": [
{
"scanned_pdf": true,
"ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
"date_recieved": "01/30/2013"
}
]
}
}
The above code produces:
{
"first_name": {
"0": "Thomas",
"1": "John"
},
"transactions": {
"0": [
{
"transaction_date": "11/29/2022",
"asset_description": "FireEye, Inc.",
"asset_type": "Stock",
"type": "Sale (Partial)",
"amount": "$1,001 - $15,000"
}
]
}
}
The first dictionary comprehension...
data["transactions"] = {
    k: [transaction for transaction in v if all(key in transaction for key in required)]
    for k, v in data["transactions"].items()
}
...produces:
...
"transactions": {
"0": [
{
"transaction_date": "11/29/2022",
"asset_description": "FireEye, Inc.",
"asset_type": "Stock",
"type": "Sale (Partial)",
"amount": "$1,001 - $15,000"
}
],
"1": []
}
...
The second comprehension filters out keys that have empty lists as values.
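Note that the code above leaves "first_name" untouched, while the question also asks for the matching "1" entry to be removed there. Below is a minimal sketch of one way to do that. It assumes every top-level field is a dict keyed by the same row index ("0", "1", ...) as "transactions"; the function name drop_incomplete_rows is hypothetical and not part of the original code.

```python
import json

REQUIRED = {"transaction_date", "asset_description", "asset_type", "type", "amount"}


def drop_incomplete_rows(data, required=REQUIRED):
    """Return a new dict keeping only rows that have a complete transaction."""
    # Keep only the transactions that contain every required key.
    transactions = {
        row: [t for t in items if required.issubset(t)]
        for row, items in data["transactions"].items()
    }
    # A row survives only if its filtered transaction list is non-empty.
    keep = {row for row, items in transactions.items() if items}
    # Assumption: every top-level field is a dict sharing the same row keys.
    result = {
        field: {row: value for row, value in column.items() if row in keep}
        for field, column in data.items()
    }
    # Replace the copied transactions with the filtered lists.
    result["transactions"] = {row: transactions[row] for row in transactions if row in keep}
    return result


complete = {
    "transaction_date": "11/29/2022",
    "asset_description": "FireEye, Inc.",
    "asset_type": "Stock",
    "type": "Sale (Partial)",
    "amount": "$1,001 - $15,000",
}
scanned = {
    "scanned_pdf": True,
    "ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
    "date_recieved": "01/30/2013",
}
sample = {
    "first_name": {"0": "Thomas", "1": "John"},
    "transactions": {"0": [complete, scanned], "1": [scanned]},
}

cleaned = drop_incomplete_rows(sample)
print(json.dumps(cleaned, indent=4))
```

Because the function builds a new dict instead of mutating its argument, the loaded data stays available for comparison before anything is written back to the file.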
CodePudding user response:
This code deletes the entire "1" entry from "transactions":
import json
with open("xxxtester.json", "r") as f_in:
    data = json.load(f_in)
# Delete before reopening the file for writing, so a missing key cannot leave the file truncated.
del data["transactions"]["1"]
with open("xxxtester.json", "w") as f:
    json.dump(data, f)
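The key "1" is hard-coded above, whereas the question asks for something that works for any number of entries. Here is a sketch of the same in-place deletion with the keys computed instead. It assumes all top-level fields share the same row keys, and the helper name incomplete_rows is hypothetical. Unlike a per-transaction filter, it only removes whole rows, and it treats a row as complete if at least one of its transactions has every required key.

```python
REQUIRED = {"transaction_date", "asset_description", "asset_type", "type", "amount"}


def incomplete_rows(transactions, required=REQUIRED):
    """List the row keys where no transaction contains every required key."""
    return [
        row
        for row, items in transactions.items()
        if not any(required.issubset(t) for t in items)
    ]


data = {
    "first_name": {"0": "Thomas", "1": "John"},
    "transactions": {
        "0": [
            {
                "transaction_date": "11/29/2022",
                "asset_description": "FireEye, Inc.",
                "asset_type": "Stock",
                "type": "Sale (Partial)",
                "amount": "$1,001 - $15,000",
            }
        ],
        "1": [
            {
                "scanned_pdf": True,
                "ptr_link": "https://efdsearch.senate.gov/search/view/paper/658E53E8-7C2C",
                "date_recieved": "01/30/2013",
            }
        ],
    },
}

# Collect the keys first: deleting from a dict while iterating over it
# raises RuntimeError.
for row in incomplete_rows(data["transactions"]):
    for column in data.values():
        # pop with a default tolerates fields that lack this row.
        column.pop(row, None)
```

After the loop, data can be written back with json.dump as in the snippet above.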