Remove JSON list string items based on a list of strings

Time:01-28

The following is my sample JSON file:

    {
        "test": [{
            "Location": "Singapore",
            "Values": [{
                    "Name": "00",
                    "subvalues": [
                        "5782115e1",
                        "688ddd16e",
                        "3e91dc966",
                        "5add96256",
                        "10352cf0f",
                        "17f233d31",
                        "130c09135",
                        "2f49eb2a6",
                        "2ae1ad9e0",
                        "23fd76115"
                    ]
                },
                {
                    "Name": "01",
                    "subvalues": [
                        "b43678dfe",
                        "202c7f508",
                        "73afcaf7c"
                    ]
                }
            ]
        }]
    }

I'm trying to remove the following strings from the JSON file: ["130c09135", "2f49eb2a6", "5782115e1", "b43678dfe"]

Expected end result:

 {
    "test": [{
        "Location": "Singapore",
        "Values": [{
                "Name": "00",
                "subvalues": [
                    "688ddd16e",
                    "3e91dc966",
                    "5add96256",
                    "10352cf0f",
                    "17f233d31",
                    "2ae1ad9e0",
                    "23fd76115"
                ]
            },
            {
                "Name": "01",
                "subvalues": [
                    "202c7f508",
                    "73afcaf7c"
                ]
            }
        ]
    }]
 }

I know that a plain text replace would break the structure. I'm new to JSON, so any help would be appreciated.

CodePudding user response:

You can use the following code snippet:

import json

toRemoveList = ["130c09135", "2f49eb2a6", "5782115e1", "b43678dfe"]

# Parse the file directly into Python objects.
with open('data.json', 'r') as file:
    jsonData = json.load(file)

# Rebuild each "subvalues" list without the unwanted strings.
# Only the first entry of "test" is processed, as in the sample.
for value in jsonData["test"][0]["Values"]:
    value["subvalues"] = [x for x in value["subvalues"] if x not in toRemoveList]

with open('newData.json', 'w') as file:
    json.dump(jsonData, file, indent=4)

Note: Dictionary keys are case-sensitive, so the key must be spelled the same way in every instance. The code above expects 'subvalues', as in the sample. A file that mixes 'Subvalues' and 'subvalues' will raise a KeyError.
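As a hedged sketch of the same idea, the filtering step can be wrapped in a function that walks every entry under "test" rather than only the first, and skips entries that lack the key. The function name remove_subvalues and its key parameter are hypothetical, introduced only for illustration, and the sample is a shortened version of the data from the question, held in memory so the example runs without any files.

```python
import json


def remove_subvalues(data, to_remove, key="subvalues"):
    """Drop every string in to_remove from each entry's `key` list, in place."""
    to_remove = set(to_remove)  # set membership tests are fast
    for entry in data.get("test", []):  # every entry, not only index 0
        for value in entry.get("Values", []):
            if key in value:  # skip entries that lack the key
                value[key] = [x for x in value[key] if x not in to_remove]
    return data


# Shortened sample, same structure as the question's file.
sample = json.loads("""
{"test": [{"Location": "Singapore",
           "Values": [
               {"Name": "00", "subvalues": ["5782115e1", "688ddd16e", "130c09135"]},
               {"Name": "01", "subvalues": ["b43678dfe", "202c7f508", "73afcaf7c"]}
           ]}]}
""")

result = remove_subvalues(sample, ["130c09135", "2f49eb2a6", "5782115e1", "b43678dfe"])
print(json.dumps(result, indent=4))
```

Passing the key name as a parameter keeps the case-sensitivity issue in one place: if a file uses 'Subvalues' instead, only the argument changes.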

CodePudding user response:

Here is a generalised approach that does not rely on key names or nesting depth. The only assumption is that any list composed entirely of strings, wherever it appears in the structure, should be rebuilt without the values in the EXCLUSIONS set.

import json

FILENAME = '/Volumes/G-Drive/foo.json'
EXCLUSIONS = {"130c09135", "2f49eb2a6", "5782115e1", "b43678dfe"}

def process(d):
    """Recursively filter every all-string list inside d, in place."""
    if isinstance(d, dict):
        for v in d.values():
            process(v)
    elif isinstance(d, list):
        if all(isinstance(v, str) for v in d):
            # Slice assignment mutates the existing list, so the parent still refers to it.
            d[:] = [v for v in d if v not in EXCLUSIONS]
        else:
            for v in d:
                process(v)
    return d


with open(FILENAME) as data:
    print(json.dumps(process(json.load(data)), indent=2))

Output:

{
  "test": [
    {
      "Location": "Singapore",
      "Values": [
        {
          "Name": "00",
          "subvalues": [
            "688ddd16e",
            "3e91dc966",
            "5add96256",
            "10352cf0f",
            "17f233d31",
            "2ae1ad9e0",
            "23fd76115"
          ]
        },
        {
          "Name": "01",
          "subvalues": [
            "202c7f508",
            "73afcaf7c"
          ]
        }
      ]
    }
  ]
}
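If the original data must stay untouched, the same traversal can return a filtered copy instead of mutating in place. The following is only a sketch under that assumption: the name without_exclusions is hypothetical, and the exclusion set is passed as an argument rather than read from a module-level constant.

```python
def without_exclusions(node, exclusions):
    """Return a filtered copy of node; the input is left unmodified."""
    if isinstance(node, dict):
        return {k: without_exclusions(v, exclusions) for k, v in node.items()}
    if isinstance(node, list):
        if all(isinstance(v, str) for v in node):
            # A list made only of strings: keep the ones not excluded.
            return [v for v in node if v not in exclusions]
        return [without_exclusions(v, exclusions) for v in node]
    return node  # strings, numbers, booleans and None pass through


original = {
    "test": [{
        "Location": "Singapore",
        "Values": [{"Name": "01",
                    "subvalues": ["b43678dfe", "202c7f508", "73afcaf7c"]}],
    }]
}

cleaned = without_exclusions(original, {"b43678dfe"})
print(cleaned["test"][0]["Values"][0]["subvalues"])
print(original["test"][0]["Values"][0]["subvalues"])  # still has all three items
```

The trade-off is memory: the copy duplicates the container structure, which is usually acceptable for files small enough to load with the json module in the first place.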