Filtering a JSON file by removing objects that are missing certain keys

Time:10-30

I'm having a bit of trouble filtering my JSON file. Basically, I have a file where each line is a separate JSON object (I know this isn't the normal valid format, but it's what I have to work with), and I want to go through each line and check whether it contains either of two keys (e.g. "name" or "firstname"). If either key exists in the object, I want to keep it; if not, I want to remove it. At the end, I should have an output JSON file that doesn't include the objects missing those keys.

I've tried a bunch of different things but can't seem to get it to work. This is what I have so far:

jsonList = []

with open(filename) as f:
    for json_line in f:
        obj = json.loads(json_line)
        checker(obj)

def checker(obj):
    check = 0
    if ("name" in obj):
        check = 1
    if ("firstname" in obj):
        check = 1
    if (check == 1):
        jsonList.append(obj)

When I print jsonList afterwards, it just gives me an empty list [], so it seems my check variable never changes to 1, even though there are JSON objects in my file that have those keys.

My JSON file looks something like this (note: the number of fields in each object isn't guaranteed, so I can't just check for that):

{"name": "name1", "date": "2018-11-13", "age": 32}
{"firstname": "name2", "date": "2019-05-09", "age": 40}
{"date": "2019-11-04", "age": 35}

Does anyone have any ideas on what I could do, or know why what I tried here didn't work?

CodePudding user response:

You are calling check(obj), which is not a defined function. Call checker(obj) instead.

CodePudding user response:

Your original code seems to work for me. I use the checker function as-is without modification:

import json
from io import StringIO
from pprint import pprint

jsonList = []

filedata = StringIO("""\
{"name": "name1", "date": "2018-11-13", "age": 32}
{"firstname": "name2", "date": "2019-05-09", "age": 40}
{"date": "2019-11-04", "age": 35}\
""")


def checker(obj):
    check = 0
    if ("name" in obj):
        check = 1
    if ("firstname" in obj):
        check = 1
    if (check == 1):
        jsonList.append(obj)

for json_line in filedata:
    obj = json.loads(json_line)
    checker(obj)

pprint(jsonList)

Output:

[{'age': 32, 'date': '2018-11-13', 'name': 'name1'},
 {'age': 40, 'date': '2019-05-09', 'firstname': 'name2'}]

Steps to Optimize

There are a couple of different ways to optimize your code, but the easiest one I'd suggest is set.intersection, which compares a set of required keys against the keys of a dict object. If there are any matches, the dict object is valid and we add it to the list.

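jsonList = []
need_one_of_keys = {'name', 'firstname'}

filedata.seek(0)  # rewind: the loop above already consumed filedata

for json_line in filedata:
    obj = json.loads(json_line)
    # iterating a dict yields its keys, so this intersects with obj's keys
    if need_one_of_keys.intersection(obj):
        jsonList.append(obj)

pprint(jsonList)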
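
If it isn't obvious why a dict can be passed straight to set.intersection, here is a minimal sketch using the three sample records from the question. The names records, matches and kept are only illustrative. It relies on two things: set.intersection accepts any iterable, and iterating a dict yields its keys.

```python
need_one_of_keys = {'name', 'firstname'}

# The sample objects from the question, already parsed into dicts
records = [
    {"name": "name1", "date": "2018-11-13", "age": 32},
    {"firstname": "name2", "date": "2019-05-09", "age": 40},
    {"date": "2019-11-04", "age": 35},
]

# Intersecting with a dict compares against its keys, not its values
matches = [need_one_of_keys.intersection(rec) for rec in records]

# A non-empty set is truthy and an empty set is falsy,
# so the intersection can be used directly as the filter condition
kept = [rec for rec in records if need_one_of_keys.intersection(rec)]
```

The third record produces an empty set, which is why it gets dropped.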

One other approach worth mentioning is to use the in operator on the dict, combined with the builtin any function:

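jsonList = []
need_one_of_keys = frozenset(['name', 'firstname'])

filedata.seek(0)  # rewind: an earlier loop already consumed filedata

for json_line in filedata:
    obj = json.loads(json_line)
    # any() stops at the first required key found in obj
    if any(key in obj for key in need_one_of_keys):
        jsonList.append(obj)

pprint(jsonList)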
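
Since the question asks for an output file at the end, here is a rough sketch of the whole round trip: read the input line by line, keep the objects that have at least one of the keys, and write them back out one object per line. The function name filter_json_lines and its parameters are my own, not something from the question, and I'm assuming the file is UTF-8 and that blank lines should just be skipped.

```python
import json


def filter_json_lines(src_path, dst_path, need_one_of_keys=("name", "firstname")):
    """Copy objects that contain at least one of the given keys.

    Hypothetical helper: reads one JSON object per line from src_path
    and writes the kept objects to dst_path in the same format.
    Returns the number of objects written.
    """
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                # Assumption: blank lines are skipped rather than treated as errors
                continue
            obj = json.loads(line)
            if any(key in obj for key in need_one_of_keys):
                dst.write(json.dumps(obj) + "\n")
                kept += 1
    return kept
```

You would call it with something like filter_json_lines("input.json", "output.json"), where both file names are placeholders. Writing as you go means the full list never has to sit in memory, which helps if the input file is large.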