I'm having a bit of trouble filtering my JSON file. Basically, I have a file where each line is a separate JSON object (I know this isn't the normal valid format, but it's what I have to work with), and I want to go through each line and check whether it contains either of two keys (e.g. "name" or "firstname"). If either key exists in the JSON object, I want to keep it; if not, I want to remove it. At the end, I should have an output JSON file that doesn't include the objects missing those keys.
I've tried a bunch of different things but can't seem to get it to work. This is what I have so far:
jsonList = []
with open(filename) as f:
    for json_line in f:
        obj = json.loads(json_line)
        checker(obj)

def checker(obj):
    check = 0
    if ("name" in obj):
        check = 1
    if ("firstname" in obj):
        check = 1
    if (check == 1):
        jsonList.append(obj)
When I try printing jsonList afterwards, it just gives me an empty list [], so my check variable never changed to 1 even though there are JSON objects in my file that have those keys.
My json file looks something like this: (note: number of things inside each object isn't guaranteed so I can't just check for that)
{"name": "name1", "date": "2018-11-13", "age": 32}
{"firstname": "name2", "date": "2019-05-09", "age": 40}
{"date": "2019-11-04", "age": 35}
Does anyone have any ideas on what I could do, or know why what I tried here didn't work?
CodePudding user response:
You are calling check(obj), but no function with that name is defined. Call checker(obj) instead.
CodePudding user response:
Your original code seems to work for me. I use the checker function as-is, without modification:
import json
from io import StringIO
from pprint import pprint

jsonList = []

# In-memory stand-in for the input file, one JSON object per line
filedata = StringIO("""\
{"name": "name1", "date": "2018-11-13", "age": 32}
{"firstname": "name2", "date": "2019-05-09", "age": 40}
{"date": "2019-11-04", "age": 35}\
""")

def checker(obj):
    check = 0
    if ("name" in obj):
        check = 1
    if ("firstname" in obj):
        check = 1
    if (check == 1):
        jsonList.append(obj)

for json_line in filedata:
    obj = json.loads(json_line)
    checker(obj)

pprint(jsonList)
Output:
[{'age': 32, 'date': '2018-11-13', 'name': 'name1'},
 {'age': 40, 'date': '2019-05-09', 'firstname': 'name2'}]
Steps to Optimize
There are a couple of different ways to optimize your code, but the easiest one I'd suggest is set.intersection, which compares a set of required keys against the keys of a dict object. If there are any matches, the result is a non-empty set, which is truthy, so we add the dict object as it's valid.
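jsonList = []
need_one_of_keys = {'name', 'firstname'}

filedata.seek(0)  # rewind, in case an earlier loop already consumed filedata
for json_line in filedata:
    obj = json.loads(json_line)
    # iterating a dict yields its keys, so this intersects against the keys
    if need_one_of_keys.intersection(obj):
        jsonList.append(obj)

pprint(jsonList)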
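To see why the intersection test works, here is a small sketch that runs it on the three sample objects from the question, without any file handling. The names records, matches and keep_flags are only for illustration:

```python
need_one_of_keys = {"name", "firstname"}

# The three sample objects from the question, already parsed into dicts
records = [
    {"name": "name1", "date": "2018-11-13", "age": 32},
    {"firstname": "name2", "date": "2019-05-09", "age": 40},
    {"date": "2019-11-04", "age": 35},
]

# Iterating a dict yields its keys, so intersection() only looks at keys
matches = [need_one_of_keys.intersection(record) for record in records]

# An empty set is falsy, so bool() mirrors what the `if` statement sees
keep_flags = [bool(match) for match in matches]

print(matches)
print(keep_flags)
```

The third object has neither key, so its intersection is an empty set and it gets skipped.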
One other approach worth mentioning is to use the in operator on the dict, combined with the builtin any function:
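jsonList = []
need_one_of_keys = frozenset(['name', 'firstname'])

filedata.seek(0)  # rewind, in case an earlier loop already consumed filedata
for json_line in filedata:
    obj = json.loads(json_line)
    # any() stops at the first required key found in the dict
    if any(key in obj for key in need_one_of_keys):
        jsonList.append(obj)

pprint(jsonList)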
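Since the question also asks for an output file, here is a minimal sketch of the whole round trip: read one JSON object per line, keep the ones with a required key, and write them back out one per line. The function name filter_json_lines and its parameters are my own, not something from the question, and it assumes every non-blank line is a valid JSON object:

```python
import json

# Keep an object if it has at least one of these keys
REQUIRED_KEYS = frozenset({"name", "firstname"})


def filter_json_lines(src_path, dst_path, keys=REQUIRED_KEYS):
    """Copy objects containing any of `keys` from src_path to dst_path.

    Both files hold one JSON object per line. Returns the number kept.
    `keys` is assumed to be a set or frozenset.
    """
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            line = line.strip()
            if not line:
                # Skip blank lines, e.g. an empty line at the end of the file
                continue
            obj = json.loads(line)
            # Iterating a dict yields its keys; a non-empty result is truthy
            if keys.intersection(obj):
                dst.write(json.dumps(obj) + "\n")
                kept += 1
    return kept
```

You would call it as something like filter_json_lines("input.json", "output.json"), where both paths are placeholders. Writing to a separate output file rather than overwriting the input keeps the original data intact if a line fails to parse partway through.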