I have a huge nested JSON file and I want to get the values of "text", but only at a certain level, because there are many more "text" keys deeper in the file. The level I mean is the "text":"Hi" that comes right after "event":"user".
The file looks like this:
`
{
  "_id":{
    "$oid":"123"
  },
  "events":[
    {
      "event":"action",
      "metadata":{
        "model_id":"12"
      },
      "action_text":null,
      "hide_rule_turn":false
    },
    {
      "event":"user",
      "text":"Hi",
      "parse_data":{
        "intent":{
          "name":"greet",
          "confidence":{
            "$numberDouble":"0.9601748585700989"
          }
        },
        "entities":[
        ],
        "text":"Hi",
        "metadata":{
        },
        "text_tokens":[
          [
            {
              "$numberInt":"0"
            },
            {
              "$numberInt":"2"
            }
          ]
        ],
        "selector":{
          "ideas":{
            "response":{
              "responses":[
                {
                  "text":"yeah"
                },
                {
                  "text":"No"
                },
                {
                  "text":"Goo"
                }
              ]
            },
`
First I used this function to get the text data, but of course it gave me all of them:
def json_extract(obj, key):
    """Recursively fetch values from nested JSON."""
    arr = []
    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr
    values = extract(obj, arr, key)
    return values
I also tried to access only the second level with this code, but it gave me a KeyNotFound error:
for i in data["events"][0]:
    print(i["text"])
Maybe because that key is not in every nested list? I really don't know what else I could do.
CodePudding user response:
Since `events` is a list, you can write a list comprehension (if there are multiple items you need), or you can use the `next` function to get the element that you need from the iterator:
event = next(e for e in data.get('events', list()) if e.get('event')=='user')
print(event.get('text', ''))
Using the `get` method is the safe option here: it won't throw an exception if the key doesn't exist in the dictionary.
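One caveat: `get` guards against missing keys, but `next` still raises `StopIteration` when no event matches at all. A minimal sketch of one way around that, assuming a small hand-made `data` dictionary (hypothetical, shaped like the file in the question but without any "user" event), is to pass a default as the second argument to `next`:

```python
# Hypothetical sample shaped like the question's file, with no "user" event.
data = {"events": [{"event": "action", "action_text": None}]}

# The second argument to next() is returned when the generator is exhausted,
# so a missing "user" event gives an empty dict instead of StopIteration.
event = next(
    (e for e in data.get("events", []) if e.get("event") == "user"),
    {},
)

# dict.get returns the fallback value when the key is absent.
text = event.get("text", "")
print(repr(text))
```

With real data that does contain a "user" event, `event` would be that event's dictionary and `text` its top-level "text" value.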
Edit: If you need this for all events:
all_events = [e for e in data.get('events', list()) if e.get('event')=='user']
for event in all_events:
    print(event.get('text', ''))
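To try this end to end, here is a small sketch. The JSON string is a shortened, hypothetical version of the file in the question, and `user_texts` is a name introduced only for this example. Because the comprehension iterates over the top-level `events` list only, the deeper "text" keys under `parse_data` are never visited.

```python
import json

# Shortened, hypothetical stand-in for the file in the question.
raw = """
{
  "events": [
    {"event": "action", "action_text": null},
    {"event": "user", "text": "Hi",
     "parse_data": {"text": "Hi",
                    "selector": {"responses": [{"text": "yeah"}, {"text": "No"}]}}},
    {"event": "user", "text": "Bye"}
  ]
}
"""
data = json.loads(raw)

# Collect the top-level "text" of every event whose type is "user".
user_texts = [
    e.get("text", "")
    for e in data.get("events", [])
    if e.get("event") == "user"
]
print(user_texts)  # ['Hi', 'Bye']
```

For a file on disk, `json.load` on an open file object would replace the `json.loads(raw)` call.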
CodePudding user response:
Convert your JSON to a Python dictionary (e.g., json.load or json.loads depending on how you're accessing the JSON). Then just pass a reference to the dictionary to this:
def json_extract(jdata):
    assert isinstance(jdata, dict)
    arr = []
    def _extract(d, arr):
        if 'event' in d and (t := d.get('text')):
            arr.append(t)
        for k, v in d.items():
            if k not in {'event', 'text'}:
                if isinstance(v, list):
                    for e in v:
                        if isinstance(e, dict):
                            _extract(e, arr)
                elif isinstance(v, dict):
                    _extract(v, arr)
        return arr
    return _extract(jdata, arr)
This will return a list of all values associated with the key 'text', provided that key is found in a dictionary that also has an 'event' key.
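As a quick check of that behaviour, the sketch below repeats the function and runs it on a shortened, hypothetical sample shaped like the file in the question. Note that it relies on the `:=` operator, so it assumes Python 3.8 or later.

```python
def json_extract(jdata):
    assert isinstance(jdata, dict)
    arr = []

    def _extract(d, arr):
        # Keep "text" only when the same dictionary also has an "event" key.
        if 'event' in d and (t := d.get('text')):
            arr.append(t)
        # Descend into nested dicts and lists of dicts.
        for k, v in d.items():
            if k not in {'event', 'text'}:
                if isinstance(v, list):
                    for e in v:
                        if isinstance(e, dict):
                            _extract(e, arr)
                elif isinstance(v, dict):
                    _extract(v, arr)
        return arr

    return _extract(jdata, arr)


# Hypothetical sample: one "text" next to "event", two deeper ones without it.
sample = {
    "_id": {"$oid": "123"},
    "events": [
        {"event": "action", "action_text": None},
        {
            "event": "user",
            "text": "Hi",
            "parse_data": {
                "text": "Hi",
                "selector": {"responses": [{"text": "yeah"}]},
            },
        },
    ],
}

result = json_extract(sample)
print(result)  # ['Hi']
```

Two things follow from the code as written: it does not check that the event is "user", so any dictionary with an 'event' key and a 'text' value is included, and an empty-string 'text' is skipped because the `:=` test is a truthiness check.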