I have some highly nested JSON files I need to work with.
A short example:
{
  "coffee":[
    {
      "value":"coffee"
    },
    {
      "value":"water"
    }
  ],
  "cake":{
    "value":{
      "dough":[
        {
          "value":"2",
          "name":"eggs"
        },
        {
          "value":"500g",
          "name":"flour"
        },
        {
          "value":{
            "almondpaste":[
              {
                "value":"300g",
                "name":"almonds"
              },
              {
                "value":"200g",
                "name":"oil"
              },
              {
                "value":"200g",
                "name":"sugar"
              }
            ]
          }
        },
        {
          "value":"200g",
          "name":"sugar"
        },
.
.
.
.
.
.
I would like to read all names from the JSON file and collect them in a list. This is not particularly difficult if the JSON file has a fixed structure. However, my JSON files vary in structure and depth: sometimes everything is on one level, but some files nest four or five levels deep. I am therefore looking for a generic solution that walks all levels of the JSON and searches for certain keys.
I have already tried something along the following lines, but I always get error messages.
list = []
for k for val in json_file for d in val for j in d.keys():
    if k== "name":
        list.append(k['name'])
    if d=="name":
        list.append(k['name'])
    if j=="name":
        list.append(k['name'])
print(list)
Error:
for k for val in json_file for d in val for j in d.keys():
^
SyntaxError: invalid syntax
Maybe someone has a code sample that could solve my problem and from which I could develop an idea for myself?
Answer:
You can define this function:
def iterate(data):
    if isinstance(data, list):
        for item in data:
            yield from iterate(item)
    elif isinstance(data, dict):
        for key, item in data.items():
            if key == 'name':
                yield item
            else:
                yield from iterate(item)
And then you can use it like this (data is your parsed JSON data):
result = list(iterate(data))
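If the JSON is still text (a string or a file), it has to be parsed into Python lists and dicts before the function can walk it. Here is a minimal sketch, assuming the standard-library json module. The string raw is made-up sample text, not your actual file; for a real file you would call json.load on an open file handle instead.

```python
import json

def iterate(data):
    """Yield every value stored under a 'name' key, at any depth."""
    if isinstance(data, list):
        for item in data:
            yield from iterate(item)
    elif isinstance(data, dict):
        for key, item in data.items():
            if key == 'name':
                yield item
            else:
                yield from iterate(item)

# Hypothetical sample text, for illustration only.
raw = (
    '{"dough": [{"value": "2", "name": "eggs"},'
    ' {"value": {"paste": [{"value": "300g", "name": "almonds"}]}}]}'
)

data = json.loads(raw)          # str -> nested dicts and lists
result = list(iterate(data))    # collect the generator's output
print(result)
```

Strings and numbers are neither lists nor dicts, so the function yields nothing for them and the recursion stops there on its own.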
Let's do an example. This is your input data:
>>> data
{'coffee': [{'value': 'coffee'}, {'value': 'water'}], 'cake': {'value': {'dough': [{'value': '2', 'name': 'eggs'}, {'value': '500g', 'name': 'flour'}, {'value': {'almondpaste': [{'value': '300g', 'name': 'almonds'}, {'value': '200g', 'name': 'oil'}, {'value': '200g', 'name': 'sugar'}]}}, {'value': '200g', 'name': 'sugar'}]}}}
Here is the output:
>>> list(iterate(data))
['eggs', 'flour', 'almonds', 'oil', 'sugar', 'sugar']
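Since the question asks about searching for "certain keys" rather than only name, the same idea can be widened to a set of keys. The following is a sketch of one possible variant, not part of the function above: find_values and its keys parameter are names made up here, and the sample dict is a shortened version of the input data. Unlike iterate, it keeps descending below a matching key, which may or may not be what you want.

```python
def find_values(data, keys):
    """Yield (key, value) pairs for every dict key contained in `keys`."""
    if isinstance(data, list):
        for item in data:
            yield from find_values(item, keys)
    elif isinstance(data, dict):
        for key, item in data.items():
            if key in keys:
                yield key, item
            # Assumption: matches may also be nested below a matching key,
            # so keep descending either way.
            yield from find_values(item, keys)

# Shortened sample, for illustration only.
sample = {
    'cake': {'value': {'dough': [
        {'value': '2', 'name': 'eggs'},
        {'value': {'almondpaste': [{'value': '300g', 'name': 'almonds'}]}},
    ]}}
}

names = [value for key, value in find_values(sample, {'name'})]
print(names)
```

Passing a set such as {'name', 'value'} returns pairs for both keys, so the caller can tell which key each value came from.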