I have a JSON document like this:
{
    "department":"Data & Analytics",
    "child":[
        {
            "department":"Data Enginnering",
            "child": [
                {"department":"AWS Squad"},
                {"department":"GCP Squad"}
                ..
                ..so..
                ..on..
                ..so..
                ..forth..
                ..
            ]
        },
        {
            "department":"Data Science"
        }
    ]
}
I need to load it into BigQuery, so before loading I want to transform it into something like this:
[
    {
        "department":"Data & Analytics",
        "child":["Data Enginnering", "Data Science"]
    },
    {
        "department":"Data Enginnering",
        "child":["AWS Squad", "GCP Squad"]
    },
    {
        "department":"Data Science"
    },
    {
        "department": "AWS Squad"
    },
    {
        "department": "GCP Squad"
    }
]
But I got stuck trying.
CodePudding user response:
For a non-recursive approach, you can use a standard breadth-first traversal: keep a queue of nodes to visit and push each node's children onto it.
from collections import deque

def flatten(data):
    # Queue of nodes still to visit, seeded with the root
    q = deque([data])
    while q:
        current = q.popleft()
        d = {"department": current['department']}
        for child in current.get('child', []):
            # Record the child's name and queue the child so it gets its own entry
            d.setdefault('child', []).append(child['department'])
            q.append(child)
        yield d

list(flatten(data))
Which will give you:
[{'department': 'Data & Analytics',
'child': ['Data Enginnering', 'Data Science']},
{'department': 'Data Enginnering', 'child': ['AWS Squad', 'GCP Squad']},
{'department': 'Data Science'},
{'department': 'AWS Squad'},
{'department': 'GCP Squad'}]
The order differs subtly from the recursive approach, which is depth-first. The breadth-first order above is the one shown in your expected output.
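To see the difference in order concretely, here is a small sketch that walks the same tree both ways and only records the department names. The names `tree` and `visit_order` are made up for this illustration, and the tree is a cut-down copy of the one in the question.

```python
from collections import deque

# Small sample tree mirroring the shape in the question (illustrative data).
tree = {
    "department": "Data & Analytics",
    "child": [
        {
            "department": "Data Enginnering",
            "child": [{"department": "AWS Squad"}, {"department": "GCP Squad"}],
        },
        {"department": "Data Science"},
    ],
}

def visit_order(root, breadth_first=True):
    """Return department names in the order the traversal visits them."""
    pending = deque([root])
    order = []
    while pending:
        # popleft() treats the deque as a queue (breadth-first);
        # pop() treats it as a stack (depth-first).
        node = pending.popleft() if breadth_first else pending.pop()
        order.append(node["department"])
        children = node.get("child", [])
        # For the stack, push children reversed so the first child is visited first.
        pending.extend(children if breadth_first else reversed(children))
    return order

bfs_order = visit_order(tree, breadth_first=True)
dfs_order = visit_order(tree, breadth_first=False)
print(bfs_order)  # Data Science comes before the two squads
print(dfs_order)  # Data Science comes after the two squads
```

Both orders contain the same departments; only the position of "Data Science" relative to the squads changes. If row order does not matter for your table, either traversal should do.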
CodePudding user response:
Since the data is recursive, this can be solved using recursion.
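def convert(data, output):
    department = data["department"]
    children = data.get("child")
    new_object = {"department": department}
    # Append the parent before recursing so it appears ahead of its children
    output.append(new_object)
    if children:
        # Each recursive call appends the child's own entry and returns its name
        new_object["child"] = [convert(child, output) for child in children]
    return department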
It would be used like this:
test_data = {
    "department":"Data & Analytics",
    "child":[
        {
            "department":"Data Enginnering",
            "child": [
                {"department":"Other"},
                {"department":"Sales"}
            ]
        },
        {
            "department":"Data Science"
        }
    ]
}
output = []
convert(test_data, output)
# convert output to json and send to BigQuery...
For the above example, the result is:
[
    {
        "department": "Data & Analytics",
        "child": [
            "Data Enginnering",
            "Data Science"
        ]
    },
    {
        "department": "Data Enginnering",
        "child": [
            "Other",
            "Sales"
        ]
    },
    {
        "department": "Other"
    },
    {
        "department": "Sales"
    },
    {
        "department": "Data Science"
    }
]
It is not quite the same as your example output, but it's unclear from that example why some departments get an object added to the main list, and others don't.
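For the "convert output to json" step, one possible sketch is below. It assumes you load the data as a file, in which case BigQuery expects newline-delimited JSON (one object per line) rather than a single JSON array. The helper name `to_ndjson` and the sample `records` are made up for illustration; in practice you would pass in the flattened list produced above.

```python
import json

def to_ndjson(records):
    """Serialize a list of dicts as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record) for record in records)

# Illustrative sample in the same shape as the flattened output.
records = [
    {"department": "Data & Analytics", "child": ["Data Enginnering", "Data Science"]},
    {"department": "Data Science"},
]

ndjson = to_ndjson(records)
print(ndjson)
```

Departments without children simply omit the "child" key here, which is how both answers produce them; whether that suits your table depends on how the schema defines that column.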