Unnest hierarchy json with python?

Time:06-30

I have a JSON like this:

{
    "department":"Data & Analytics",
    "child":[
        {
            "department":"Data Enginnering",
            "child": [
                {"department":"AWS Squad"},
                {"department":"GCP Squad"}
                ..
                    ..so..
                        ..on..
                            ..so..
                                ..forth..
                                    ..
            ]
        },
        {
            "department":"Data Science"
        }
    ]
}

I need to load it into BigQuery, so I want to transform it into something like this first:

[
    {
        "department":"Data & Analytics",
        "child":["Data Enginnering", "Data Science"]
    },
    {
        "department":"Data Enginnering",
        "child":["AWS Squad", "GCP Squad"]
    },
    {
        "department":"Data Science"
    },
    {
        "department": "AWS Squad"
    },
    {
        "department": "GCP Squad"
    }
]

But I got stuck trying.

CodePudding user response:

For a non-recursive approach, you can use a standard breadth-first traversal: keep a queue of nodes and push each node's children onto it.

from collections import deque

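def flatten(data):
    # Start the queue with the root department.
    q = deque([data])

    while q:
        current = q.popleft()
        d = {"department": current['department']}

        for child in current.get('child', []):
            # Record the child's name on the parent and queue the child itself.
            d.setdefault('child', []).append(child['department'])
            q.append(child)

        yield d


# data is the nested structure from the question
list(flatten(data))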

Which will give you:

[{'department': 'Data & Analytics',
  'child': ['Data Enginnering', 'Data Science']},
 {'department': 'Data Enginnering', 'child': ['AWS Squad', 'GCP Squad']},
 {'department': 'Data Science'},
 {'department': 'AWS Squad'},
 {'department': 'GCP Squad'}]

The order differs subtly from the recursive approach, which is depth-first.
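To make the ordering difference concrete, here is a minimal sketch of a depth-first variant that is still non-recursive. It swaps the queue for an explicit stack; the name `flatten_depth_first` and the `sample` data are only illustrative, with `sample` copied from the question minus the elided entries.

```python
def flatten_depth_first(data):
    # An explicit stack instead of a queue: nodes come out in pre-order,
    # so each department is followed by its whole subtree.
    stack = [data]

    while stack:
        current = stack.pop()
        children = current.get("child", [])
        d = {"department": current["department"]}
        if children:
            d["child"] = [child["department"] for child in children]
        # Push in reverse so the first child is popped first.
        stack.extend(reversed(children))
        yield d


# Illustrative data, mirroring the question's structure.
sample = {
    "department": "Data & Analytics",
    "child": [
        {
            "department": "Data Enginnering",
            "child": [{"department": "AWS Squad"}, {"department": "GCP Squad"}],
        },
        {"department": "Data Science"},
    ],
}

order = [row["department"] for row in flatten_depth_first(sample)]
# "Data Science" now comes last, after the two squads.
```

The records are the same as in the breadth-first version. Only their position in the list changes.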

CodePudding user response:

Since the data is recursive, this can be solved using recursion.

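def convert(data, output):
    """Append one flat record per department to output; return this node's name."""
    department = data["department"]
    children = data.get("child")

    new_object = {"department": department}
    # Append before recursing so a parent precedes its descendants (depth-first).
    output.append(new_object)

    if children:
        # Each recursive call returns the child's name, which builds the "child" list.
        new_object["child"] = [convert(child, output) for child in children]

    return department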

It would be used like this:

test_data = {
    "department":"Data & Analytics",
    "child":[
        {
            "department":"Data Enginnering",
            "child": [
                {"department":"Other"},
                {"department":"Sales"}
            ]
        },
        {
            "department":"Data Science"
        }
    ]
}

output = []
convert(test_data, output)
# convert output to json and send to BigQuery...

For the above example, the result is:

[
    {
        "department": "Data & Analytics",
        "child": [
            "Data Enginnering",
            "Data Science"
        ]
    },
    {
        "department": "Data Enginnering",
        "child": [
            "Other",
            "Sales"
        ]
    },
    {
        "department": "Other"
    },
    {
        "department": "Sales"
    },
    {
        "department": "Data Science"
    }
]

It is not quite the same as your example output, but it's unclear from that example why some departments get an object added to the main list, and others don't.
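For the "convert output to json" step, BigQuery load jobs expect newline-delimited JSON (one object per line) rather than a single JSON array. A minimal sketch of that serialization follows; the helper name `to_ndjson` and the sample `rows` are illustrative, and the actual upload step is left out.

```python
import json


def to_ndjson(rows):
    # One compact JSON object per line; json.dumps emits no newlines by default.
    return "\n".join(json.dumps(row) for row in rows)


# Illustrative rows in the flattened shape produced above.
rows = [
    {"department": "Data & Analytics", "child": ["Data Enginnering", "Data Science"]},
    {"department": "Data Science"},
]

ndjson = to_ndjson(rows)
print(ndjson)
```

The resulting string can be written to a file and used as the source of a load job. How rows without a "child" key are handled depends on the table schema, so treat that part as something to verify.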
