Home > Back-end >  Creating a hierarchical tree dict
Creating a hierarchical tree dict

Time:12-24

I have been working with D3 and ObservableHQ to create a Tree Plot of some hierarchical data (SARS-CoV-2 Variant lineages). I'm looking at ways to expand what I can do with the dataset and so I am moving it to Python to experiment.

One thing I did with D3 was the use d3.stratify on my tsv dataset. I am trying to recreate that data structure using python dict


class Lineage:
    tree = dict()

    def __init__(self, designation_date: str, pango: str, partial: str, unaliased: str):
        self.designation_date = designation_date
        self.pango = pango
        self.partial = partial
        self.unaliased = unaliased
        self.parent = Lineage.parent(partial)

    @classmethod
    def parent(cls, lineage: str) -> Union[str, None]:
        if "." in lineage:
            return ".".join(lineage.split('.')[:-1])
        return None

Here is the tree specific code from the Lineage class above

    @classmethod
    def add_to_tree(cls, lineage: Lineage):
        if lineage.parent is None:
            # No children to add
            cls.tree.update(
                {
                    'id': lineage.partial,
                    'data': lineage,
                    'parent': None,
                }
            )

        elif 'children' in cls.tree:
            count = len(cls.tree['children'].items())
            cls.tree['children'][count] = {
                'id': lineage.partial,
                'data': lineage,
                'parent': lineage.parent,
            }
        else:
            cls.tree['children'] = {
                0: {
                    'id': lineage.partial,
                    'data': lineage,
                    'parent': lineage.parent,
                }
            }


In the following dict object, you will see what the code produces. And below that is an image of the structure of the dataset.

As you can see, child index 1 should actually be a child of lineage at index 0. And Children indexes 2 and 5 should actually be children of lineage at index 1, and so on.

The problem is that everything is a child a the root. There is no further hierarchy. What are some ways to mitigate this issue?

{
    "id": "BA",
    "data": {
        "designation_date": "2021-11-24",
        "pango": "B.1.1.529",
        "partial": "BA",
        "unaliased": "B.1.1.529",
        "parent": null
    },
    "parent": null,
    "children": {
        "0": {
            "id": "BA.1",
            "data": {
                "designation_date": "2021-12-07",
                "pango": "BA.1",
                "partial": "BA.1",
                "unaliased": "B.1.1.529.1",
                "parent": "BA"
            },
            "parent": "BA"
        },
        "1": {
            "id": "BA.1.1",
            "data": {
                "designation_date": "2022-01-07",
                "pango": "BA.1.1",
                "partial": "BA.1.1",
                "unaliased": "B.1.1.529.1.1",
                "parent": "BA.1"
            },
            "parent": "BA.1"
        },
        "2": {
            "id": "BA.1.1.1",
            "data": {
                "designation_date": "2022-02-23",
                "pango": "BA.1.1.1",
                "partial": "BA.1.1.1",
                "unaliased": "B.1.1.529.1.1.1",
                "parent": "BA.1.1"
            },
            "parent": "BA.1.1"
        },
        "3": {
            "id": "BA.1.1.1.1",
            "data": {
                "designation_date": "2022-02-23",
                "pango": "BC.1",
                "partial": "BA.1.1.1.1",
                "unaliased": "B.1.1.529.1.1.1.1",
                "parent": "BA.1.1.1"
            },
            "parent": "BA.1.1.1"
        },
        "4": {
            "id": "BA.1.1.1.2",
            "data": {
                "designation_date": "2022-06-13",
                "pango": "BC.2",
                "partial": "BA.1.1.1.2",
                "unaliased": "B.1.1.529.1.1.1.2",
                "parent": "BA.1.1.1"
            },
            "parent": "BA.1.1.1"
        },
        "5": {
            "id": "BA.1.1.2",
            "data": {
                "designation_date": "2022-02-23",
                "pango": "BA.1.1.2",
                "partial": "BA.1.1.2",
                "unaliased": "B.1.1.529.1.1.2",
                "parent": "BA.1.1"
            },
            "parent": "BA.1.1"
        },

Again, this is how the data SHOULD be if it was in the correct heirachy. However, it isn't enter image description here

Data

I discard the Github issues for now

"PANGO Lineage" "Partial-aliased PANGO Lineage" "Unaliased PANGO Lineage"   "Designation Date"  "GitHub Issue"
B.1.1.529   BA  B.1.1.529   2021-11-24  #343
BA.1    BA.1    B.1.1.529.1 2021-12-07  #361
BA.1.1  BA.1.1  B.1.1.529.1.1   2022-01-07  #360
BA.1.1.1    BA.1.1.1    B.1.1.529.1.1.1 2022-02-23
BC.1    BA.1.1.1.1  B.1.1.529.1.1.1.1   2022-02-23
BC.2    BA.1.1.1.2  B.1.1.529.1.1.1.2   2022-06-13  #570
BA.1.1.2    BA.1.1.2    B.1.1.529.1.1.2 2022-02-23

CodePudding user response:

I would suggest adding a class attribute refs so you can keep a reference to all nodes in your tree by their id. That way you can quickly look up the parent of the node that must be added, and establish the parent-child link with it:

class Lineage:
    tree = dict()
    refs = dict()

    # ...

    @classmethod
    def add_to_tree(cls, lineage: 'Lineage'):
        node = {
            "id": lineage.partial,
            "lineage": lineage,
            "parent": None,
            "children": []
        }
        cls.refs[lineage.partial] = node
        if lineage.parent is None:
            cls.tree.update(node)
        elif lineage.parent in cls.refs:
            cls.refs[lineage.parent]["children"].append(node)
            node["parent"] = cls.refs[lineage.parent]

This way of working requires that parents are added to the tree before their children. After loading all nodes into the tree you can empty or delete the refs dictionary, as it has served its purpose.

  • Related