Converting flat data into a tree

Time:04-19

There is a table of the following type:

id  | title  | parent_id
245 | Fruits | 1
4   | All    | 0

It has several thousand rows and forms a hierarchy (a tree): these are the categories of a store, from the most general down to the narrowest.

For each id I need a few more columns holding the names of its parent categories, up to the most general one. That is, each row would get extra columns such as: depth 6: title of the id itself; depth 5: title of its parent_id; depth 4: title of the grandparent; and so on up to the most general category.

Or, instead of that many columns, I need code that builds the path to each id.

Say category 2642 has the title "Small tools", and its path in the category tree is 10016->10072->10690->2642. If we replace the category IDs in this path with their titles, we get the following tree:

  • Construction and repair
  • Hand tools and accessories
  • Carpentry and locksmith tools
  • Small tools

I don't know how to do it...

CodePudding user response:

Next time, please provide more data; with the given data such a chain is not possible!

I think this should do what you want:

from io import StringIO

import pandas as pd

# Your data faked
# -1 marks "no parent", if you change this, you have to change the if clause
data_file = StringIO("""id|title|parent_id
0|abc|-1
1|def|0
2|ghi|0
3|jkl|1
4|mno|2
8|pqr|4
""")


def handle_value(search_id: int, df: pd.DataFrame):
    tmp_value = df.loc[search_id]  # Get the element with the given id (your index)
    print(tmp_value["title"], end="")
    if tmp_value["parent_id"] >= 0:
        print(" -> ", end="")
        handle_value(search_id=tmp_value["parent_id"], df=df)
    else:
        print()


# Read your file
table = pd.read_table(data_file, sep="|", index_col="id", dtype={"parent_id": int})

print(table)

handle_value(8, df=table)

The output:

   title  parent_id
id                 
0    abc         -1
1    def          0
2    ghi          0
3    jkl          1
4    mno          2
8    pqr          4
pqr -> mno -> ghi -> abc
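
The function above prints a single chain, from the given id up to the root. To get a path for every id, ordered from the most general category down as the question asks, one option is to walk the parent links through plain dictionaries. This is only a sketch and not part of the answer above: the helper `build_paths` and the column names `path` and `level_N` are made up for illustration, and -1 is assumed to mark "no parent" as in the fake data.

```python
from io import StringIO

import pandas as pd

# Same fake data as above; -1 marks "no parent"
data_file = StringIO("""id|title|parent_id
0|abc|-1
1|def|0
2|ghi|0
3|jkl|1
4|mno|2
8|pqr|4
""")

table = pd.read_csv(data_file, sep="|", index_col="id", dtype={"parent_id": int})


def build_paths(df: pd.DataFrame, no_parent: int = -1) -> dict:
    """Map every id to its list of titles, root first (hypothetical helper)."""
    parents = df["parent_id"].to_dict()
    titles = df["title"].to_dict()
    paths = {}
    for node_id in df.index:
        chain = []
        current = node_id
        while current != no_parent:
            chain.append(titles[current])
            current = parents[current]
        paths[node_id] = chain[::-1]  # reverse: most general category first
    return paths


paths = build_paths(table)

# Variant 1: one string column holding the whole path
table["path"] = [" -> ".join(paths[i]) for i in table.index]

# Variant 2: one column per depth level; shorter paths are left empty
max_depth = max(len(p) for p in paths.values())
for n in range(max_depth):
    table[f"level_{n}"] = [
        paths[i][n] if n < len(paths[i]) else None for i in table.index
    ]

print(table)
```

Like the recursive version, this sketch assumes every parent_id exists in the table and that there are no cycles.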

Some points:

  1. I do not catch any errors; if you pass an id that is not in the table, it crashes with a KeyError
  2. If you have a circular connection (index 0 has parent 1 and index 1 has parent 0), it recurses until Python raises a RecursionError
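
Both points can be handled by walking the chain in a loop and remembering which ids were already visited. The following is a minimal sketch, not the code from the answer: `path_to_root` is a hypothetical helper, plain dictionaries stand in for the table's columns, and -1 is again assumed to mean "no parent".

```python
def path_to_root(search_id, parents, titles, no_parent=-1):
    """Return the titles from search_id up to the root (hypothetical helper)."""
    seen = set()
    chain = []
    current = search_id
    while current != no_parent:
        if current not in parents:
            raise KeyError(f"unknown id: {current}")
        if current in seen:
            raise ValueError(f"circular reference at id: {current}")
        seen.add(current)
        chain.append(titles[current])
        current = parents[current]
    return chain


# Plain dicts stand in for the "parent_id" and "title" columns
parents = {0: -1, 1: 0, 2: 0, 3: 1, 4: 2, 8: 4}
titles = {0: "abc", 1: "def", 2: "ghi", 3: "jkl", 4: "mno", 8: "pqr"}

print(" -> ".join(path_to_root(8, parents, titles)))  # pqr -> mno -> ghi -> abc

# A loop (0 -> 1 -> 0) is reported instead of recursing forever
looped = {0: 1, 1: 0}
try:
    path_to_root(0, looped, {0: "a", 1: "b"})
except ValueError as err:
    print(err)  # circular reference at id: 0
```

With a real DataFrame indexed by id, the two dictionaries could be taken from `table["parent_id"].to_dict()` and `table["title"].to_dict()`.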