Home > Software engineering >  How to sort data with recursion from the parent first and then proceed to its children using python?
How to sort data with recursion from the parent first and then proceed to its children using python?

Time:12-16

I have a model named Topics with data like this:

id has_sub_topic parent_id subject_module_level_id order
27 1 NULL 25 1
31 1 NULL 25 2
34 0 NULL 25 3
28 0 27 25 1
29 0 27 25 2
40 1 27 25 3
32 0 31 25 1
33 0 31 25 2
41 1 40 25 1
43 0 40 25 2
44 1 40 25 3
42 0 41 25 1
45 0 44 25 1
47 1 44 25 2
48 0 47 25 1

I Want to sort it by the parent first and then proceed to its children like depth-first processing and get the data only the topics that no has_sub_topic. So, the data will be sorted by order like this: https://upload.wikimedia.org/wikipedia/commons/7/7f/Depth-First-Search.gif and get the data only 4, 7, 8, 10

Previously I try using sorted function, but it didn't go well with many child. So, I must use recursive function. My code using recursive is like this:

# Example data for topics
import pandas as pd
topics = pd.DataFrame({
    'id': [27, 31, 34, 28, 29, 40, 32, 33, 41, 43, 44, 42, 45, 47, 48], 
    'has_sub_topic': [1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
    'parent_id': [None, None, None, 27, 27, 27, 31, 31, 40, 40, 40, 41, 44, 44, 47],
    'subject_module_level_id': [25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25],
    'order': [1, 2, 3, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 1]
    })

def topic_child_order(topic, list_topics=None):
    if list_topics is None: list_topics = []

    if topic.has_sub_topic:
        topics = Topics.objects.filter(parent=topic).order_by('order')
        for child in topics:
            result = topic_child_order(child, list_topics)
    else:
        result = topic

    list_topics.append(result)

    return list_topics
    

topics = Topics.objects.filter(
    subject_module_level_id=25,
    parent=None
).order_by('order')

topics_order = []

for topic in topics:
    topics_order.append(topic_child_order(topic))

The result is like this:

[
  [
    <Topics: Topicsobject(28)>,
    <Topics: Topicsobject(29)>,
    <Topics: Topicsobject(42)>,
    [
      ...
    ],
    <Topics: Topicsobject(43)>,
    <Topics: Topicsobject(45)>,
    <Topics: Topicsobject(48)>,
    [
      ...
    ],
    [
      ...
    ],
    [
      ...
    ],
    [
      ...
    ]
  ],
  [
    <Topics: Topicsobject(32)>,
    <Topics: Topicsobject(33)>,
    [
      ...
    ]
  ],
  [
    <Topics: Topicsobject(34)>
  ]
]

The sort order is right but I don't know why the result has empty lists. Anyone know how to fix this? Or anyone know how to do this more better way so the result only return in one list not nested list?

CodePudding user response:

I explicitly built a tree in the form of a nested python dict mapping parent id to list of children ids, using .iterrows to add nodes to the tree. Children are sorted using their given order.

Then I perform a simple depth-first-search in the tree, yielding the leaves' ids along the way.

Finally I use .loc to select rows in the dataframe.

import pandas as pd

topics = pd.DataFrame({
    'id': [27, 31, 34, 28, 29, 40, 32, 33, 41, 43, 44, 42, 45, 47, 48], 
    'has_sub_topic': [1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
    'parent_id': [0, 0, 0, 27, 27, 27, 31, 31, 40, 40, 40, 41, 44, 44, 47],
    'subject_module_level_id': [25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25],
    'order': [1, 2, 3, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 1]
    }).set_index('id')

tree = {}
for i, row in topics.iterrows():
    tree.setdefault(row['parent_id'], []).append(i)

for brotherhood in tree.values():
    brotherhood.sort(key=lambda sibling: topics.at[sibling,'order'])

# print( tree )
# {0: [27, 31, 34], 27: [28, 29, 40], 31: [32, 33], 40: [41, 43, 44], 41: [42], 44: [45, 47], 47: [48]}

def gen_leaves(tree, i=0):
    if i in tree:
        for child in tree[i]:
            yield from gen_leaves(tree, child)
    else:
        yield i

# print( list(gen_leaves(tree)) )
# [28, 29, 42, 43, 45, 48, 32, 33, 34]

leaf_ids = list(gen_leaves(tree))
topics_leaves = topics.loc[leaf_ids]

print(topics_leaves)
#     has_sub_topic  parent_id  subject_module_level_id  order
# id                                                          
# 28              0         27                       25      1
# 29              0         27                       25      2
# 42              0         41                       25      1
# 43              0         40                       25      2
# 45              0         44                       25      1
# 48              0         47                       25      1
# 32              0         31                       25      1
# 33              0         31                       25      2
# 34              0          0                       25      3
  • Related