Changing value of a search - Parent Child Structure

I have a DataFrame like this:

import pandas as pd

data = {'id': [112, 114, 221, 262, 299, 300], 'parent_id': [300, 262, 558, 221, 560, 299], 'name': ['zona1', 'zona2', 'Activo1', 'zona3', 'Activo2', 'zona4']}
df = pd.DataFrame.from_dict(data)

My goal is to create another column with, for each row, the name of the row (the row itself or one of its ancestors) whose parent_id matches one of the 6 items in this list:

list_names_act = [558, 559, 560, 561, 562, 563]

So I take the first item of the id column, 112. Since 112 is not in list_names_act, I take the item at the same position in the parent_id column, 300, and do the same check. 300 is not in list_names_act either, so I look up 300 in the id column and check whether the parent_id in that row (299) is in list_names_act. 299 is not in the list, so I repeat the step: I look up 299 in id and check its parent_id (560). This time the value does belong to list_names_act, so I take the value of the name column in that row (Activo2) and assign that name to every row visited along the way, including the last one. My output would be this:

data = {'id': [112, 114, 221, 262, 299, 300],
        'parent_id': [300, 262, 558, 221, 560, 299],
        'name': ['zona1', 'zona2', 'Activo1', 'zona3', 'Activo2', 'zona4'],
        'name_activo': ['Activo2', 'Activo1', 'Activo1', 'Activo1', 'Activo2', 'Activo2']}
pd.DataFrame.from_dict(data)
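
The lookup described above can be sketched as a plain walk up the chain, one row at a time. This is only a minimal sketch of the procedure, not code from either answer below; the names parent_of, name_of and find_activo are hypothetical, and it assumes every id is unique. A row whose chain never reaches list_names_act (or runs into a cycle) gets None instead of looping forever.

```python
import pandas as pd

data = {'id': [112, 114, 221, 262, 299, 300],
        'parent_id': [300, 262, 558, 221, 560, 299],
        'name': ['zona1', 'zona2', 'Activo1', 'zona3', 'Activo2', 'zona4']}
df = pd.DataFrame(data)
list_names_act = [558, 559, 560, 561, 562, 563]
targets = set(list_names_act)

# hypothetical lookups by id (assumes the ids are unique)
parent_of = dict(zip(df['id'], df['parent_id']))
name_of = dict(zip(df['id'], df['name']))


def find_activo(node):
    """Follow parent_id upwards until the parent is in list_names_act."""
    seen = set()
    while node in parent_of and node not in seen:
        seen.add(node)
        if parent_of[node] in targets:
            return name_of[node]  # this row is the asset itself
        node = parent_of[node]  # search the parent in the id column
    return None  # dead end or cycle: no ancestor in list_names_act


df['name_activo'] = df['id'].map(find_activo)
print(df)
```

Each row walks its own chain, so this is simple but repeats work when many rows share ancestors; for a few thousand rows that is usually fine.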

I've tried with for loops, but I can only fill 42 rows, and I really don't know how to handle the change in the value I'm looking for:

nom_activo_grande = []

for i in range(len(locacion_merge['id_x'])):
    if locacion_merge['id_x'][i] in lista_tipos:
        nom_activo_grande.append(locacion_merge['name'][i])

    elif locacion_merge['parent_id'][i] in lista_tipos:
        nom_activo_grande.append(locacion_merge['name'][i])

    elif locacion_merge['parent_id'][i] not in lista_tipos:
        for j in range(len(locacion_merge['id_x'])):
            if locacion_merge['parent_id'][i] == locacion_merge['id_x'][j]:
                pass  # stuck here: how do I keep following parent_id upwards?

    else:
        nom_activo_grande.append(0)

thanks so much

Answer:

One approach, IIUC:

# edges of the parent-child graph (a tree, or a forest): one (parent, child) pair per row,
# kept as a list so a parent with several children keeps all of them
parents = list(zip(df["parent_id"], df["id"]))

# map each list_names_act entry to the name of the asset row directly below it
names = {key: value for key, value in zip(df["parent_id"], df["name"]) if key in list_names_act}

unnamed = [c for c in df["id"] if c not in names]

while unnamed:
    changed = False
    for parent, child in parents:
        # if parent is in the mappings but not the child
        if parent in names and child not in names:
            names[child] = names[parent]  # set the same asset as the parent
            unnamed.remove(child)  # remove from unassigned
            changed = True
    if not changed:  # remaining ids never reach list_names_act; stop instead of looping forever
        break

df["name_activo"] = df["id"].map(names)
print(df)

Output

    id  parent_id     name name_activo
0  112        300    zona1     Activo2
1  114        262    zona2     Activo1
2  221        558  Activo1     Activo1
3  262        221    zona3     Activo1
4  299        560  Activo2     Activo2
5  300        299    zona4     Activo2

The main idea is to create a parent-child graph (parents) and propagate the asset name from each parent down to its children, repeating until every id has a name.
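
The same top-down propagation can also be written as a breadth-first search, which visits each row once instead of rescanning all the edges until nothing changes. This is a minimal sketch, not the answerer's code; the names children, name_of and owner are hypothetical, the use of collections.deque is my own choice, and it assumes unique ids.

```python
from collections import defaultdict, deque

import pandas as pd

df = pd.DataFrame({'id': [112, 114, 221, 262, 299, 300],
                   'parent_id': [300, 262, 558, 221, 560, 299],
                   'name': ['zona1', 'zona2', 'Activo1', 'zona3', 'Activo2', 'zona4']})
list_names_act = [558, 559, 560, 561, 562, 563]

# adjacency list: parent_id -> every child id (a parent may have several)
children = defaultdict(list)
for parent, child in zip(df['parent_id'], df['id']):
    children[parent].append(child)
name_of = dict(zip(df['id'], df['name']))

owner = {}  # id -> name of the asset it belongs to
for act in list_names_act:
    # each direct child of a list entry is an asset and names its whole subtree
    for asset in children.get(act, []):
        queue = deque([asset])
        while queue:
            node = queue.popleft()
            if node in owner:  # already visited (also guards against cycles)
                continue
            owner[node] = name_of[asset]
            queue.extend(children.get(node, []))

df['name_activo'] = df['id'].map(owner)
print(df)
```

Rows that no list entry reaches are simply absent from owner, so map leaves them as NaN.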

Another approach, using networkx:

import networkx as nx

# create the graph from the DataFrame
G = nx.from_pandas_edgelist(df, source='parent_id', target='id',
                            create_using=nx.DiGraph)

S = set(list_names_act)

# get ancestor that is in "list_names_act"
# (arbitrary one if several, remove next/iter for all)
d = {n: next(iter(nx.ancestors(G, n)&S), None) for n in G.nodes}

# map ancestor
df['act_id'] = df['id'].map(d)
# map name
df['name_activo'] = df['act_id'].map(df.set_index('parent_id')['name'])

output:

    id  parent_id     name  act_id name_activo
0  112        300    zona1   560.0     Activo2
1  114        262    zona2   558.0     Activo1
2  221        558  Activo1   558.0     Activo1
3  262        221    zona3   558.0     Activo1
4  299        560  Activo2   560.0     Activo2
5  300        299    zona4   560.0     Activo2
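
If you want to avoid both a Python-level loop over the edges and the networkx dependency, the chain walk from the question can be vectorized with Series.map, moving every row one level up per iteration. This is a minimal sketch under the assumption that ids are unique; parent_of, name_of and top are hypothetical names, and rows whose chain never reaches list_names_act end up as NaN.

```python
import pandas as pd

df = pd.DataFrame({'id': [112, 114, 221, 262, 299, 300],
                   'parent_id': [300, 262, 558, 221, 560, 299],
                   'name': ['zona1', 'zona2', 'Activo1', 'zona3', 'Activo2', 'zona4']})
list_names_act = [558, 559, 560, 561, 562, 563]
targets = set(list_names_act)

# lookups by id (assumes the ids are unique)
parent_of = df.set_index('id')['parent_id']
name_of = df.set_index('id')['name']

top = df['id']  # current position of each row in its chain, starting at the row itself
for _ in range(len(df)):  # no chain can be longer than the number of rows
    parent = top.map(parent_of)
    # move up only where the parent is another row, not yet a list_names_act entry
    step = ~parent.isin(targets) & parent.isin(parent_of.index)
    if not step.any():
        break
    top = top.where(~step, parent)

# keep the name only where the walk actually ended under a list_names_act entry
df['name_activo'] = top.map(name_of).where(top.map(parent_of).isin(targets))
print(df)
```

The number of iterations equals the depth of the deepest chain (three for the sample data), and each iteration is a single vectorized map over all rows.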