I have a DataFrame like this:
import pandas as pd
data = {'id': [112, 114, 221, 262, 299, 300], 'parent_id': [300, 262, 558, 221, 560, 299], 'name': ['zona1', 'zona2', 'Activo1', 'zona3', 'Activo2', 'zona4']}
df = pd.DataFrame.from_dict(data)
My goal is to create another column holding the name of the row whose parent_id matches one of the 6 items in this list:
list_names_act = [558, 559, 560, 561, 562, 563]
I start with the first item of the id column, 112. Since 112 is not in list_names_act, I take the item at the same position in the parent_id column (300) and do the same check. 300 is not in list_names_act either, so I look it up in the id column and check whether the parent_id at that position (299) is in list_names_act. 299 is not, so I look it up in id again and check its parent_id (560). This time the value does belong to list_names_act, so I take the value of the name column at that position (Activo2) and copy it to every row I visited to get there, including the last one. My output would be this:
data = {'id': [112, 114, 221, 262, 299, 300],
'parent_id': [300, 262, 558, 221, 560, 299],
'name': ['zona1', 'zona2', 'Activo1', 'zona3', 'Activo2', 'zona4'],
'name_activo': ['Activo2', 'Activo1', 'Activo1', 'Activo1', 'Activo2', 'Activo2' ]}
pd.DataFrame.from_dict(data)
I've tried with for loops, but I'm only able to fill 42 rows and I really don't know how to handle the change in the value that I'm looking for:
nom_activo_grande = []
for i in range(len(locacion_merge['id_x'])):
    if locacion_merge['id_x'][i] in lista_tipos:
        nom_activo_grande.append(locacion_merge['name'][i])
    elif locacion_merge['parent_id'][i] in lista_tipos:
        nom_activo_grande.append(locacion_merge['name'][i])
    elif locacion_merge['parent_id'][i] not in lista_tipos:
        for j in range(len(locacion_merge['id_x'])):
            if locacion_merge['parent_id'][i] == locacion_merge['id_x'][j]:
            else: nom_activo_grande.append(0)
Thanks so much.
CodePudding user response:
One approach, IIUC:
# create a parent graph (actually is a tree, forest?)
parents = dict(zip(df["parent_id"], df["id"]))
# create a map of id to assets
names = {key: value for key, value in zip(df["parent_id"], df["name"]) if key in list_names_act}
unnamed = [c for c in df["id"] if c not in names]
while unnamed:
    for parent, child in parents.items():
        # if parent is in the mappings but not the child
        if parent in names and child not in names:
            names[child] = names[parent]  # set the same asset as the parent
            unnamed.remove(child)  # remove from unassigned
df["name_activo"] = df["id"].map(names)
print(df)
Output
    id  parent_id     name  name_activo
0  112        300    zona1      Activo2
1  114        262    zona2      Activo1
2  221        558  Activo1      Activo1
3  262        221    zona3      Activo1
4  299        560  Activo2      Activo2
5  300        299    zona4      Activo2
- The main idea is to create a parent-child graph (parents) and do a sort of

import networkx as nx
# create the graph from the DataFrame
G = nx.from_pandas_edgelist(df, source='parent_id', target='id', create_using=nx.DiGraph)
S = set(list_names_act)
# get ancestor that is in "list_names_act"
# (arbitrary one if several, remove next/iter for all)
d = {n: next(iter(nx.ancestors(G, n)&S), None) for n in G.nodes}
# map ancestor
df['act_id'] = df['id'].map(d)
# map name
df['name_activo'] = df['act_id'].map(df.set_index('parent_id')['name'])
output:
    id  parent_id     name  act_id  name_activo
0  112        300    zona1   560.0      Activo2
1  114        262    zona2   558.0      Activo1
2  221        558  Activo1   558.0      Activo1
3  262        221    zona3   558.0      Activo1
4  299        560  Activo2   560.0      Activo2
5  300        299    zona4   560.0      Activo2
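
For comparison, here is a minimal plain-Python sketch of the walk exactly as the question describes it: start at an id, follow parent_id upward until a parent is in list_names_act, then copy that row's name to every id visited on the way. The function name resolve_asset_names and its parameters are hypothetical, not from either answer. It assumes each id appears only once. As far as I can tell, the `while unnamed` loop in the first answer would not terminate if some id never reaches list_names_act, so this variant returns None for such ids (and for cycles) instead.

```python
def resolve_asset_names(ids, parent_ids, names, asset_ids):
    """Map each id to the name of the first row, walking up the parent
    chain, whose parent_id is in asset_ids (None if there is none)."""
    asset_ids = set(asset_ids)
    parent_of = dict(zip(ids, parent_ids))  # id -> parent_id
    name_of = dict(zip(ids, names))         # id -> name
    resolved = {}
    for start in parent_of:
        path = []      # ids visited during this walk
        seen = set()   # guards against cycles
        node = start
        result = None
        while node in parent_of and node not in seen:
            if node in resolved:
                # reuse the answer found during an earlier walk
                result = resolved[node]
                break
            seen.add(node)
            path.append(node)
            if parent_of[node] in asset_ids:
                result = name_of[node]
                break
            node = parent_of[node]  # move one level up
        for visited in path:
            resolved[visited] = result
    return resolved


# data from the question
ids = [112, 114, 221, 262, 299, 300]
parent_ids = [300, 262, 558, 221, 560, 299]
names = ['zona1', 'zona2', 'Activo1', 'zona3', 'Activo2', 'zona4']
list_names_act = [558, 559, 560, 561, 562, 563]

result = resolve_asset_names(ids, parent_ids, names, list_names_act)
print([result[i] for i in ids])
```

With the DataFrame from the question, the mapping could then be attached with something like df['name_activo'] = df['id'].map(resolve_asset_names(df['id'], df['parent_id'], df['name'], list_names_act)).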