So I have the following dataframe
import pandas as pd
df = pd.DataFrame({'ID': [ 0 , 1 , 2 , 3 ],
'name': ['A', 'B', 'C', 'D'],
'parent': [ '', 'A', 'A', 'B']})
Now as you see there is a parent-child hierarchy in here. However, unfortunately this is based on strings.
Is it possible to very effectively / neatly to create a new column with the actual numeric parent id?
E.g. the result would be;
df = pd.DataFrame({'ID': [ 0 , 1 , 2 , 3 ],
'name': [ 'A', 'B', 'C', 'D'],
'parent': [ '', 'A', 'A', 'B'],
'parentID': [None, 0, 0, 1 ]})
CodePudding user response:
Use map
on a crafted Series:
df['parentID'] = df['parent'].map(df.set_index('name')['ID'])
output:
ID name parent parentID
0 0 A NaN
1 1 B A 0.0
2 2 C A 0.0
3 3 D B 1.0
CodePudding user response:
You can simply do a merge of your dataframe with itself :
df.merge(
df[["ID", "name"]].rename(columns={"ID" : "Parent ID", "name" : "parent"}),
on="parent",
how="left"
)