I have a DataFrame of parent/child items, and I need to assign each parent's time to all of its children. The time is only listed on the row where the parent matches the child, and I need that value to populate every child row of that parent.
Here is a simple example:
import pandas as pd

data = {
    'Parent': ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
    'Child': ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
    'Time': [51, 0, 0, 0, 0, 39, 0, 0],
}
df = pd.DataFrame(data)
The expected results are:
results = {
    'Parent': ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
    'Child': ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
    'Time': [51, 51, 51, 51, 39, 39, 39, 39],
}
Seems like it should be easy, but I can't wrap my head around where to start.
CodePudding user response:
If the parent's time is always positive, or null, you can use a simple groupby.transform('max'):
df['Time'] = df.groupby('Parent')['Time'].transform('max')
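For reference, here is a minimal self-contained sketch of that one-liner run against the data from the question. It assumes, as in the example, that every non-parent row holds 0 and the parent row holds a positive time, so the group maximum is the parent's value.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    'Parent': ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
    'Child': ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
    'Time': [51, 0, 0, 0, 0, 39, 0, 0],
})

# transform('max') returns a Series aligned with df's index, where every row
# carries the maximum Time of its Parent group
df['Time'] = df.groupby('Parent')['Time'].transform('max')

print(df)
```

Because transform keeps the original index, the result can be assigned straight back to the column without a merge.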
Otherwise, keep the time only on rows where Parent equals Child and broadcast that value to the rest of the group:
df['Time'] = (df['Time']
              .where(df['Parent'].eq(df['Child']))
              .groupby(df['Parent']).transform('first')
              .convert_dtypes()
             )
Output:
  Parent  Child  Time
0   a123   a123    51
1   a123  a1231    51
2   a123  a1232    51
3   a123  a1233    51
4   a234  a2341    39
5   a234   a234    39
6   a234  a2342    39
7   a234  a2343    39
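To see where the two approaches differ, here is a small sketch on a hypothetical variation of the data in which parent a123 has a negative time (the value -5 is made up for illustration and is not from the question). In that case the group maximum picks up a child's 0, while masking on Parent == Child still returns the parent's own value.

```python
import pandas as pd

# Hypothetical variation of the question's data: parent a123 has a negative time
df = pd.DataFrame({
    'Parent': ['a123', 'a123', 'a123', 'a123', 'a234', 'a234', 'a234', 'a234'],
    'Child': ['a123', 'a1231', 'a1232', 'a1233', 'a2341', 'a234', 'a2342', 'a2343'],
    'Time': [-5, 0, 0, 0, 0, 39, 0, 0],
})

# Approach 1: group maximum (only valid when the parent's time is the largest)
by_max = df.groupby('Parent')['Time'].transform('max')

# Approach 2: blank out non-parent rows, then take the first non-null per group
is_parent_row = df['Parent'].eq(df['Child'])
by_mask = (df['Time']
           .where(is_parent_row)                      # NaN on child rows
           .groupby(df['Parent']).transform('first')  # first non-null value per parent
           .convert_dtypes()                          # float with NaN -> nullable Int64
          )

print(by_max.tolist())   # [0, 0, 0, 0, 39, 39, 39, 39]
print(by_mask.tolist())  # [-5, -5, -5, -5, 39, 39, 39, 39]
```

The convert_dtypes() call is only there because where() introduces NaN and turns the column into floats; it restores an integer dtype once the gaps are filled.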