I have the following input data:
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 2, 2, 2, 2],
                   "length": [0.7, 0.7, 0.7, 0.8, 0.6, 0.6, 0.7],
                   "height": [7, 9, np.nan, 4, 8, np.nan, 5]})
df
   ID  length  height
0   1     0.7       7
1   1     0.7       9
2   1     0.7     NaN
3   2     0.8       4
4   2     0.6       8
5   2     0.6     NaN
6   2     0.7       5
I want to fill the NaN values in "height" per "ID" group. If every row in the group has the same "length", fill with the maximum "height" in that group; otherwise, fill with the "height" that corresponds to the maximum "length" in that group.
Required Output:
   ID  length  height
0   1     0.7       7
1   1     0.7       9
2   1     0.7       9
3   2     0.8       4
4   2     0.6       8
5   2     0.6       4
6   2     0.7       5
Thanks.
CodePudding user response:
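You could sort with sort_values on "length" and "height", then group by "ID" with groupby and take the last value in each group. After the sort, the last row of each group is the one with the largest "length" (and, on ties, the largest "height"):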
# 'last' returns the last non-NaN value in each group
df['height'] = df['height'].fillna(df.sort_values(['length', 'height']).groupby('ID')['height'].transform('last'))
df
Out[296]:
   ID  length  height
0   1     0.7     7.0
1   1     0.7     9.0
2   1     0.7     9.0
3   2     0.8     4.0
4   2     0.6     8.0
5   2     0.6     4.0
6   2     0.7     5.0
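To see why this works, here is a minimal self-contained sketch that splits the one-liner into steps. The intermediate names (ordered, fill_values, result) are only for illustration and are not part of the original answer; it assumes the default ascending sort.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2],
    "length": [0.7, 0.7, 0.7, 0.8, 0.6, 0.6, 0.7],
    "height": [7, 9, np.nan, 4, 8, np.nan, 5],
})

# Sort ascending by length, then by height, so the row with the largest
# length (and the largest height on ties) comes last within each ID.
ordered = df.sort_values(["length", "height"])

# 'last' skips NaN, so every row receives the final non-NaN height
# of its ID group in the sorted order.
fill_values = ordered.groupby("ID")["height"].transform("last")

# fillna aligns on the index, so the sorted order of fill_values does
# not need to match df. Only the NaN rows are replaced.
result = df.assign(height=df["height"].fillna(fill_values))
print(result)
```

When all lengths in a group are equal, the sort falls back to "height", so the last value is the group maximum; otherwise it is the height at the maximum length. Note that if the row with the maximum length itself has a NaN height, 'last' falls back to the next non-NaN row in the sorted order.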