I'm trying to interpolate based on grouped pandas data using lambda.
I know that with the below data:
df = pd.DataFrame({'ID':[1, 1, 1, 2, 2, 2, 3, 3, 3],
'Value' : [np.nan, 5, np.nan, 7, np.nan, 9, 1, 3, np.nan]})
ID Value
0 1 NaN
1 1 5.0
2 1 NaN
3 2 7.0
4 2 NaN
5 2 9.0
6 3 1.0
7 3 3.0
8 3 NaN
I can use lambda to interpolate within each group and fill in the gaps like this:
df.groupby('ID').apply(lambda group: group.interpolate(limit_direction='both'))
However, I only want to fill in the gaps on groups that are specified within a set list, for example:
int_IDs = [1, 3]
Such that those IDs have their blanks filled, but the remaining IDs don't, so the output would be:
ID Value
0 1 5.0
1 1 5.0
2 1 5.0
3 2 7.0
4 2 NaN
5 2 9.0
6 3 1.0
7 3 3.0
8 3 3.0
I've tried adding a conditional into the lambda, but that gives ValueError: The truth value of a DataFrame is ambiguous.
:
df.groupby('ID').apply(lambda group: group.interpolate(limit_direction='both') if group in int_IDs else group)
How can I do this?
CodePudding user response:
Use a mask to restrict the groupby
:
int_IDs = [1, 3]
m = df['ID'].isin(int_IDs)
df['New_Value'] = (df[m]
.groupby('ID')['Value']
.apply(lambda group: group.interpolate(limit_direction='both'))
.combine_first(df['Value'])
)
output:
ID Value New_Value
0 1 NaN 5.0
1 1 5.0 5.0
2 1 NaN 5.0
3 2 7.0 7.0
4 2 NaN NaN
5 2 9.0 9.0
6 3 1.0 1.0
7 3 3.0 3.0
8 3 NaN 3.0
In place:
int_IDs = [1, 3]
m = df['ID'].isin(int_IDs)
df.loc[m, 'Value'] = (df[m]
.groupby('ID')['Value']
.apply(lambda group: group.interpolate(limit_direction='both'))
)
output:
ID Value
0 1 5.0
1 1 5.0
2 1 5.0
3 2 7.0
4 2 NaN
5 2 9.0
6 3 1.0
7 3 3.0
8 3 3.0