This is similar to the question "Pandas interpolate within a groupby", but the answer to that question calls interpolate() on all columns. How do I limit interpolate() to a single column?
Input
     filename  val1  val2
t
1   file1.csv     5    10
2   file1.csv   NaN   NaN
3   file1.csv    15    20
6   file2.csv   NaN   NaN
7   file2.csv    10    20
8   file2.csv    12    15
Expected Output
     filename  val1  val2
t
1   file1.csv     5    10
2   file1.csv   NaN    15
3   file1.csv    15    20
6   file2.csv   NaN   NaN
7   file2.csv    10    20
8   file2.csv    12    15
This attempt returns only the val2 column, without the rest of the columns:
df = df.groupby('filename').apply(lambda group: group['val2'].interpolate(method='index'))
CodePudding user response:
A direct approach:
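import pandas as pd

df = pd.read_clipboard()  # clipboard contains the OP's sample data
# interpolate only column "val2"
df["val2_interpolated"] = (
    df[["filename", "val2"]]
    .groupby("filename", group_keys=False)
    .apply(lambda x: x)["val2"]  # identity apply: hands back plain rows, so the grouping is not used below
    .interpolate(method="linear")  # runs over the whole column, across file boundaries
)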
returns:
     filename  val1  val2  val2_interpolated
t
1   file1.csv   5.0  10.0               10.0
2   file1.csv   NaN   NaN               15.0
3   file1.csv  15.0  20.0               20.0
6   file2.csv   NaN   NaN               20.0
7   file2.csv  10.0  20.0               20.0
8   file2.csv  12.0  15.0               15.0
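Note that row 6 (the first row of file2.csv) comes out as 20.0 in the result above, whereas the expected output in the question leaves it as NaN. The value is carried over from file1.csv, which suggests the interpolation is not actually running per group. A minimal sketch of a per-group variant follows. It assumes that selecting the column on the GroupBy and using transform() fits your case, and it rebuilds the sample data by hand instead of reading the clipboard (that construction is an assumption for reproducibility, not part of the original answer).

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question (assumption: "t" is the index).
df = pd.DataFrame(
    {
        "filename": ["file1.csv"] * 3 + ["file2.csv"] * 3,
        "val1": [5, np.nan, 15, np.nan, 10, 12],
        "val2": [10, np.nan, 20, np.nan, 20, 15],
    },
    index=pd.Index([1, 2, 3, 6, 7, 8], name="t"),
)

# Select only "val2" on the GroupBy, interpolate inside each group, and
# assign the result back. transform() returns a Series aligned with the
# original index, so all other columns are left untouched.
df["val2"] = df.groupby("filename")["val2"].transform(
    lambda s: s.interpolate(method="index")
)

print(df)
```

With this version the leading NaN in file2.csv should stay NaN, because there is no earlier value within that group to interpolate from, and val1 is not modified.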