Pandas : fill NaN with the closest value, according to a category column-CodePudding

Let's take a sample dataframe :

df = pd.DataFrame({"Date": ["2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06","2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06"],
                   "Animal" :["Cat","Cat","Cat","Cat","Cat","Cat","Dog","Dog","Dog","Dog","Dog","Dog"],
                   "Quantity":[np.nan,4,3,5,1,np.nan,6,5,np.nan,np.nan,2,1]})

          Date Animal  Quantity
0   2022-10-01    Cat       NaN
1   2022-10-02    Cat       4.0
2   2022-10-03    Cat       3.0
3   2022-10-04    Cat       5.0
4   2022-10-05    Cat       1.0
5   2022-10-06    Cat       NaN
6   2022-10-01    Dog       6.0
7   2022-10-02    Dog       5.0
8   2022-10-03    Dog       NaN
9   2022-10-04    Dog       NaN
10  2022-10-05    Dog       2.0
11  2022-10-06    Dog       1.0

I would like to fill the NaN values in the column Quantity using the following method :

Replace the NaN values with the closest value that is before the NaN value and which share the same value in Animal column
If there is still some NaN values, replace the remaining NaN values with the closest value that is after the Nan value and which share the same value in Animal column

I thought to Series.interpolate but I don't know how to deal with the Animal column. Would you please know an efficient way to reach the expected output ?

Expected output :

          Date Animal  Quantity
0   2022-10-01    Cat         4
1   2022-10-02    Cat         4
2   2022-10-03    Cat         3
3   2022-10-04    Cat         5
4   2022-10-05    Cat         1
5   2022-10-06    Cat         1
6   2022-10-01    Dog         6
7   2022-10-02    Dog         5
8   2022-10-03    Dog         5
9   2022-10-04    Dog         5
10  2022-10-05    Dog         2
11  2022-10-06    Dog         1
``

CodePudding user response：

You can use ffill/bfill per group:

df['Quantity'] = (df.groupby('Animal', group_keys=False)['Quantity']
                    .apply(lambda s: s.bfill().ffill())
                 )

Output:

          Date Animal  Quantity
0   2022-10-01    Cat       4.0
1   2022-10-02    Cat       4.0
2   2022-10-03    Cat       3.0
3   2022-10-04    Cat       5.0
4   2022-10-05    Cat       1.0
5   2022-10-06    Cat       1.0
6   2022-10-01    Dog       6.0
7   2022-10-02    Dog       5.0
8   2022-10-03    Dog       5.0
9   2022-10-04    Dog       5.0
10  2022-10-05    Dog       2.0
11  2022-10-06    Dog       1.0

CodePudding user response：

Addition tp @mozway sol (as I cant edit it). I've absorved the column should be typecasted to int

df['Quantity'] = (df.groupby('Animal', group_keys=False)['Quantity']
                    .apply(lambda s: s.bfill().ffill())
                 )

df['Quantity'] = df['Quantity'].astype('int')
print(df)

output #

          Date Animal  Quantity
0   2022-10-01    Cat         4
1   2022-10-02    Cat         4
2   2022-10-03    Cat         3
3   2022-10-04    Cat         5
4   2022-10-05    Cat         1
5   2022-10-06    Cat         1
6   2022-10-01    Dog         6
7   2022-10-02    Dog         5
8   2022-10-03    Dog         2
9   2022-10-04    Dog         2
10  2022-10-05    Dog         2
11  2022-10-06    Dog         1