Let's take a sample dataframe :
df = pd.DataFrame({"Date": ["2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06","2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06"],
"Animal" :["Cat","Cat","Cat","Cat","Cat","Cat","Dog","Dog","Dog","Dog","Dog","Dog"],
"Quantity":[np.nan,4,3,5,1,np.nan,6,5,np.nan,np.nan,2,1]})
Date Animal Quantity
0 2022-10-01 Cat NaN
1 2022-10-02 Cat 4.0
2 2022-10-03 Cat 3.0
3 2022-10-04 Cat 5.0
4 2022-10-05 Cat 1.0
5 2022-10-06 Cat NaN
6 2022-10-01 Dog 6.0
7 2022-10-02 Dog 5.0
8 2022-10-03 Dog NaN
9 2022-10-04 Dog NaN
10 2022-10-05 Dog 2.0
11 2022-10-06 Dog 1.0
I would like to fill the NaN values in the column Quantity
using the following method :
- Replace the NaN values with the closest value that is before the NaN value and which share the same value in
Animal
column - If there is still some NaN values, replace the remaining NaN values with the closest value that is after the Nan value and which share the same value in
Animal
column
I thought to Series.interpolate but I don't know how to deal with the Animal
column. Would you please know an efficient way to reach the expected output ?
Expected output :
Date Animal Quantity
0 2022-10-01 Cat 4
1 2022-10-02 Cat 4
2 2022-10-03 Cat 3
3 2022-10-04 Cat 5
4 2022-10-05 Cat 1
5 2022-10-06 Cat 1
6 2022-10-01 Dog 6
7 2022-10-02 Dog 5
8 2022-10-03 Dog 5
9 2022-10-04 Dog 5
10 2022-10-05 Dog 2
11 2022-10-06 Dog 1
``
CodePudding user response:
You can use ffill
/bfill
per group:
df['Quantity'] = (df.groupby('Animal', group_keys=False)['Quantity']
.apply(lambda s: s.bfill().ffill())
)
Output:
Date Animal Quantity
0 2022-10-01 Cat 4.0
1 2022-10-02 Cat 4.0
2 2022-10-03 Cat 3.0
3 2022-10-04 Cat 5.0
4 2022-10-05 Cat 1.0
5 2022-10-06 Cat 1.0
6 2022-10-01 Dog 6.0
7 2022-10-02 Dog 5.0
8 2022-10-03 Dog 5.0
9 2022-10-04 Dog 5.0
10 2022-10-05 Dog 2.0
11 2022-10-06 Dog 1.0
CodePudding user response:
Addition tp @mozway
sol (as I cant edit it). I've absorved the column should be typecasted to int
df['Quantity'] = (df.groupby('Animal', group_keys=False)['Quantity']
.apply(lambda s: s.bfill().ffill())
)
df['Quantity'] = df['Quantity'].astype('int')
print(df)
output #
Date Animal Quantity
0 2022-10-01 Cat 4
1 2022-10-02 Cat 4
2 2022-10-03 Cat 3
3 2022-10-04 Cat 5
4 2022-10-05 Cat 1
5 2022-10-06 Cat 1
6 2022-10-01 Dog 6
7 2022-10-02 Dog 5
8 2022-10-03 Dog 2
9 2022-10-04 Dog 2
10 2022-10-05 Dog 2
11 2022-10-06 Dog 1