I have problem with fillna()
method. This is my example df, which represents quantity of items in a shop. I would like to fill all the NaNs. If there is a NaN, I would like to fill it with values from previous day or if it is NaN, then from the next day (the same product, the same shop). If all days for specific product and shop are NaNs, then I would like to fill it with 0. I am looking for the best pandas way to do it, I had some ideas with loops, but it doesn't look very well.
My df:
day shop product quantity
0 1 shop_A apples 3.0
1 2 shop_A apples NaN
2 3 shop_A apples 1.0
3 1 shop_A bananas NaN
4 2 shop_A bananas NaN
5 3 shop_A bananas NaN
6 1 shop_B apples NaN
7 2 shop_B apples NaN
8 3 shop_B apples 2.0
9 1 shop_B bananas NaN
10 2 shop_B bananas 4.0
11 3 shop_B bananas 2.0
Expected df:
day shop product quantity
0 1 shop_A apples 3.0
1 2 shop_A apples 3.0
2 3 shop_A apples 1.0
3 1 shop_A bananas 0.0
4 2 shop_A bananas 0.0
5 3 shop_A bananas 0.0
6 1 shop_B apples 2.0
7 2 shop_B apples 2.0
8 3 shop_B apples 2.0
9 1 shop_B bananas 4.0
10 2 shop_B bananas 4.0
11 3 shop_B bananas 2.0
I also tried fillna(limit=3)
, but this isn't what am I looking for.
CodePudding user response:
You can sort by day using sort_values
and then perform a grouped bfill
and then what's left will just get a 0 with chaining a fillna(0)
:
df['quantity'] = df.sort_values(by='day')\
.groupby(['shop','product'])['quantity'].bfill(limit=3).fillna(0)
prints back:
day shop product quantity
0 1 shop_A apples 3.0
1 2 shop_A apples 1.0
2 3 shop_A apples 1.0
3 1 shop_A bananas 0.0
4 2 shop_A bananas 0.0
5 3 shop_A bananas 0.0
6 1 shop_B apples 2.0
7 2 shop_B apples 2.0
8 3 shop_B apples 2.0
9 1 shop_B bananas 4.0
10 2 shop_B bananas 4.0
11 3 shop_B bananas 2.0
This will give a the value of the next day to the NaN
value of the previous day for each shop and product. You can similarly use ffill
(or both), and perhaps linear interpolation, and your results will change accordingly. This is however what you need to get you started.