I have a dataframe like the one below:
month country prod sales units cust u/cust $/unit
4/1/2022 FRANCE PROD.A 1715 329 148 2.222973 5.212766
4/1/2022 FRANCE PROD.B 1708 371 156 2.378205 4.603774
4/1/2022 ITALY PROD.A 1636 396 176 2.250000 4.131313
4/1/2022 ITALY PROD.B 1965 308 130 2.369231 6.379870
5/1/2022 FRANCE PROD.A 1623 305 143 2.132867 5.321311
5/1/2022 FRANCE PROD.B 1791 398 140 2.842857 4.500000
5/1/2022 ITALY PROD.A 1753 387 134 2.888060 4.529716
5/1/2022 ITALY PROD.B 1643 394 138 2.855072 4.170051
6/1/2022 FRANCE PROD.A 1600 399 149 2.677852 4.010025
6/1/2022 FRANCE PROD.B 1700 327 132 2.477273 5.198777
6/1/2022 ITALY PROD.A 1619 362 136 2.661765 4.472376
6/1/2022 ITALY PROD.B 1871 369 150 2.460000 5.070461
7/1/2022 FRANCE PROD.A NaN 355 144 2.465278 NaN
7/1/2022 FRANCE PROD.B NaN 320 134 2.388060 NaN
7/1/2022 ITALY PROD.A NaN 377 136 2.772059 NaN
7/1/2022 ITALY PROD.B NaN 363 135 2.688889 NaN
What I am trying to do is to 'fillna' the '$/unit' column with the average of the last 3 values for that country and that product.
i.e, the first NaN should be the average of the $/unit of PROD.A in FRANCE in April, May and June (in this example, the average of 5.21, 5.32 and 4.01, or 4.84).
Thanks to another question in this site, I could do a 'forward fill', taking the last value for that product in that country:
df = df.groupby(['country', 'prod'], as_index=False).apply(lambda group: group.ffill())
But I want to use the average of the last three values, and I really can't figure it out...
Thanks in advance!
CodePudding user response:
Try this
# average of the last 3 values for each country-prod pair with same index as df
filler = df.groupby(['country', 'prod'])['$/unit'].transform(lambda x: x.dropna().tail(3).mean())
# fill missing values by filler
df['$/unit'].fillna(filler, inplace=True)
df