I'm trying to wrap my head around pandas/numpy to do something that should probably be simple (but my lack of knowledge...)
The series looks something like this:
2022-01-01 | 100
2022-01-02 | nan
2022-01-03 | nan
2022-01-04 | 200
2022-01-05 | nan
2022-01-06 | nan
2022-01-07 | nan
2022-01-08 | 250
2022-01-09 | nan
2022-01-10 | 400
2022-01-11 | nan
This represents a consumption. So, on the example above, on the 8 of January, we see that it became 250 and the last known value was 200 on the 4th of January. So, we know that on that period, the average was 12.5 per day. What I want to obtain is exactly this: the daily average (retrospectively). So, from the example above, that would be the result I'm looking for:
2022-01-01 | 33.3 | 100
2022-01-02 | 33.3 | 133.3
2022-01-03 | 33.3 | 166.6
2022-01-04 | 12.5 | 200
2022-01-05 | 12.5 | 212.5
2022-01-06 | 12.5 | 225
2022-01-07 | 12.5 | 237.5
2022-01-08 | 75 | 250
2022-01-09 | 75 | 325
2022-01-10 | 0 | 400
2022-01-11 | 0 | 400
The last column is just for checking that everything is cumulated as expected. But it's not needed, I'm fine with having a Series
as output.
I've already tried twisting the Series
in many ways, also doing it with the debugger. But alas, I think it would be better to ask someone who knows, along with some explanation so that I can understand better how to reach that result.
CodePudding user response:
You're looking for pd.Series.interpolate
>>> series
date
2022-01-01 100.0
2022-01-02 NaN
2022-01-03 NaN
2022-01-04 200.0
2022-01-05 NaN
2022-01-06 NaN
2022-01-07 NaN
2022-01-08 250.0
2022-01-09 NaN
2022-01-10 400.0
2022-01-11 NaN
Name: value, dtype: float64
>>> series.interpolate()
date
2022-01-01 100.000000
2022-01-02 133.333333
2022-01-03 166.666667
2022-01-04 200.000000
2022-01-05 212.500000
2022-01-06 225.000000
2022-01-07 237.500000
2022-01-08 250.000000
2022-01-09 325.000000
2022-01-10 400.000000
2022-01-11 400.000000
Name: value, dtype: float64
>>> series.interpolate().diff().shift(-1)
date
2022-01-01 33.333333
2022-01-02 33.333333
2022-01-03 33.333333
2022-01-04 12.500000
2022-01-05 12.500000
2022-01-06 12.500000
2022-01-07 12.500000
2022-01-08 75.000000
2022-01-09 75.000000
2022-01-10 0.000000
2022-01-11 NaN
Name: value, dtype: float64