Home > front end >  Calculating an average, when only some "checkpoints" are known
Calculating an average, when only some "checkpoints" are known

Time:01-05

I'm trying to wrap my head around pandas/numpy to do something that should probably be simple (but my lack of knowledge...)

The series looks something like this:

    2022-01-01 | 100
    2022-01-02 | nan
    2022-01-03 | nan
    2022-01-04 | 200
    2022-01-05 | nan
    2022-01-06 | nan
    2022-01-07 | nan
    2022-01-08 | 250
    2022-01-09 | nan
    2022-01-10 | 400
    2022-01-11 | nan

This represents a consumption. So, on the example above, on the 8 of January, we see that it became 250 and the last known value was 200 on the 4th of January. So, we know that on that period, the average was 12.5 per day. What I want to obtain is exactly this: the daily average (retrospectively). So, from the example above, that would be the result I'm looking for:

    2022-01-01 | 33.3   | 100
    2022-01-02 | 33.3   | 133.3
    2022-01-03 | 33.3   | 166.6
    2022-01-04 | 12.5   | 200
    2022-01-05 | 12.5   | 212.5
    2022-01-06 | 12.5   | 225
    2022-01-07 | 12.5   | 237.5
    2022-01-08 | 75     | 250
    2022-01-09 | 75     | 325
    2022-01-10 | 0      | 400
    2022-01-11 | 0      | 400

The last column is just for checking that everything is cumulated as expected. But it's not needed, I'm fine with having a Series as output.

I've already tried twisting the Series in many ways, also doing it with the debugger. But alas, I think it would be better to ask someone who knows, along with some explanation so that I can understand better how to reach that result.

CodePudding user response:

You're looking for pd.Series.interpolate

>>> series
date
2022-01-01    100.0
2022-01-02      NaN
2022-01-03      NaN
2022-01-04    200.0
2022-01-05      NaN
2022-01-06      NaN
2022-01-07      NaN
2022-01-08    250.0
2022-01-09      NaN
2022-01-10    400.0
2022-01-11      NaN
Name: value, dtype: float64
>>> series.interpolate()
date
2022-01-01    100.000000
2022-01-02    133.333333
2022-01-03    166.666667
2022-01-04    200.000000
2022-01-05    212.500000
2022-01-06    225.000000
2022-01-07    237.500000
2022-01-08    250.000000
2022-01-09    325.000000
2022-01-10    400.000000
2022-01-11    400.000000
Name: value, dtype: float64
>>> series.interpolate().diff().shift(-1)
date
2022-01-01    33.333333
2022-01-02    33.333333
2022-01-03    33.333333
2022-01-04    12.500000
2022-01-05    12.500000
2022-01-06    12.500000
2022-01-07    12.500000
2022-01-08    75.000000
2022-01-09    75.000000
2022-01-10     0.000000
2022-01-11          NaN
Name: value, dtype: float64
  • Related