How to calculate Month to Date (MTD) and YTD as cumulative average using Pandas dataframe?-CodePudding

I want to calculative MTD and YTD as cumulative average using pandas dataframe.I wrote below code to generate the output.

code:

import pandas as pd

#MTD AND YTD calculation

data = {'date' : ['2016/11/01', '2016/11/02', '2016/11/03', '2016/11/04', '2016/11/05', '2016/11/05', '2016/12/01', '2016/12/02', '2016/12/03', '2016/12/04', '2016/12/15', '2016/12/20', '2016/12/23', '2016/12/30','2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30','2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30', '2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30','2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30', '2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30','2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30'],
        'category': ['fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit'],

        'product': ['grapes', 'grapes', 'grapes', 'kiwi', 'kiwi', 'grapes', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange'],
        'price': [10, 10, 20, 40, 60, 30, 10, 20, 10, 50, 10, 5, 10, 10, 10, 20, 10, 50, 10, 5, 10, 10, 20, 10, 5, 5, 10, 10, 20, 50, 10, 5, 20, 10, 10, 20, 50, 20, 5, 5, 10, 10, 20, 50, 30, 10, 20, 5, 5, 10, 20, 10, 20, 10, 40, 20, 10, 10, 20, 20, 10, 5]}


df = pd.DataFrame(data)

df.date = pd.to_datetime(df.date)
df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean()
df['YTD'] = df.groupby([df.date.dt.to_period('A-MAR'),'product', 'category']).price.expanding().mean()

print(df)

But I got a error

Error:

Traceback (most recent call last):
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 11000, in _reindex_for_setitem
    reindexed_value = value.reindex(index)._values
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/series.py", line 4672, in reindex
    return super().reindex(**kwargs)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/generic.py", line 4966, in reindex
    return self._reindex_axes(
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/generic.py", line 4981, in _reindex_axes
    new_index, indexer = ax.reindex(
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4223, in reindex
    target = self._wrap_reindex_result(target, indexer, preserve_names)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 2520, in _wrap_reindex_result
    target = MultiIndex.from_tuples(target)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 204, in new_meth
    return meth(self_or_cls, *args, **kwargs)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 559, in from_tuples
    arrays = list(lib.tuples_to_object_array(tuples).T)
  File "pandas/_libs/lib.pyx", line 2930, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ab/parry-data_processing/pa/poc.py", line 21, in <module>
    df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean()
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 3655, in __setitem__
    self._set_item(key, value)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 3832, in _set_item
    value = self._sanitize_column(value)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 4532, in _sanitize_column
    return _reindex_for_setitem(value, self.index)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 11007, in _reindex_for_setitem
    raise TypeError(
TypeError: incompatible index of inserted column with frame index

Can anyone suggest a solution to find the cumulative average for MTD and YTD?

CodePudding user response：

Use Series.droplevel for remove first 3 levels in MultiIndex:

df.date = pd.to_datetime(df.date)
df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean().droplevel([0,1,2])
df['YTD'] = df.groupby([df.date.dt.to_period('A-MAR'),'product', 'category']).price.expanding().mean().droplevel([0,1,2])

print(df)

         date category product  price        MTD        YTD
0  2016-11-01    fruit  grapes     10  10.000000  10.000000
1  2016-11-02    fruit  grapes     10  10.000000  10.000000
2  2016-11-03    fruit  grapes     20  13.333333  13.333333
3  2016-11-04    fruit    kiwi     40  40.000000  40.000000
4  2016-11-05    fruit    kiwi     60  50.000000  50.000000
..        ...      ...     ...    ...        ...        ...
57 2017-05-04    fruit  Orange     10  20.000000  18.333333
58 2017-05-15    fruit  Orange     20  20.000000  18.461538
59 2017-05-20    fruit  Orange     20  20.000000  18.571429
60 2017-05-23    fruit  Orange     10  18.571429  18.000000
61 2017-05-30    fruit  Orange      5  16.875000  17.187500

[62 rows x 6 columns]

If need dynamic solution for remove all levels without last is possible use MultiIndex.nlevels for gen number of levels and subtract 1 for keep last level:

s1 = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean()
s2 = df.groupby([df.date.dt.to_period('A-MAR'),'product', 'category']).price.expanding().mean()
df['MTD'] = s1.droplevel(list(range(s1.index.nlevels - 1)))
df['YTD'] = s2.droplevel(list(range(s1.index.nlevels - 1)))

print(df)

         date category product  price        MTD        YTD
0  2016-11-01    fruit  grapes     10  10.000000  10.000000
1  2016-11-02    fruit  grapes     10  10.000000  10.000000
2  2016-11-03    fruit  grapes     20  13.333333  13.333333
3  2016-11-04    fruit    kiwi     40  40.000000  40.000000
4  2016-11-05    fruit    kiwi     60  50.000000  50.000000
..        ...      ...     ...    ...        ...        ...
57 2017-05-04    fruit  Orange     10  20.000000  18.333333
58 2017-05-15    fruit  Orange     20  20.000000  18.461538
59 2017-05-20    fruit  Orange     20  20.000000  18.571429
60 2017-05-23    fruit  Orange     10  18.571429  18.000000
61 2017-05-30    fruit  Orange      5  16.875000  17.187500

[62 rows x 6 columns]