Home > Software engineering >  How to calculate Month to Date (MTD) and YTD as cumulative average using Pandas dataframe?
How to calculate Month to Date (MTD) and YTD as cumulative average using Pandas dataframe?

Time:01-02

I want to calculative MTD and YTD as cumulative average using pandas dataframe.I wrote below code to generate the output.

code:

import pandas as pd

#MTD AND YTD calculation

data = {'date' : ['2016/11/01', '2016/11/02', '2016/11/03', '2016/11/04', '2016/11/05', '2016/11/05', '2016/12/01', '2016/12/02', '2016/12/03', '2016/12/04', '2016/12/15', '2016/12/20', '2016/12/23', '2016/12/30','2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30','2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30', '2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30','2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30', '2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30','2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30'],
        'category': ['fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit', 'fruit',
                    'fruit', 'fruit', 'fruit', 'fruit', 'fruit'],

        'product': ['grapes', 'grapes', 'grapes', 'kiwi', 'kiwi', 'grapes', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange'],
        'price': [10, 10, 20, 40, 60, 30, 10, 20, 10, 50, 10, 5, 10, 10, 10, 20, 10, 50, 10, 5, 10, 10, 20, 10, 5, 5, 10, 10, 20, 50, 10, 5, 20, 10, 10, 20, 50, 20, 5, 5, 10, 10, 20, 50, 30, 10, 20, 5, 5, 10, 20, 10, 20, 10, 40, 20, 10, 10, 20, 20, 10, 5]}


df = pd.DataFrame(data)

df.date = pd.to_datetime(df.date)
df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean()
df['YTD'] = df.groupby([df.date.dt.to_period('A-MAR'),'product', 'category']).price.expanding().mean()

print(df)

But I got a error

Error:

Traceback (most recent call last):
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 11000, in _reindex_for_setitem
    reindexed_value = value.reindex(index)._values
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/series.py", line 4672, in reindex
    return super().reindex(**kwargs)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/generic.py", line 4966, in reindex
    return self._reindex_axes(
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/generic.py", line 4981, in _reindex_axes
    new_index, indexer = ax.reindex(
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 4223, in reindex
    target = self._wrap_reindex_result(target, indexer, preserve_names)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 2520, in _wrap_reindex_result
    target = MultiIndex.from_tuples(target)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 204, in new_meth
    return meth(self_or_cls, *args, **kwargs)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/indexes/multi.py", line 559, in from_tuples
    arrays = list(lib.tuples_to_object_array(tuples).T)
  File "pandas/_libs/lib.pyx", line 2930, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ab/parry-data_processing/pa/poc.py", line 21, in <module>
    df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean()
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 3655, in __setitem__
    self._set_item(key, value)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 3832, in _set_item
    value = self._sanitize_column(value)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 4532, in _sanitize_column
    return _reindex_for_setitem(value, self.index)
  File "/home/ab/PycharmProjects/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 11007, in _reindex_for_setitem
    raise TypeError(
TypeError: incompatible index of inserted column with frame index

Can anyone suggest a solution to find the cumulative average for MTD and YTD?

CodePudding user response:

Use Series.droplevel for remove first 3 levels in MultiIndex:

df.date = pd.to_datetime(df.date)
df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean().droplevel([0,1,2])
df['YTD'] = df.groupby([df.date.dt.to_period('A-MAR'),'product', 'category']).price.expanding().mean().droplevel([0,1,2])

print(df)

         date category product  price        MTD        YTD
0  2016-11-01    fruit  grapes     10  10.000000  10.000000
1  2016-11-02    fruit  grapes     10  10.000000  10.000000
2  2016-11-03    fruit  grapes     20  13.333333  13.333333
3  2016-11-04    fruit    kiwi     40  40.000000  40.000000
4  2016-11-05    fruit    kiwi     60  50.000000  50.000000
..        ...      ...     ...    ...        ...        ...
57 2017-05-04    fruit  Orange     10  20.000000  18.333333
58 2017-05-15    fruit  Orange     20  20.000000  18.461538
59 2017-05-20    fruit  Orange     20  20.000000  18.571429
60 2017-05-23    fruit  Orange     10  18.571429  18.000000
61 2017-05-30    fruit  Orange      5  16.875000  17.187500

[62 rows x 6 columns]

If need dynamic solution for remove all levels without last is possible use MultiIndex.nlevels for gen number of levels and subtract 1 for keep last level:

s1 = df.groupby([df.date.dt.to_period('m'),'product', 'category']).price.expanding().mean()
s2 = df.groupby([df.date.dt.to_period('A-MAR'),'product', 'category']).price.expanding().mean()
df['MTD'] = s1.droplevel(list(range(s1.index.nlevels - 1)))
df['YTD'] = s2.droplevel(list(range(s1.index.nlevels - 1)))

print(df)

         date category product  price        MTD        YTD
0  2016-11-01    fruit  grapes     10  10.000000  10.000000
1  2016-11-02    fruit  grapes     10  10.000000  10.000000
2  2016-11-03    fruit  grapes     20  13.333333  13.333333
3  2016-11-04    fruit    kiwi     40  40.000000  40.000000
4  2016-11-05    fruit    kiwi     60  50.000000  50.000000
..        ...      ...     ...    ...        ...        ...
57 2017-05-04    fruit  Orange     10  20.000000  18.333333
58 2017-05-15    fruit  Orange     20  20.000000  18.461538
59 2017-05-20    fruit  Orange     20  20.000000  18.571429
60 2017-05-23    fruit  Orange     10  18.571429  18.000000
61 2017-05-30    fruit  Orange      5  16.875000  17.187500

[62 rows x 6 columns]
  • Related