Home > Blockchain >  How to calculate Month to Date (MTD) and YTD using Pandas dataframe?
How to calculate Month to Date (MTD) and YTD using Pandas dataframe?

Time:12-19

I want to calculateMTD and YTD using pandas dataframe. For that, I wrote a code and I got a following error.

code:

import pandas as pd

data = {'date' : ['2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30','2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30', '2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30','2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30', '2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30','2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30'],
        'product': ['Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange'],
        'price': [10, 20, 10, 50, 10, 5, 10, 10, 20, 10, 5, 5, 10, 10, 20, 50, 10, 5, 20, 10, 10, 20, 50, 20, 5, 5, 10, 10, 20, 50, 30, 10, 20, 5, 5, 10, 20, 10, 20, 10, 40, 20, 10, 10, 20, 20, 10, 5]}


df = pd.DataFrame(data)

print("Dataframe-----------------------------------")
print(df)
print("Dataframe Ends------------------------------")

df.date = pd.to_datetime(df.date)
df = df.groupby('date', 'product').price.sum()
df = df.groupby(df.index.to_period('m')).cumsum().reset_index()

print("MTD Dataframe")
print(df)

Error:


Traceback (most recent call last):
  File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/generic.py", line 550, in _get_axis_number
return cls._AXIS_TO_AXIS_NUMBER[axis]
 KeyError: 'product'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ab/parry-data_processing/parry-analytics/poc.py", line 15, in <module>
df = df.groupby('date', 'product').price.sum()
  File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/frame.py", line 7713, in groupby
axis = self._get_axis_number(axis)
  File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/generic.py", line 552, in _get_axis_number
raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
ValueError: No axis named product for object type DataFrame

Can anyone suggest a solution to solve this issue?

Expected MTD output:

          date product  price
0   2017/01/01   Apple     10
1   2017/01/02   Apple     30
2   2017/01/03   Apple     40
3   2017/01/04   Apple     90
4   2017/01/15   Apple     100
5   2017/01/20   Apple     105
6   2017/01/23   Apple     115
7   2017/01/30   Apple     125
8   2017/01/01  Orange     20
9   2017/01/02  Orange     30
10  2017/01/03  Orange     35
11  2017/01/04  Orange     40
12  2017/01/15  Orange     50
13  2017/01/20  Orange     60
14  2017/01/23  Orange     80
15  2017/01/30  Orange     130
16  2017/04/01   Apple     10
17  2017/04/02   Apple     15
18  2017/04/03   Apple     35
19  2017/04/04   Apple     45
20  2017/04/15   Apple     55
21  2017/04/20   Apple     75
22  2017/04/23   Apple     125
23  2017/04/30   Apple     145
24  2017/04/01  Orange      5
25  2017/04/02  Orange     10
26  2017/04/03  Orange     20
27  2017/04/04  Orange     30
28  2017/04/15  Orange     50
29  2017/04/20  Orange     100
30  2017/04/23  Orange     130
31  2017/04/30  Orange     140
32  2017/05/01   Apple     20
33  2017/05/02   Apple     25
34  2017/05/03   Apple     30
35  2017/05/04   Apple     40
36  2017/05/15   Apple     60
37  2017/05/20   Apple     70
38  2017/05/23   Apple     90
39  2017/05/30   Apple     100
40  2017/05/01  Orange     40
41  2017/05/02  Orange     60
42  2017/05/03  Orange     70
43  2017/05/04  Orange     80
44  2017/05/15  Orange     100
45  2017/05/20  Orange     120
46  2017/05/23  Orange     130
47  2017/05/30  Orange     135

Expected YTD output:

Same as above. But it should calculated from the starting of financial year (month of April) and product wise.

CodePudding user response:

Use:

df.date = pd.to_datetime(df.date)
df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product']).price.cumsum()

#df['test'] = df.date.dt.to_period('A-MAR')
df['YTD'] = df.groupby([df.date.dt.to_period('A-MAR'),'product']).price.cumsum()
print(df.tail(20))
         date product  price  MTD  YTD
28 2017-04-15  Orange     20   50   50
29 2017-04-20  Orange     50  100  100
30 2017-04-23  Orange     30  130  130
31 2017-04-30  Orange     10  140  140
32 2017-05-01   Apple     20   20  165
33 2017-05-02   Apple      5   25  170
34 2017-05-03   Apple      5   30  175
35 2017-05-04   Apple     10   40  185
36 2017-05-15   Apple     20   60  205
37 2017-05-20   Apple     10   70  215
38 2017-05-23   Apple     20   90  235
39 2017-05-30   Apple     10  100  245
40 2017-05-01  Orange     40   40  180
41 2017-05-02  Orange     20   60  200
42 2017-05-03  Orange     10   70  210
43 2017-05-04  Orange     10   80  220
44 2017-05-15  Orange     20  100  240
45 2017-05-20  Orange     20  120  260
46 2017-05-23  Orange     10  130  270
47 2017-05-30  Orange      5  135  275

CodePudding user response:

The error you are seeing is likely due to an issue with the syntax of the groupby function.

In the line df = df.groupby('date', 'product').price.sum(), the groupby function takes two arguments: 'date' and 'product'. However, these arguments should be passed as a list of strings, like this:

 df = df.groupby(['date', 'product']).price.sum().

Also change the line

df = df.groupby(df.index.to_period('m')).cumsum().reset_index() to df['price'] = df['price'].groupby(df.index.to_period('m')).cumsum().reset_index().
  • Related