I want to calculateMTD
and YTD
using pandas
dataframe. For that, I wrote a code and I got a following error.
code:
import pandas as pd
data = {'date' : ['2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30','2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30', '2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30','2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30', '2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30','2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30'],
'product': ['Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange'],
'price': [10, 20, 10, 50, 10, 5, 10, 10, 20, 10, 5, 5, 10, 10, 20, 50, 10, 5, 20, 10, 10, 20, 50, 20, 5, 5, 10, 10, 20, 50, 30, 10, 20, 5, 5, 10, 20, 10, 20, 10, 40, 20, 10, 10, 20, 20, 10, 5]}
df = pd.DataFrame(data)
print("Dataframe-----------------------------------")
print(df)
print("Dataframe Ends------------------------------")
df.date = pd.to_datetime(df.date)
df = df.groupby('date', 'product').price.sum()
df = df.groupby(df.index.to_period('m')).cumsum().reset_index()
print("MTD Dataframe")
print(df)
Error:
Traceback (most recent call last):
File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/generic.py", line 550, in _get_axis_number
return cls._AXIS_TO_AXIS_NUMBER[axis]
KeyError: 'product'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ab/parry-data_processing/parry-analytics/poc.py", line 15, in <module>
df = df.groupby('date', 'product').price.sum()
File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/frame.py", line 7713, in groupby
axis = self._get_axis_number(axis)
File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/generic.py", line 552, in _get_axis_number
raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
ValueError: No axis named product for object type DataFrame
Can anyone suggest a solution to solve this issue?
Expected MTD output:
date product price
0 2017/01/01 Apple 10
1 2017/01/02 Apple 30
2 2017/01/03 Apple 40
3 2017/01/04 Apple 90
4 2017/01/15 Apple 100
5 2017/01/20 Apple 105
6 2017/01/23 Apple 115
7 2017/01/30 Apple 125
8 2017/01/01 Orange 20
9 2017/01/02 Orange 30
10 2017/01/03 Orange 35
11 2017/01/04 Orange 40
12 2017/01/15 Orange 50
13 2017/01/20 Orange 60
14 2017/01/23 Orange 80
15 2017/01/30 Orange 130
16 2017/04/01 Apple 10
17 2017/04/02 Apple 15
18 2017/04/03 Apple 35
19 2017/04/04 Apple 45
20 2017/04/15 Apple 55
21 2017/04/20 Apple 75
22 2017/04/23 Apple 125
23 2017/04/30 Apple 145
24 2017/04/01 Orange 5
25 2017/04/02 Orange 10
26 2017/04/03 Orange 20
27 2017/04/04 Orange 30
28 2017/04/15 Orange 50
29 2017/04/20 Orange 100
30 2017/04/23 Orange 130
31 2017/04/30 Orange 140
32 2017/05/01 Apple 20
33 2017/05/02 Apple 25
34 2017/05/03 Apple 30
35 2017/05/04 Apple 40
36 2017/05/15 Apple 60
37 2017/05/20 Apple 70
38 2017/05/23 Apple 90
39 2017/05/30 Apple 100
40 2017/05/01 Orange 40
41 2017/05/02 Orange 60
42 2017/05/03 Orange 70
43 2017/05/04 Orange 80
44 2017/05/15 Orange 100
45 2017/05/20 Orange 120
46 2017/05/23 Orange 130
47 2017/05/30 Orange 135
Expected YTD output:
Same as above. But it should calculated from the starting of financial year (month of April) and product wise.
CodePudding user response:
Use:
df.date = pd.to_datetime(df.date)
df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product']).price.cumsum()
#df['test'] = df.date.dt.to_period('A-MAR')
df['YTD'] = df.groupby([df.date.dt.to_period('A-MAR'),'product']).price.cumsum()
print(df.tail(20))
date product price MTD YTD
28 2017-04-15 Orange 20 50 50
29 2017-04-20 Orange 50 100 100
30 2017-04-23 Orange 30 130 130
31 2017-04-30 Orange 10 140 140
32 2017-05-01 Apple 20 20 165
33 2017-05-02 Apple 5 25 170
34 2017-05-03 Apple 5 30 175
35 2017-05-04 Apple 10 40 185
36 2017-05-15 Apple 20 60 205
37 2017-05-20 Apple 10 70 215
38 2017-05-23 Apple 20 90 235
39 2017-05-30 Apple 10 100 245
40 2017-05-01 Orange 40 40 180
41 2017-05-02 Orange 20 60 200
42 2017-05-03 Orange 10 70 210
43 2017-05-04 Orange 10 80 220
44 2017-05-15 Orange 20 100 240
45 2017-05-20 Orange 20 120 260
46 2017-05-23 Orange 10 130 270
47 2017-05-30 Orange 5 135 275
CodePudding user response:
The error you are seeing is likely due to an issue with the syntax of the groupby function.
In the line df = df.groupby('date', 'product').price.sum()
, the groupby function takes two arguments: 'date' and 'product'. However, these arguments should be passed as a list of strings, like this:
df = df.groupby(['date', 'product']).price.sum().
Also change the line
df = df.groupby(df.index.to_period('m')).cumsum().reset_index() to df['price'] = df['price'].groupby(df.index.to_period('m')).cumsum().reset_index().