Below is the data in which I am facing the issue:
dict_1 = {'id' : [101, 101, 101, 101, 101, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 104, 104, 104,
104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105],
'electronics' : ['Mobile', "Laptop", "Laptop", "Laptop", "TV", 'Mobile', "Laptop", 'Mobile', "TV",
'Mobile', "Laptop", 'Mobile', "Laptop", "Mobile", "TV", 'Mobile', "Laptop", "TV",
'Mobile', "Laptop", "TV",'Mobile', "Laptop", "TV", 'Mobile', "Laptop", "TV", "Laptop", "TV",],
'date' : ['2022-05-30', '2022-05-30', '2022-05-30', '2022-05-30', '2022-05-30', '2022-05-31', '2022-05-31',
'2022-05-31', '2022-05-31', '2022-06-01', '2022-06-01', '2022-06-01', '2022-06-01', '2022-06-01',
'2022-06-01', '2022-06-02', '2022-06-02', '2022-06-02', '2022-06-02', '2022-06-02', '2022-06-02',
'2022-06-02', '2022-06-02', '2022-06-02', '2022-06-03', '2022-06-03', '2022-06-03','2022-06-03', '2022-06-03']}
test = pd.DataFrame(dict_1)
test.head(7)
id | electronics | date |
---|---|---|
101 | Mobile | 2022-05-30 |
101 | Laptop | 2022-05-30 |
101 | Laptop | 2022-05-30 |
101 | Laptop | 2022-05-30 |
101 | TV | 2022-05-30 |
102 | Mobile | 2022-05-31 |
102 | Laptop | 2022-05-31 |
I need to find Cumulative sum of ID(Count) based on Month i.e. if the month end then it should start with 0.
I have used below code to find cumulative sum of the above data
grp_by = test.groupby(['electronics', 'date'])['id'].count().groupby(level=0).cumsum().reset_index()
grp_by.head()
electronics | date | id |
---|---|---|
Laptop | 2022-05-30 | 3 |
Laptop | 2022-05-31 | 4 |
Laptop | 2022-06-01 | 6 |
Laptop | 2022-06-02 | 9 |
Laptop | 2022-06-03 | 11 |
Till date = 2022-05-31 data is working correctly however for 2022-06-01 value should be 2(because new month and there are 2 Laptop) and for 2022-06-02 value should be 5 and so on.
CodePudding user response:
Let us try two groupby's
s = test.groupby(['electronics', 'date'], as_index=False).size()
s['size'] = s.groupby(['electronics', s['date'].str[:7]]).cumsum()
print(s)
electronics date size
0 Laptop 2022-05-30 3
1 Laptop 2022-05-31 4
2 Laptop 2022-06-01 2
3 Laptop 2022-06-02 5
4 Laptop 2022-06-03 7
5 Mobile 2022-05-30 1
6 Mobile 2022-05-31 3
7 Mobile 2022-06-01 3
8 Mobile 2022-06-02 6
9 Mobile 2022-06-03 7
10 TV 2022-05-30 1
11 TV 2022-05-31 2
12 TV 2022-06-01 1
13 TV 2022-06-02 4
14 TV 2022-06-03 6
CodePudding user response:
you can create a month column and then include in the groupby. you can always drop the month column afterwards
test['yearmonth'] = test['date'].astype('datetime64').dt.strftime('%Y%m')
test.groupby(['electronics', 'yearmonth', 'date'])['id'].count().groupby(level=1).cumsum().reset_index()
electronics month date id
0 Laptop 202205 2022-05-30 3
1 Laptop 202205 2022-05-31 4
2 Laptop 202206 2022-06-01 2
3 Laptop 202206 2022-06-02 5
4 Laptop 202206 2022-06-03 7
thanks you @creanion, you have a valid point, i updated it