I have a DataFrame df as follows:
|size | date | name | type | revenue |
|10 | 13/12/2021 | A | Standard | 0,2 |
|248743 | 15/12/2021 | A | Standard | 0,2 |
|234 | 03/12/2022 | A | Basic | 0,1 |
|8734684 | 31/03/2022 | B | Basic | 0,1 |
|3589749 | 01/04/2021 | C | Basic | 0,4 |
|3356943 | 02/04/2021 | A | Basic | 0,1 |
|6908746 | 21/04/2021 | A | Basic | 0,1 |
|2375940 | 21/02/2022 | D | Premium | 0,7 |
|9387295 | 21/02/2022 | D | Premium | 0,7 |
|286432 | 21/02/2022 | D | Premium | 0,7 |
|192 | 31/03/2022 | D | Premium | 0,7 |
|486 | 18/02/2022 | E | Standard | 0,9 |
|23847 | 24/10/2021 | F | Basic | 0,3 |
|82346 | 12/11/2021 | B | Premium | 0,5 |
|28352 | 03/01/2022 | A | Basic | 0,1 |
I need to group by month with the size sum for rows which name and type are the same:
|size | date | name | type | revenue |
|28352 | Jan | A | Basic | 0,1 |
|486 | Feb | E | Standard | 0,9 |
|12049667| Feb | D | Premium | 0,7 |
|192 | Mar | D | Premium | 0,7 |
|8734684 | Mar | B | Basic | 0,1 |
|3589749 | Apr | C | Basic | 0,4 |
|10265689| Apr | A | Basic | 0,1 |
|23847 | Oct | F | Basic | 0,3 |
|82346 | Nov | B | Premium | 0,5 |
|248753 | Dec | A | Standard | 0,2 |
|234 | Dec | A | Basic | 0,1 |
I tried this code but it did not work:
df['date'] = pd.to_datetime(df['date'])
df1 = df.groupby(df['date'].dt.strftime('%B'))['size'].sum()
df2 = df1.groupby(['date', 'name', 'type', 'revenue'],as_index=False).sum()
How can I do it?
CodePudding user response:
IIUC, you need a single groupby
. You need to rework your "revenue" column as numeric.
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
group = df['date'].dt.strftime('%b')
(df.assign(revenue=pd.to_numeric(df['revenue'].str.replace(',', '.')))
.groupby([group, 'name', 'type'])
.agg('sum')
.reset_index()
)
Output:
date name type size revenue
0 Apr A Basic 6908746 0.1
1 Dec A Standard 248753 0.4
2 Dec B Premium 82346 0.5
3 Feb A Basic 3356943 0.1
4 Feb D Premium 12049667 2.1
5 Feb E Standard 486 0.9
6 Jan C Basic 3589749 0.4
7 Mar A Basic 28586 0.2
8 Mar B Basic 8734684 0.1
9 Mar D Premium 192 0.7
10 Oct F Basic 23847 0.3
Note that the above is aggregating months of different years into the same group. If you want to keep years separate, use a period:
group = df['date'].dt.to_period('M')
Output:
date name type size revenue
0 2021-01 C Basic 3589749 0.4
1 2021-02 A Basic 3356943 0.1
2 2021-04 A Basic 6908746 0.1
3 2021-10 F Basic 23847 0.3
4 2021-12 A Standard 248753 0.4
5 2021-12 B Premium 82346 0.5
6 2022-02 D Premium 12049667 2.1
7 2022-02 E Standard 486 0.9
8 2022-03 A Basic 28586 0.2
9 2022-03 B Basic 8734684 0.1
10 2022-03 D Premium 192 0.7