pandas: sum values by date in different dolumn-CodePudding

I have a data frame as follows:


data1 month day

20    1     1

10    1     1

15    1     2

12    1     2

16    1     3

10    1     3

20    2     1

10    2     1

15    2     2

10    2     2

12    2     3

10    2     3

I want to find the sum of data for each day of each month and display the result as a dataframe similar to the following:


date sum_data1

1.1. 30

2.1. 27

3.1. 26

1.2. 30

2.2. 25

3.2. 22

The data set is quite big > 200,000 rows.

CodePudding user response：

Because no column year first add it to month and day, pass to to_datetime and aggregate sum:

date = pd.to_datetime(df[['month','day']].assign(year=2022))

df = df.groupby(date.rename('date'))['data1'].sum().reset_index(name='sum_data1')
print (df)
        date  sum_data1
0 2022-01-01         30
1 2022-01-02         27
2 2022-01-03         26
3 2022-02-01         30
4 2022-02-02         25
5 2022-02-03         22

CodePudding user response：

df = df.groupby(['month', 'day']).sum().reset_index()
df['New'] = df.apply(lambda row:  float(f'{row.day}.{row.month}'), axis=1)
df

Output:

    month   day data1   New
0   1       1   30     1.1
1   1       2   27     2.1
2   1       3   26     3.1
3   2       1   30     1.2
4   2       2   25     2.2
5   2       3   22     3.2

CodePudding user response：

You can do something like this:

df = pd.DataFrame({
    'data1': [20, 10, 15, 12, 16, 10, 20, 10, 15, 10, 12, 10],
    'month': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'day': [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3]
})
df['date'] = df['month'].astype('string')   '.'   df['day'].astype('string')

df.groupby(['date']).sum()[['data1']].rename(columns={'data1':'sum_data1'})

    sum_data1
date    
1.1 30
1.2 27
1.3 26
2.1 30
2.2 25
2.3 22

After that, you can use reset_index() if needed...