I have a data frame as follows:
data1 month day
20 1 1
10 1 1
15 1 2
12 1 2
16 1 3
10 1 3
20 2 1
10 2 1
15 2 2
10 2 2
12 2 3
10 2 3
I want to find the sum of data for each day of each month and display the result as a dataframe similar to the following:
date sum_data1
1.1. 30
2.1. 27
3.1. 26
1.2. 30
2.2. 25
3.2. 22
The data set is quite big > 200,000 rows.
CodePudding user response:
Because no column year
first add it to month
and day
, pass to to_datetime
and aggregate sum
:
date = pd.to_datetime(df[['month','day']].assign(year=2022))
df = df.groupby(date.rename('date'))['data1'].sum().reset_index(name='sum_data1')
print (df)
date sum_data1
0 2022-01-01 30
1 2022-01-02 27
2 2022-01-03 26
3 2022-02-01 30
4 2022-02-02 25
5 2022-02-03 22
CodePudding user response:
df = df.groupby(['month', 'day']).sum().reset_index()
df['New'] = df.apply(lambda row: float(f'{row.day}.{row.month}'), axis=1)
df
Output:
month day data1 New
0 1 1 30 1.1
1 1 2 27 2.1
2 1 3 26 3.1
3 2 1 30 1.2
4 2 2 25 2.2
5 2 3 22 3.2
CodePudding user response:
You can do something like this:
df = pd.DataFrame({
'data1': [20, 10, 15, 12, 16, 10, 20, 10, 15, 10, 12, 10],
'month': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
'day': [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3]
})
df['date'] = df['month'].astype('string') '.' df['day'].astype('string')
df.groupby(['date']).sum()[['data1']].rename(columns={'data1':'sum_data1'})
sum_data1
date
1.1 30
1.2 27
1.3 26
2.1 30
2.2 25
2.3 22
After that, you can use reset_index() if needed...