I want to combine multiple datestamps (datetime64) to a single row representing one day. Then I want to sum up the amount in the last column getting the total sales per day.
In this case I want to have two lines, with the two days and the total sales. I have tried to solve my problem with the groupby operation, but it won't work.
CodePudding user response:
You could try to use resample
df_1d=df.resample('1d', on='timestamp').sum()
It will sum all data for all day or from another time
CodePudding user response:
The one-liner df.resample('1d', on='timestamp').sum()
from Aeroxer Support is perfect, but it does not explain why your attempts with groupby
failed.
In order to groupby
to work, you would need a column with just the day in it. Then you could groupby
by that day column.
Below is the example code. I add the extra column with just the day in it at In [4]
and then df.groupby('day').sum()
is what you are looking for.
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({
...: 'timestamp': map(pd.Timestamp, ['2022-09-30 11:21', '2022-09-30 20:55', '2022-10-01 10:35', '2022-10-01 22:42']),
...: 'sales': [99.90, 10.20, 5.99, 21.00]
...: })
In [3]: df
Out[3]:
timestamp sales
0 2022-09-30 11:21:00 99.90
1 2022-09-30 20:55:00 10.20
2 2022-10-01 10:35:00 5.99
3 2022-10-01 22:42:00 21.00
In [4]: df['day'] = df.timestamp.apply(pd.Timestamp.date)
In [5]: df
Out[5]:
timestamp sales day
0 2022-09-30 11:21:00 99.90 2022-09-30
1 2022-09-30 20:55:00 10.20 2022-09-30
2 2022-10-01 10:35:00 5.99 2022-10-01
3 2022-10-01 22:42:00 21.00 2022-10-01
In [6]: df.groupby('day').sum()
Out[6]:
sales
day
2022-09-30 110.10
2022-10-01 26.99