I have tried many approaches using resampling and groupby in pandas, but could not find a solution to my problem.
I have a dataset with a time column and an error type column. I would like to count how many times each error type occurred per day, and a single day can contain more than one error type.
Example of my data:
time error
0 2021-10-03 19:18:20 2222
1 2021-10-03 19:39:36 2222
2 2021-10-03 19:48:03 3333
3 2021-10-03 19:48:23 4444
4 2021-10-03 19:48:36 3333
So I am trying the following:
df = df.groupby(['time','error'], as_index=False).size()
After that I tried to resample the data by day and sum the number of occurrences.
Unfortunately, this did not work.
Desired output is:
time error size
0 2021-10-03 2222 2
1 2021-10-03 3333 2
2 2021-10-03 4444 1
And so on for every error type and day.
Thanks for any help!
CodePudding user response:
Let's try
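import pandas as pd
# Parse the strings into datetimes so the .dt accessor is available
df['time'] = pd.to_datetime(df['time'])
# Group by the calendar date (time of day dropped) and by error type
out = df.groupby([df['time'].dt.date,'error'], as_index=False).size()
print(out)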
time error size
0 2021-10-03 2222 2
1 2021-10-03 3333 2
2 2021-10-03 4444 1
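One thing to be aware of: dt.date produces plain Python date objects, so the grouped column ends up with object dtype. If you want to keep a datetime64 column (for later resampling or plotting, say), a variant along these lines might help. This is only a sketch: it rebuilds the sample frame from the question, and the names day and out are my own choices, not anything from your code.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "time": [
        "2021-10-03 19:18:20",
        "2021-10-03 19:39:36",
        "2021-10-03 19:48:03",
        "2021-10-03 19:48:23",
        "2021-10-03 19:48:36",
    ],
    "error": [2222, 2222, 3333, 4444, 3333],
})
df["time"] = pd.to_datetime(df["time"])

# dt.normalize() resets the time of day to midnight but keeps the
# datetime64 dtype ("day" is a hypothetical name for the grouping key)
day = df["time"].dt.normalize().rename("day")

# size() counts rows per (day, error) pair; reset_index turns the
# grouped index back into ordinary columns
out = df.groupby([day, "error"]).size().reset_index(name="size")
print(out)
```

The counts are the same as above; only the dtype of the date column differs.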
CodePudding user response:
Your time column includes the time of day as well as the date, so each group ends up being a unique date-and-time combination. What you need to do is isolate the date part first, and then group by that date (excluding the time).
import pandas as pd
columns = ['time', 'error']
data = [['2021-10-03 19:18:20', '2222'],
['2021-10-03 19:39:36', '2222'],
['2021-10-03 19:48:03', '3333'],
['2021-10-03 19:48:23', '4444'],
['2021-10-03 19:48:36', '3333']]
df = pd.DataFrame(data=data, columns=columns)
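# Keep only the calendar date, dropping the time of day
df['date'] = pd.to_datetime(df['time']).dt.date
# Count the rows for each (date, error) pair
df = df.groupby(['date','error'], as_index=False).size()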
Output:
print(df)
date error size
0 2021-10-03 2222 2
1 2021-10-03 3333 2
2 2021-10-03 4444 1
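If you would rather stay close to the resample idea from the question, pd.Grouper with a daily frequency can do the day binning inside the groupby itself, without a separate date column. Treat this as a sketch under the same sample data as above; the variable name daily is just an illustration.

```python
import pandas as pd

columns = ['time', 'error']
data = [['2021-10-03 19:18:20', '2222'],
        ['2021-10-03 19:39:36', '2222'],
        ['2021-10-03 19:48:03', '3333'],
        ['2021-10-03 19:48:23', '4444'],
        ['2021-10-03 19:48:36', '3333']]
df = pd.DataFrame(data=data, columns=columns)

# pd.Grouper needs a real datetime column to bin on
df['time'] = pd.to_datetime(df['time'])

# freq='D' bins the timestamps by calendar day; 'error' splits each
# day by error type ("daily" is a hypothetical name)
daily = (
    df.groupby([pd.Grouper(key='time', freq='D'), 'error'])
      .size()
      .reset_index(name='size')
)
print(daily)
```

Here the time column in the result holds the start of each day as a datetime rather than a plain date.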