Pandas resample by day and count occurrences to new column

I went through a lot of workarounds for resampling and groupby methods in pandas and could not find a solution to my problem.

I have a dataset with a time and an error type. I would like to count how often each error type occurred per day, but there can be more than one type in a single day.

Example of my data:

               time  error  
0 2021-10-03 19:18:20   2222     
1 2021-10-03 19:39:36   2222     
2 2021-10-03 19:48:03   3333    
3 2021-10-03 19:48:23   4444     
4 2021-10-03 19:48:36   3333

So I am trying the following:

df = df.groupby(['time','error'], as_index=False).size()

After that I want to resample the data by day and sum the number of occurrences.

Unfortunately, without any success.

Desired output is:

               time  error  size
0          2021-10-03   2222   2
1          2021-10-03   3333   2
2          2021-10-03   4444   1

And so on for every error type and day.

Thanks for any help!

CodePudding user response：

Let's try

import pandas as pd

# Parse the column as datetimes, then group on the calendar date only
df['time'] = pd.to_datetime(df['time'])
out = df.groupby([df['time'].dt.date, 'error'], as_index=False).size()

print(out)

         time  error  size
0  2021-10-03   2222     2
1  2021-10-03   3333     2
2  2021-10-03   4444     1
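
As a possible variation: dt.date turns the values into plain Python date objects, so the column ends up with object dtype. If you would rather keep a datetime64 column for later time-based operations, dt.normalize() sets the time of day to midnight instead. A minimal sketch on the rows from the question; the column name day is my own choice, not something from the original data:

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "time": ["2021-10-03 19:18:20", "2021-10-03 19:39:36",
             "2021-10-03 19:48:03", "2021-10-03 19:48:23",
             "2021-10-03 19:48:36"],
    "error": [2222, 2222, 3333, 4444, 3333],
})
df["time"] = pd.to_datetime(df["time"])

# normalize() resets the time of day to midnight but keeps datetime64 dtype
# ("day" is a hypothetical helper column)
df["day"] = df["time"].dt.normalize()

# Count rows per (day, error) pair
out = df.groupby(["day", "error"], as_index=False).size()
print(out)
```

The counts should be the same as with dt.date; only the dtype of the grouping column differs.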

CodePudding user response：

You have the time included in the date, so each groupby group will be for a unique date and time. What you need to do is isolate the date part, then group by that date (excluding the time).
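
The first point is easy to check on the sample rows. A minimal sketch, assuming the five rows from the question (the name per_timestamp is only for illustration): grouping on the full timestamp gives one group per row, so every count is 1.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "time": pd.to_datetime(["2021-10-03 19:18:20", "2021-10-03 19:39:36",
                            "2021-10-03 19:48:03", "2021-10-03 19:48:23",
                            "2021-10-03 19:48:36"]),
    "error": [2222, 2222, 3333, 4444, 3333],
})

# Grouping on the full timestamp: every row has a distinct time,
# so each group contains exactly one row
per_timestamp = df.groupby(["time", "error"], as_index=False).size()
print(len(per_timestamp))            # number of groups
print(per_timestamp["size"].max())   # largest group size
```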

import pandas as pd

columns = ['time', 'error']
data = [['2021-10-03 19:18:20', '2222'],
        ['2021-10-03 19:39:36', '2222'],
        ['2021-10-03 19:48:03', '3333'],
        ['2021-10-03 19:48:23', '4444'],
        ['2021-10-03 19:48:36', '3333']]

df = pd.DataFrame(data=data, columns=columns)
# Keep only the calendar date so rows from the same day share a key
df['date'] = pd.to_datetime(df['time']).dt.date

# Count rows per (date, error) pair
df = df.groupby(['date', 'error'], as_index=False).size()

Output:

print(df)
         date error  size
0  2021-10-03  2222     2
1  2021-10-03  3333     2
2  2021-10-03  4444     1
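
Since the question asks about resampling: pd.Grouper with freq='D' bins a datetime column by day inside groupby, so it can be combined with the error column as a second key. This is only a sketch on the same sample rows, and counts is just an illustrative name:

```python
import pandas as pd

# Sample rows copied from the question; "time" must be datetime64 for pd.Grouper
df = pd.DataFrame({
    "time": pd.to_datetime(["2021-10-03 19:18:20", "2021-10-03 19:39:36",
                            "2021-10-03 19:48:03", "2021-10-03 19:48:23",
                            "2021-10-03 19:48:36"]),
    "error": ["2222", "2222", "3333", "4444", "3333"],
})

# Bin "time" into calendar days and group by error type at the same time,
# then turn the resulting Series into a DataFrame with a "size" column
counts = (
    df.groupby([pd.Grouper(key="time", freq="D"), "error"])
      .size()
      .reset_index(name="size")
)
print(counts)
```

The time column in the result stays datetime64, with each value set to midnight of its day.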