I have the following dataframe:
>>> df.head()
id date seizure_type note date_column
0 4 2021-10-01 16:35:00 Mild None 2021-10-01
1 5 2021-10-02 16:45:00 Medium Wet 2021-10-02
2 6 2021-10-02 22:55:00 Medium None 2021-10-02
3 7 2021-10-03 08:30:00 Strong None 2021-10-03
4 8 2021-10-06 19:30:00 Strong Been brewing all day 2021-10-06
I'm trying to count the value of each seizure_type per day and create a new column with this data. The following produces what I need:
data = df.groupby(['date_column'])['seizure_type'].count()
date_column
2021-10-01 1
2021-10-02 2
2021-10-03 1
2021-10-06 1
2021-10-07 1
But I do not know how to add this data as a new column to the existing dataframe.
I have tried to add the column like so:
df['freq'] = data
but the freq column does not work and produces this:
id date seizure_type note date_column freq
0 4 2021-10-01 16:35:00 Mild 2021-10-01 NaN
1 5 2021-10-02 16:45:00 Medium Wet 2021-10-02 NaN
2 6 2021-10-02 22:55:00 Medium 2021-10-02 NaN
3 7 2021-10-03 08:30:00 Strong 2021-10-03 NaN
4 8 2021-10-06 19:30:00 Strong Been brewing all day 2021-10-06 NaN
I feel the answer is staring me in the face but I can't see it.
CodePudding user response:
The answer given above by not_speshal wasn't in the end what I was looking for. I realised I was trying to turn a groupby object into a dataframe and found that the best way to do this is by using .to_frame and then to reset the index using .reset_index()
frequency_count = df.groupby(['date_column'])['seizure_type'].value_counts().to_frame(name='frequency').reset_index()
CodePudding user response:
Use transform
:
df["freq"] = df.groupby('date_column')['seizure_type'].transform("count")
>>> df
id date seizure_type note date_column freq
0 4 2021-10-01 16:35:00 Mild None 2021-10-01 1
1 5 2021-10-02 16:45:00 Medium Wet 2021-10-02 2
2 6 2021-10-02 22:55:00 Medium None 2021-10-02 2
3 7 2021-10-03 08:30:00 Strong None 2021-10-03 1
4 8 2021-10-06 19:30:00 Strong Been brewing all day 2021-10-06 1
As an aside, it looks like you have created the "date_column" column just to use in the groupby
. You don't need to do this. You could simply use:
df["freq"] = df.groupby(df["date"].dt.date)['seizure_type'].transform("count")
>>> df
id date seizure_type note freq
0 4 2021-10-01 16:35:00 Mild None 1
1 5 2021-10-02 16:45:00 Medium Wet 2
2 6 2021-10-02 22:55:00 Medium None 2
3 7 2021-10-03 08:30:00 Strong None 1
4 8 2021-10-06 19:30:00 Strong Been brewing all day 1