Home > Software engineering >  Adding a new column to a pandas dataframe based on data organised by groupby
Adding a new column to a pandas dataframe based on data organised by groupby

Time:11-07

I have the following dataframe:

>>> df.head()
   id                date seizure_type                  note date_column
0   4 2021-10-01 16:35:00         Mild                  None  2021-10-01
1   5 2021-10-02 16:45:00       Medium                   Wet  2021-10-02
2   6 2021-10-02 22:55:00       Medium                  None  2021-10-02
3   7 2021-10-03 08:30:00       Strong                  None  2021-10-03
4   8 2021-10-06 19:30:00       Strong  Been brewing all day  2021-10-06

I'm trying to count the value of each seizure_type per day and create a new column with this data. The following produces what I need:

data = df.groupby(['date_column'])['seizure_type'].count()

date_column
2021-10-01    1
2021-10-02    2
2021-10-03    1
2021-10-06    1
2021-10-07    1

But I do not know how to add this data as a new column to the existing dataframe.

I have tried to add the column like so:

df['freq'] = data

but the freq column does not work and produces this:

   id                date seizure_type                  note date_column  freq
0   4 2021-10-01 16:35:00         Mild                        2021-10-01   NaN
1   5 2021-10-02 16:45:00       Medium                   Wet  2021-10-02   NaN
2   6 2021-10-02 22:55:00       Medium                        2021-10-02   NaN
3   7 2021-10-03 08:30:00       Strong                        2021-10-03   NaN
4   8 2021-10-06 19:30:00       Strong  Been brewing all day  2021-10-06   NaN

I feel the answer is staring me in the face but I can't see it.

CodePudding user response:

The answer given above by not_speshal wasn't in the end what I was looking for. I realised I was trying to turn a groupby object into a dataframe and found that the best way to do this is by using .to_frame and then to reset the index using .reset_index()

frequency_count = df.groupby(['date_column'])['seizure_type'].value_counts().to_frame(name='frequency').reset_index()

CodePudding user response:

Use transform:

df["freq"] = df.groupby('date_column')['seizure_type'].transform("count")

>>> df
   id                date seizure_type                  note date_column  freq
0   4 2021-10-01 16:35:00         Mild                  None  2021-10-01     1
1   5 2021-10-02 16:45:00       Medium                   Wet  2021-10-02     2
2   6 2021-10-02 22:55:00       Medium                  None  2021-10-02     2
3   7 2021-10-03 08:30:00       Strong                  None  2021-10-03     1
4   8 2021-10-06 19:30:00       Strong  Been brewing all day  2021-10-06     1

As an aside, it looks like you have created the "date_column" column just to use in the groupby. You don't need to do this. You could simply use:

df["freq"] = df.groupby(df["date"].dt.date)['seizure_type'].transform("count")

>>> df
   id                date seizure_type                  note  freq
0   4 2021-10-01 16:35:00         Mild                  None     1
1   5 2021-10-02 16:45:00       Medium                   Wet     2
2   6 2021-10-02 22:55:00       Medium                  None     2
3   7 2021-10-03 08:30:00       Strong                  None     1
4   8 2021-10-06 19:30:00       Strong  Been brewing all day     1
  • Related