How to add a column counting how many times each date appears

Time:05-27

I have this dataframe, with these dtypes:

    Date        Time
0   2022-05-20  17:07:00
1   2022-05-20  09:14:00
2   2022-05-19  18:56:00
3   2022-05-19  13:53:00
4   2022-05-19  13:52:00
... ... ...
81  2022-04-22  09:53:00
82  2022-04-20  18:20:00
83  2022-04-20  12:53:00
84  2022-04-20  12:12:00
85  2022-04-20  09:50:00
86 rows × 2 columns
Date    datetime64[ns]
Time            object
dtype: object

I tried df1 = df[['Date','Time']].groupby(['Date']).agg(['count']) and got

           Time
Date       count    
2022-04-20  4
2022-04-22  4
2022-04-25  3
2022-04-26  6
2022-04-27  4
2022-04-28  4
2022-04-29  4
2022-05-02  4
2022-05-03  4
2022-05-04  4

The Time column also disappears when I try df = df.groupby(['Date'])['Date'].count().reset_index(name='Counts'):

0   2022-04-20  4
1   2022-04-22  4
2   2022-04-25  2
3   2022-04-26  6
4   2022-04-27  4

So the Time column is simply gone. How do I get a dataframe where Date is the index, with each Time on that date and a count of how many times that date occurs? My project is to find the differences in Time within a date if the number of entries for that date is odd. For example, if there are 4 time entries on 5/19/2020, I need to find the difference between entry 1 and entry 2, then between entry 3 and entry 4, and sum those to get the final result. I don't know if there is a more elegant way to do this than with a dataframe.
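The pairing described above (entry 2 minus entry 1, entry 4 minus entry 3, then the sum per date) can be sketched roughly as follows. This is only one possible reading of the requirement: it assumes entries are paired in chronological order within each date, and that a leftover unpaired entry on a date with an odd count is ignored. The helper name paired_time_sum is hypothetical, and the sample frame holds only rows visible in the question.

```python
import pandas as pd

# Rows copied from the visible part of the question's frame.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-05-20", "2022-05-20",
        "2022-05-19", "2022-05-19", "2022-05-19",
        "2022-04-20", "2022-04-20", "2022-04-20", "2022-04-20",
    ]),
    "Time": ["17:07:00", "09:14:00",
             "18:56:00", "13:53:00", "13:52:00",
             "18:20:00", "12:53:00", "12:12:00", "09:50:00"],
})

def paired_time_sum(frame):
    """Hypothetical helper: per date, sum (2nd - 1st) + (4th - 3rd) + ..."""
    out = frame.copy()
    # astype(str) lets this handle both "HH:MM:SS" strings and datetime.time objects.
    out["Time"] = pd.to_timedelta(out["Time"].astype(str))
    out = out.sort_values(["Date", "Time"])
    # Position within each date (0, 1, 2, ...) halved gives a pair number.
    out["pair"] = out.groupby("Date").cumcount() // 2
    per_pair = out.groupby(["Date", "pair"])["Time"].agg(["first", "last", "size"])
    # Assumption: drop the trailing single entry when a date has an odd count.
    complete = per_pair[per_pair["size"] == 2]
    return (complete["last"] - complete["first"]).groupby(level="Date").sum()

result = paired_time_sum(df)
print(result)
```

A date with a single entry has no complete pair, so it does not appear in the result at all. If the unpaired entry should be handled differently, the filter on size is the place to change.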

CodePudding user response:

You can merge the per-date counts back into the original DataFrame. Does that help?

df2=df.groupby(['Date'])['Date'].count().reset_index(name='count')
df3=df.merge(df2,
            on='Date', how='left')
df3.set_index('Date', inplace=True)
df3
                Time  count
Date                       
2022-05-20  17:07:00      2
2022-05-20  09:14:00      2
2022-05-19  18:56:00      3
2022-05-19  13:53:00      3
2022-05-19  13:52:00      3
2022-04-22  09:53:00      1
2022-04-20  18:20:00      4
2022-04-20  12:53:00      4
2022-04-20  12:12:00      4
2022-04-20  09:50:00      4

To make each date appear only once in the output, use:

df2=df.groupby(['Date'])['Date'].count().reset_index(name='count')
df3=df.merge(df2, on='Date', how='left')
df3=df3.reset_index()
df3['index'] = 'col'  # it is added to make use of pd.pivot below, a workaround
df3.pivot(index=['Date','Time','count'], columns='index')
Date        Time    count
2022-04-20  09:50:00    4
            12:12:00    4
            12:53:00    4
            18:20:00    4
2022-04-22  09:53:00    1
2022-05-19  13:52:00    3
            13:53:00    3
            18:56:00    3
2022-05-20  09:14:00    2
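A similar grouped display can probably be reached without the pivot workaround, by putting Date and Time into a MultiIndex directly. The following is a minimal sketch under that assumption, not the answerer's method; it uses transform('size') for the per-date count and a sample frame built from rows shown in the question.

```python
import pandas as pd

# Rows copied from the visible part of the question's frame.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-05-20", "2022-05-20", "2022-04-22",
                            "2022-04-20", "2022-04-20", "2022-04-20", "2022-04-20"]),
    "Time": ["17:07:00", "09:14:00", "09:53:00",
             "18:20:00", "12:53:00", "12:12:00", "09:50:00"],
})

# Broadcast the number of rows per date back onto every row (no merge needed).
df["count"] = df.groupby("Date")["Time"].transform("size")

# A (Date, Time) MultiIndex makes pandas print each date only once.
out = df.set_index(["Date", "Time"]).sort_index()
print(out)
```

Here count stays an ordinary column rather than an index level, which keeps it available for filtering, for example out[out["count"] % 2 == 0].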

CodePudding user response:

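You can use transform with nunique. Select the Time column explicitly so the result is a single Series. Note that nunique counts distinct times per date; use 'count' instead if repeated times should each be counted:

df['count'] = df.groupby('Date')['Time'].transform('nunique')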
print(df)

# Output
         Date            Time  count
0  2022-05-20 0 days 17:07:00      2
1  2022-05-20 0 days 09:14:00      2
2  2022-05-19 0 days 18:56:00      3
3  2022-05-19 0 days 13:53:00      3
4  2022-05-19 0 days 13:52:00      3
81 2022-04-22 0 days 09:53:00      1
82 2022-04-20 0 days 18:20:00      4
83 2022-04-20 0 days 12:53:00      4
84 2022-04-20 0 days 12:12:00      4
85 2022-04-20 0 days 09:50:00      4
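The output above shows Time as timedeltas, so the column was presumably converted first (for example with pd.to_timedelta). The small sketch below illustrates how nunique and a plain row count differ once a date contains a repeated time. The duplicated 09:14:00 row is made up for illustration and does not come from the question's data.

```python
import pandas as pd

# Hypothetical data: one date with a repeated time, to show the difference.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-05-20"] * 3),
    "Time": pd.to_timedelta(["09:14:00", "09:14:00", "17:07:00"]),
})

grouped = df.groupby("Date")["Time"]
df["n_unique"] = grouped.transform("nunique")  # distinct times per date
df["n_rows"] = grouped.transform("size")       # all rows per date
print(df)
```

When every time within a date is distinct, as in the rows shown in the question, the two columns agree.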