Home > Software design >  convert the dataframe into time intervals from timestamps
convert the dataframe into time intervals from timestamps

Time:12-19

my dataframe looks something as follows

user   action created_at
user1   A     2021-12-15 11:58:53.217258
user1   A     2021-12-15 11:50:27.864876
user1   B     2021-12-15 14:23:27.864876
user1   A     2021-12-15 14:23:27.864876
user3   C     2021-12-15 12:48:09.044315
user3   D     2021-12-15 12:48:09.044315
user4   A     2021-12-15 14:48:09.044315
   

i want something as following. where total action is the frequency of of a action

                            11-12   12-13 ... 14-15
user   total_action  action 
user1    3             A       2        0        1
user1    1             B       0        0        1
user3    1             C       0        1        0
user4    1             D       0        1        0
user5    1             A       0        0        1

here 11-12, 12-13 are the time intervals(could be in string format, no issues there). I have tried looking at pd.Grouper but that also isn't helping much.

CodePudding user response:

This is pd.crosstab:

(pd.crosstab([df.user, df.action],df.created_at.dt.hour, 
                  margins=True, margins_name='total_action')
     .iloc[:-1] # remove column total
)

Output:

created_at    11  12  14  total_action
user  action                          
user1 A        2   0   1             3
      B        0   0   1             1
user3 C        0   1   0             1
      D        0   1   0             1
user4 A        0   0   1             1

Or you can use groupby with grouper like:

(df.groupby(['user','action', pd.Grouper(key='created_at', freq='H')])
      .size().unstack(fill_value=0)
      .reset_index()
     )

Output:

created_at   user action  2021-12-15 11:00:00  2021-12-15 14:00:00  2021-12-15 12:00:00
0           user1      A                    2                    1                    0
1           user1      B                    0                    1                    0
2           user3      C                    0                    0                    1
3           user3      D                    0                    0                    1
4           user4      A                    0                    1                    0

and you can add total_action column.

  • Related