Converting a data frame of events into a timetable format-CodePudding

I am working on converting a list of online classes into a heat map using Python & Pandas and I've come to a dead end. Right now, I have a data frame 'data' with some events containing a day of the week listed as 'DAY' and the time of the event in hours listed as 'TIME'. The dataset is displayed as follows:

ID   TIME        DAY
                  
108    15   Saturday
110    15     Sunday
112    16  Wednesday
114    16     Friday
116    15     Monday
..    ...        ...
639    12  Wednesday
640    12   Saturday
641    18   Saturday
642    16   Thursday
643    15     Friday

I'm looking for a way to sum repetitions of every 'TIME' value for every 'DAY' and then present these sums in a new table 'event_count'. I need to turn the linear data in my 'data' table into a more timetable-like form that can later be converted into a visual heatmap.

Sounds like a difficult transformation, but I feel like I'm missing something very obvious.

TIME   Monday   Tuesday   Wednesday   Thursday   Friday   Saturday   Sunday
                  
10       5         2          4           6         1        0         2
11       4         2          4           6         1        0         2
12       6         2          4           6         1        0         2
13       3         2          4           6         1        0         2
14       7         2          4           6         1        0         2

I tried to achieve this through pivot_table and stack, however, the best I got was a list of all days of the week with mean averages for time. Could you advise me which direction should I look into and how can I approach solving this?

CodePudding user response：

IIUC you can do something like this:

df is from your given example data.

import pandas as pd

df = pd.DataFrame({
    'ID': [108, 110, 112, 114, 116, 639, 640, 641, 642, 643],
    'TIME': [15, 15, 16, 16, 15, 12, 12, 18, 16, 15],
    'DAY': ['Saturday','Sunday','Wednesday','Friday','Monday','Wednesday','Saturday','Saturday','Thursday','Friday']
    })

weekdays = [ 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
out = (pd.crosstab(index=df['TIME'], columns=df['DAY'], values=df['TIME'],aggfunc='count')
       .sort_index(axis=0) #sort by the index 'TIME'
       .reindex(weekdays, axis=1) # sort columns in order of the weekdays
       .rename_axis(None, axis=1) # delete name of index
       .reset_index() # 'TIME' from index to column
      )

print(out)

   TIME  Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday
0    12     NaN      NaN        1.0       NaN     NaN       1.0     NaN
1    15     1.0      NaN        NaN       NaN     1.0       1.0     1.0
2    16     NaN      NaN        1.0       1.0     1.0       NaN     NaN
3    18     NaN      NaN        NaN       NaN     NaN       1.0     NaN

You were also in the right path with pivot_table. I'm not sure what was missing to get you the right result but here is one approach with it. I added `margins, maybe it is also interesting for you to get the total ammount of each index/column.

weekdays = [ 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', 'Total']
out2 = (pd.pivot_table(data=df, index='TIME', columns='DAY', aggfunc='count', margins=True, margins_name='Total')
        .droplevel(0,axis=1)
        .reindex(weekdays, axis=1)
       )
print(out2)

DAY    Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday  Total
TIME                                                                        
12        NaN      NaN        1.0       NaN     NaN       1.0     NaN      2
15        1.0      NaN        NaN       NaN     1.0       1.0     1.0      4
16        NaN      NaN        1.0       1.0     1.0       NaN     NaN      3
18        NaN      NaN        NaN       NaN     NaN       1.0     NaN      1
Total     1.0      NaN        2.0       1.0     2.0       3.0     1.0     10