I am working on converting a list of online classes into a heat map using Python & Pandas and I've come to a dead end. Right now, I have a data frame 'data' with some events containing a day of the week listed as 'DAY' and the time of the event in hours listed as 'TIME'. The dataset is displayed as follows:
ID TIME DAY
108 15 Saturday
110 15 Sunday
112 16 Wednesday
114 16 Friday
116 15 Monday
.. ... ...
639 12 Wednesday
640 12 Saturday
641 18 Saturday
642 16 Thursday
643 15 Friday
I'm looking for a way to sum repetitions of every 'TIME' value for every 'DAY' and then present these sums in a new table 'event_count'. I need to turn the linear data in my 'data' table into a more timetable-like form that can later be converted into a visual heatmap.
Sounds like a difficult transformation, but I feel like I'm missing something very obvious.
TIME Monday Tuesday Wednesday Thursday Friday Saturday Sunday
10 5 2 4 6 1 0 2
11 4 2 4 6 1 0 2
12 6 2 4 6 1 0 2
13 3 2 4 6 1 0 2
14 7 2 4 6 1 0 2
I tried to achieve this through pivot_table and stack, however, the best I got was a list of all days of the week with mean averages for time. Could you advise me which direction should I look into and how can I approach solving this?
CodePudding user response:
IIUC you can do something like this:
df
is from your given example data.
import pandas as pd
df = pd.DataFrame({
'ID': [108, 110, 112, 114, 116, 639, 640, 641, 642, 643],
'TIME': [15, 15, 16, 16, 15, 12, 12, 18, 16, 15],
'DAY': ['Saturday','Sunday','Wednesday','Friday','Monday','Wednesday','Saturday','Saturday','Thursday','Friday']
})
weekdays = [ 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
out = (pd.crosstab(index=df['TIME'], columns=df['DAY'], values=df['TIME'],aggfunc='count')
.sort_index(axis=0) #sort by the index 'TIME'
.reindex(weekdays, axis=1) # sort columns in order of the weekdays
.rename_axis(None, axis=1) # delete name of index
.reset_index() # 'TIME' from index to column
)
print(out)
TIME Monday Tuesday Wednesday Thursday Friday Saturday Sunday
0 12 NaN NaN 1.0 NaN NaN 1.0 NaN
1 15 1.0 NaN NaN NaN 1.0 1.0 1.0
2 16 NaN NaN 1.0 1.0 1.0 NaN NaN
3 18 NaN NaN NaN NaN NaN 1.0 NaN
You were also in the right path with pivot_table
. I'm not sure what was missing to get you the right result but here is one approach with it. I added `margins, maybe it is also interesting for you to get the total ammount of each index/column.
weekdays = [ 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', 'Total']
out2 = (pd.pivot_table(data=df, index='TIME', columns='DAY', aggfunc='count', margins=True, margins_name='Total')
.droplevel(0,axis=1)
.reindex(weekdays, axis=1)
)
print(out2)
DAY Monday Tuesday Wednesday Thursday Friday Saturday Sunday Total
TIME
12 NaN NaN 1.0 NaN NaN 1.0 NaN 2
15 1.0 NaN NaN NaN 1.0 1.0 1.0 4
16 NaN NaN 1.0 1.0 1.0 NaN NaN 3
18 NaN NaN NaN NaN NaN 1.0 NaN 1
Total 1.0 NaN 2.0 1.0 2.0 3.0 1.0 10