Time table, columns=hour, rows=weekday, data=subject
[weekday x hour]
1 2 3 4 5 6 7
Name
Monday Project Project Project Data Science Embedded Systems Data Mining Industrial Psychology
Tuesday Project Project Project Project Data Science Industrial Psychology Embedded Systems
Wednesday Data Science Project Project Project Project Project Project
Thursday Data Mining Industrial Psychology Embedded Systems Data Mining Project Project Project
Friday Industrial Psychology Embedded Systems Data Science Data Mining Project Project Project
Frequency table rows=weekday, columns=subject, data = subject frequency in the corresponding weekday
[weekday x subject]
Data Data Mining Data Science Embedded Systems Industrial Psychology Project
Name
Friday 1 1 1 1 3
Monday 1 1 1 1 3
Thursday 2 0 1 1 3
Tuesday 0 1 1 1 4
Wednesday 0 1 0 0 6
Code
self.start = datetime(2022, 1, 1)
self.end = datetime(2022, 3, 31)
self.file = 'timetable.csv'
self.sdf = pd.read_csv(self.file, header=0, index_col="Name")
self.subject_frequency = self.sdf.apply(pd.value_counts).fillna(0)
print(self.subject_frequency.to_string())
self.subject_frequency["sum"] = self.subject_frequency.sum(axis=1)
self.p = self.sdf.melt(var_name='Freq', value_name='Data', ignore_index=False).assign(variable=1)\
.pivot_table('Freq', 'Name', 'Data', fill_value=0, aggfunc='count')
print(self.p.to_string())
Required Table
classes ...
Data Mining 32
Data Science 32
Embedded Systems 32
Industrial Psychology 32
Project 146
Will be adding more columns later, like current attendance percentage, percentage drop for each class missed, percent losses for taking leaves on Monday, Tuesday, ... etc so as to subtract them from attendance percentage.
The end goal is to analyse which day is safe to take a leave, and to monitor my percentage. If my direction could be better, please advise me.
CodePudding user response:
One possible approach is to use bdate_range
like you did and use weekday
to select the weekdays (0-4) and map
these numbers to their corresponding weekday names; then reindex
the Frequency table with it. Then you get a DataFrame where each row corresponds to a weekday between 2022-1-1 and 2022-3-31. Then sum
finds the total for each class:
out = (freqtable.reindex(pd.bdate_range('2022-1-1','2022-3-31').weekday
.map(dict(enumerate(['Monday','Tuesday','Wednesday','Thursday','Friday']))))
.sum()
.rename_axis(['classes']).reset_index(name='count'))
Output:
classes count
0 Data Mining 51
1 Data Science 51
2 Embedded Systems 51
3 Industrial Psychology 51
4 Project 244
CodePudding user response:
select_rows = [date.strftime("%A") for date in pd.bdate_range(self.start, self.end)]
r = self.p.loc[select_rows, :]
print(r.to_string())
print(r.sum())
Please feel free to add a simpler code, design advice is also appreciated!