Home > front end >  How to create a frequency table of each subject from a given timetable using pandas?
How to create a frequency table of each subject from a given timetable using pandas?

Time:03-06

This is a time table, columns=hour, rows=weekday, data=subject [weekday x hour]

                               1                      2                 3             4                 5                      6                      7
Name                                                                                                                                                   
Monday                   Project                Project           Project  Data Science  Embedded Systems            Data Mining  Industrial Psychology
Tuesday                  Project                Project           Project       Project      Data Science  Industrial Psychology       Embedded Systems
Wednesday           Data Science                Project           Project       Project           Project                Project                Project
Thursday             Data Mining  Industrial Psychology  Embedded Systems   Data Mining           Project                Project                Project
Friday     Industrial Psychology       Embedded Systems      Data Science   Data Mining           Project                Project                Project

How do you generate a pandas.Dataframe where, rows=weekday, columns=subject, data = subject frequency in the corresponding weekday?

Required table: [weekday x subject]

              Data Mining, Data Science, Embedded Systems, Industrial Psychology, Project                                                             
Name                                                                                                                                                   
Monday           1          1            1                 1                      3
Tuesday          ...         
Wednesday                     
Thursday                                     
Friday                               
        self.file = 'timetable.csv'
        self.sdf = pd.read_csv(self.file, header=0, index_col="Name")
        print(self.sdf.to_string())
        self.subject_frequency = self.sdf.apply(pd.value_counts)
        print(self.subject_frequency.to_string())
        self.subject_frequency["sum"] = self.subject_frequency.sum(axis=1)

CodePudding user response:

Use melt to flatten your dataframe then pivot_table to reshape your dataframe:

out = (
  df.melt(var_name='Freq', value_name='Data', ignore_index=False).assign(variable=1)
    .pivot_table('Freq', 'Name', 'Data', fill_value=0, aggfunc='count')
    .loc[df.index]  # sort by original index: Monday > Thuesday > ...
)

Output:

>>> out
Data       Data Mining  Data Science  Embedded Systems  Industrial Psychology  Project
Name                                                                                  
Monday               1             1                 1                      1        3
Tuesday              0             1                 1                      1        4
Wednesday            0             1                 0                      0        6
Thursday             2             0                 1                      1        3
Friday               1             1                 1                      1        3
  • Related