Home > front end >  Calculate the % of each occurrence of a column value and group by an ID
Calculate the % of each occurrence of a column value and group by an ID

Time:05-25

I looked into many examples but non was comprehensive enough for my case. I have the following data sample:

df = pd.DataFrame({'Teacher ID': [123456,789456,101112,131415],
                  'Q1':[3,2,4,3],
                  'Q2':[3,3,3,3],
                  'Q3':[3,2,3,3],})
Teacher ID Q1 Q2 Q3
123456 3 3 3
789456 2 3 3
123456 3 3 3
131415 4 3 3

I would like For each teacher ID to know the % of times he graded 1 or 2 or 3 or 4 or 5 for each of the columns above (Questions) and add each of these % to a new column in the dataframe. Note that Teacher ID can appear more than once in the column Teacher ID.

The output should look like:

Teacher ID %Q1 Graded 1 %Q1 Graded 2 %Q1 Graded 3 %Q2 Graded 1
123456 0% 0% 50% 0%
789456 0% 25% 0% 0%
131415 0% 0% 0% 0%

CodePudding user response:

Use DataFrame.melt for unpivot, then SeriesGroupBy.value_counts for count values per Teacher ID and question in column Q, reshape by Series.unstack, add all missing Grades (for each question 1 to 5) and last divide by length of original DataFrame:

#get all question by columns names with omit first (Teacher ID)
mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])

df1 = (df.melt('Teacher ID', var_name='Q')
         .groupby(['Teacher ID','Q'])['value']
         .value_counts()
         .unstack([1,2], fill_value=0)
         .reindex(mux, axis=1, fill_value=0)
         .div(len(df))
         .mul(100)
         )
df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')

Alternative solution with crosstab:

mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])

df1 = df.melt('Teacher ID', var_name='Q')

df1 = (pd.crosstab(df1['Teacher ID'], [df1['Q'], df1['value']])
         .reindex(mux, axis=1, fill_value=0)
         .div(len(df))
         .mul(100))
df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')

print (df1)
            %Q1 Graded 1  %Q1 Graded 2  %Q1 Graded 3  %Q1 Graded 4  \
Teacher ID                                                           
123456               0.0           0.0          50.0           0.0   
131415               0.0           0.0           0.0          25.0   
789456               0.0          25.0           0.0           0.0   

            %Q1 Graded 5  %Q2 Graded 1  %Q2 Graded 2  %Q2 Graded 3  \
Teacher ID                                                           
123456               0.0           0.0           0.0          50.0   
131415               0.0           0.0           0.0          25.0   
789456               0.0           0.0           0.0          25.0   

            %Q2 Graded 4  %Q2 Graded 5  %Q3 Graded 1  %Q3 Graded 2  \
Teacher ID                                                           
123456               0.0           0.0           0.0           0.0   
131415               0.0           0.0           0.0           0.0   
789456               0.0           0.0           0.0          25.0   

            %Q3 Graded 3  %Q3 Graded 4  %Q3 Graded 5  
Teacher ID                                            
123456              50.0           0.0           0.0  
131415              25.0           0.0           0.0  
789456               0.0           0.0           0.0  

For percentages:

mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])

df1 = (df.melt('Teacher ID', var_name='Q')
         .groupby(['Teacher ID','Q'])['value']
         .value_counts()
         .unstack([1,2], fill_value=0)
         .reindex(mux, axis=1, fill_value=0)
         .div(len(df))
         .applymap("{:.2%}".format)
         )
df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')
print (df1)
           %Q1 Graded 1 %Q1 Graded 2 %Q1 Graded 3 %Q1 Graded 4 %Q1 Graded 5  \
Teacher ID                                                                    
123456            0.00%        0.00%       50.00%        0.00%        0.00%   
131415            0.00%        0.00%        0.00%       25.00%        0.00%   
789456            0.00%       25.00%        0.00%        0.00%        0.00%   

           %Q2 Graded 1 %Q2 Graded 2 %Q2 Graded 3 %Q2 Graded 4 %Q2 Graded 5  \
Teacher ID                                                                    
123456            0.00%        0.00%       50.00%        0.00%        0.00%   
131415            0.00%        0.00%       25.00%        0.00%        0.00%   
789456            0.00%        0.00%       25.00%        0.00%        0.00%   

           %Q3 Graded 1 %Q3 Graded 2 %Q3 Graded 3 %Q3 Graded 4 %Q3 Graded 5  
Teacher ID                                                                   
123456            0.00%        0.00%       50.00%        0.00%        0.00%  
131415            0.00%        0.00%       25.00%        0.00%        0.00%  
789456            0.00%       25.00%        0.00%        0.00%        0.00
    
  • Related