I looked into many examples but non was comprehensive enough for my case. I have the following data sample:
df = pd.DataFrame({'Teacher ID': [123456,789456,101112,131415],
'Q1':[3,2,4,3],
'Q2':[3,3,3,3],
'Q3':[3,2,3,3],})
Teacher ID | Q1 | Q2 | Q3 |
---|---|---|---|
123456 | 3 | 3 | 3 |
789456 | 2 | 3 | 3 |
123456 | 3 | 3 | 3 |
131415 | 4 | 3 | 3 |
I would like For each teacher ID to know the % of times he graded 1 or 2 or 3 or 4 or 5 for each of the columns above (Questions) and add each of these % to a new column in the dataframe. Note that Teacher ID can appear more than once in the column Teacher ID.
The output should look like:
Teacher ID | %Q1 Graded 1 | %Q1 Graded 2 | %Q1 Graded 3 | %Q2 Graded 1 |
---|---|---|---|---|
123456 | 0% | 0% | 50% | 0% |
789456 | 0% | 25% | 0% | 0% |
131415 | 0% | 0% | 0% | 0% |
CodePudding user response:
Use DataFrame.melt
for unpivot, then SeriesGroupBy.value_counts
for count values per Teacher ID
and question in column Q
, reshape by Series.unstack
, add all missing Grades (for each question 1 to 5) and last divide by length of original DataFrame:
#get all question by columns names with omit first (Teacher ID)
mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])
df1 = (df.melt('Teacher ID', var_name='Q')
.groupby(['Teacher ID','Q'])['value']
.value_counts()
.unstack([1,2], fill_value=0)
.reindex(mux, axis=1, fill_value=0)
.div(len(df))
.mul(100)
)
df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')
Alternative solution with crosstab
:
mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])
df1 = df.melt('Teacher ID', var_name='Q')
df1 = (pd.crosstab(df1['Teacher ID'], [df1['Q'], df1['value']])
.reindex(mux, axis=1, fill_value=0)
.div(len(df))
.mul(100))
df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')
print (df1)
%Q1 Graded 1 %Q1 Graded 2 %Q1 Graded 3 %Q1 Graded 4 \
Teacher ID
123456 0.0 0.0 50.0 0.0
131415 0.0 0.0 0.0 25.0
789456 0.0 25.0 0.0 0.0
%Q1 Graded 5 %Q2 Graded 1 %Q2 Graded 2 %Q2 Graded 3 \
Teacher ID
123456 0.0 0.0 0.0 50.0
131415 0.0 0.0 0.0 25.0
789456 0.0 0.0 0.0 25.0
%Q2 Graded 4 %Q2 Graded 5 %Q3 Graded 1 %Q3 Graded 2 \
Teacher ID
123456 0.0 0.0 0.0 0.0
131415 0.0 0.0 0.0 0.0
789456 0.0 0.0 0.0 25.0
%Q3 Graded 3 %Q3 Graded 4 %Q3 Graded 5
Teacher ID
123456 50.0 0.0 0.0
131415 25.0 0.0 0.0
789456 0.0 0.0 0.0
For percentages:
mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])
df1 = (df.melt('Teacher ID', var_name='Q')
.groupby(['Teacher ID','Q'])['value']
.value_counts()
.unstack([1,2], fill_value=0)
.reindex(mux, axis=1, fill_value=0)
.div(len(df))
.applymap("{:.2%}".format)
)
df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')
print (df1)
%Q1 Graded 1 %Q1 Graded 2 %Q1 Graded 3 %Q1 Graded 4 %Q1 Graded 5 \
Teacher ID
123456 0.00% 0.00% 50.00% 0.00% 0.00%
131415 0.00% 0.00% 0.00% 25.00% 0.00%
789456 0.00% 25.00% 0.00% 0.00% 0.00%
%Q2 Graded 1 %Q2 Graded 2 %Q2 Graded 3 %Q2 Graded 4 %Q2 Graded 5 \
Teacher ID
123456 0.00% 0.00% 50.00% 0.00% 0.00%
131415 0.00% 0.00% 25.00% 0.00% 0.00%
789456 0.00% 0.00% 25.00% 0.00% 0.00%
%Q3 Graded 1 %Q3 Graded 2 %Q3 Graded 3 %Q3 Graded 4 %Q3 Graded 5
Teacher ID
123456 0.00% 0.00% 50.00% 0.00% 0.00%
131415 0.00% 0.00% 25.00% 0.00% 0.00%
789456 0.00% 25.00% 0.00% 0.00% 0.00