I have a dataframe that looks like this (this is just a small sample):
student school team answers question
a scl first True x
a scl first False y
a scl first True y
b scl first False x
c scl sec False y
c scl sec True z
d scl sec True x
d scl sec True z
e scl third True z
e scl third False z
I want to do a ranking that looks like this:
df_overall=
question first sec third
0 x 0.5 1.0 NaN
1 y 0.5 0.0 NaN
2 z NaN 1.0 0.5
So I wrote:
df_overall = df.groupby(['team', 'question'])['answers'].apply(lambda x: x.sum()/len(x)).reset_index()
df_overall = df_overall.sort_values(by=['question']).rename(columns={'answers': 'TeamRanking'})
df_overall = df_overall.pivot_table(index='question', columns='team', values='TeamRanking').reset_index().rename_axis(None, axis=1)
But it gives me a KeyError: 'team'
in the last line. If I run just the first two lines, it works. I tried putting ['team']
in brackets, and I checked the columns print (df.columns.tolist())
and they are all there, no blank spaces, no weird writing. The dtypes
are all objects except answers
which is bool. I really don't understand why it's not finding it
CodePudding user response:
Pivot and drop multi-index column, code below
newdf =pd.pivot_table(df,index='question', columns=['team'], aggfunc=np.mean).droplevel(0, axis=1).reset_index()
team question first sec third
0 x 0.5 1.0 NaN
1 y 0.5 0.0 NaN
2 z NaN 1.0 0.5
CodePudding user response:
Use pivot_table
:
>>> df.pivot_table(index='question', columns='team', values='answers')
team first sec third
question
x 0.5 1.0 NaN
y 0.5 0.0 NaN
z NaN 1.0 0.5
I also considered the case where I only take the first answer of a student into consideration and I ignore the fact that they answered the same question several times
>>> df.drop_duplicates(['student', 'team', 'question']) \
.pivot_table(index='question', columns='team', values='answers')
team first sec third
question
x 0.5 1.0 NaN
y 0.0 0.0 NaN
z NaN 1.0 1.0