I am working on a survey and the data looks like this:
ID Q1 Q2 Q3 Gender Age Dept
001 Y N Y F 22 IT
002 N Y Y M 35 HR
003 Y N N F 20 IT
004 Y N Y M 54 OPRE
005 Y N Y M 42 OPRE
The codes are:
out = df.pivot_table(index='Q1', columns=['Gender'], values=['ID'], aggfunc='count', fill_value=0)
out = (out.join(out[['ID']].div(out['ID'].sum(axis=1).values, axis=0)
.mul(100)
.rename(columns={'ID':'%Respondents'})))
out
And the table I created is like this:
Q1 #Res %Rep
M F M F
Y 2 2 50 50
N 1 0 100 0
But I'd like all the Questions results be at one execution, like
Q1 #Res %Rep
M F M F
Y 2 2 50 50
N 1 0 100 0
Q2 #Res %Rep
M F M F
Y 1 0 100 0
N 2 2 50 50
Q3 #Res %Rep
M F M F
Y 2 2 50 50
N 0 1 0 100
I want to write a function of creating the tables, and use a for loop to go over the questions. Can anyone help?
CodePudding user response:
Here is one way to do it
df2=df[['ID','Gender','Q1','Q2','Q3']].melt(
['ID','Gender'], var_name='question', value_name='response'
).pivot_table(
index=['question','response'], columns='Gender', values='ID', aggfunc='count' ).fillna(0)
df2['%Rep_F'] = df2['F'] /(df2['M'] df2['F'] )*100
df2['%Rep_M'] = df2['M'] /(df2['M'] df2['F'] )*100
df2.sort_values(['question','response'], ascending=[1,0]).astype({'F':int,'M':int})
Gender F M %Rep_F %Rep_M
question response
Q1 Y 2 2 50.0 50.0
N 0 1 0.0 100.0
Q2 Y 0 1 0.0 100.0
N 2 2 50.0 50.0
Q3 Y 1 3 25.0 75.0
N 1 0 100.0 0.0
CodePudding user response:
You could use this function:
def aggregate_over_questions( df ):
data = pd.DataFrame()
#get the number of questions
questions = df.loc[:,df.columns.str.contains("^Q")].columns.tolist()
for q in questions:
temp = df.pivot_table(index=f"{q}", columns=['Gender'], values['ID'], aggfunc='count', fill_value=0)
temp = (temp.join(out[['ID']].div(temp['ID'].sum(axis=1).values, axis=0)
.mul(100)
.rename(columns={'ID':'%Respondents'})))
data = pd.concat( [data, temp], axis=0 )
return data