Home > front end >  How to use Python for loop to execute the function inside?
How to use Python for loop to execute the function inside?

Time:06-25

I am working on a survey and the data looks like this:

ID    Q1    Q2    Q3    Gender    Age    Dept
001   Y      N    Y      F         22     IT
002   N      Y    Y      M         35     HR
003   Y      N    N      F         20     IT
004   Y      N    Y      M         54     OPRE
005   Y      N    Y      M         42     OPRE

The codes are:

out = df.pivot_table(index='Q1', columns=['Gender'], values=['ID'], aggfunc='count', fill_value=0)
out = (out.join(out[['ID']].div(out['ID'].sum(axis=1).values, axis=0)
            .mul(100)
            .rename(columns={'ID':'%Respondents'})))
out

And the table I created is like this:

Q1      #Res        %Rep
       M    F      M    F
Y      2    2      50   50
N      1    0      100   0

But I'd like all the Questions results be at one execution, like

Q1      #Res        %Rep
       M    F      M    F
Y      2    2      50   50
N      1    0      100   0

Q2      #Res        %Rep
       M    F      M    F
Y      1    0      100  0
N      2    2      50   50

Q3      #Res        %Rep
       M    F      M    F
Y      2    2      50   50
N      0    1      0    100

I want to write a function of creating the tables, and use a for loop to go over the questions. Can anyone help?

CodePudding user response:

Here is one way to do it


df2=df[['ID','Gender','Q1','Q2','Q3']].melt(
    ['ID','Gender'], var_name='question', value_name='response'
).pivot_table(
    index=['question','response'], columns='Gender', values='ID', aggfunc='count' ).fillna(0)

df2['%Rep_F'] = df2['F'] /(df2['M'] df2['F'] )*100
df2['%Rep_M'] = df2['M'] /(df2['M'] df2['F'] )*100
df2.sort_values(['question','response'], ascending=[1,0]).astype({'F':int,'M':int})



           Gender    F  M   %Rep_F  %Rep_M
question  response              
   Q1       Y        2  2    50.0    50.0
            N        0  1     0.0   100.0
   Q2       Y        0  1     0.0   100.0
            N        2  2    50.0    50.0
   Q3       Y        1  3    25.0    75.0
            N        1  0   100.0     0.0


CodePudding user response:

You could use this function:

def aggregate_over_questions( df ):
    data = pd.DataFrame()
    #get the number of questions
    questions = df.loc[:,df.columns.str.contains("^Q")].columns.tolist()

    for q in questions:
        temp = df.pivot_table(index=f"{q}", columns=['Gender'], values['ID'], aggfunc='count', fill_value=0)
        temp = (temp.join(out[['ID']].div(temp['ID'].sum(axis=1).values, axis=0)
                    .mul(100)
                    .rename(columns={'ID':'%Respondents'})))
       data = pd.concat( [data, temp], axis=0 )
    return data

Image

  • Related