Home > Enterprise >  Analyse and filtering survey responses in a dataframe
Analyse and filtering survey responses in a dataframe

Time:11-02

I'm trying to analyse survey data, that I get from a csv file. The specificity of this survey is to have "conditional fields", for example when you answered "yes" to the first question, your didn't see the 2nd question on go strait to the 3rd. So my dataframe have blanks, and it's not to fill. I'm trying to filter answers with differents parameters.

Here an example :

df = pd.DataFrame({
    'q1 : do you pratice magic ?' : ['yes', 'no', 'no', 'not sure', 'yes', 'no'],
    'q2 : What is your wizard level ?' : [8,np.nan,np.nan,np.nan,2,np.nan],
    'q3 : Have you special abilities ?' : [np.nan,np.nan,np.nan,'yes',np.nan,np.nan], 
    'q4 : Are you a muggle ?' : [np.nan,'yes','don\'t know',np.nan,np.nan,'yes'], 
    'q5 : How old are you ?' : [20,14,48,27,36,45]
    })

To the q1 you have choice with 3 answer that condition the next question, with this pattern :

if q1 : yes -> q2, q5

if q1 : not sure -> q3,q5

if q1 : no -> q4, q5

In every case, everyone come together to the q5.

How to obtain specifically those answered to q1 "no" and have less than 40 ?

CodePudding user response:

If I understand you correctly you can use boolean indexing:

mask = df["q1 : do you pratice magic ?"].eq("no") & df["q5 : How old are you ?"].lt(40)

print(df[mask])

Prints:

  q1 : do you pratice magic ?  q2 : What is your wizard level ? q3 : Have you special abilities ? q4 : Are you a muggle ?  q5 : How old are you ?
1                          no                               NaN                               NaN                     yes                      14
  • Related