Pandas: how to add a column based on two other columns meeting a certain condition [duplicate]

Time:10-08

My data has one column that indicates a color and another that indicates a letter. If the color and the letter 'belong' together, the row is correct and a new column should contain C. Otherwise, it should contain I.

I did it like this, but this only puts all the correct rows at the top and the incorrect ones at the bottom:

#correct
c1 = df['color'].eq('green') & df['value'].eq('V')
c2 = df['color'].eq('blue') & df['value'].eq('A')
c3 = df['color'].eq('red') & df['value'].eq('R')
m = c1 | c2 | c3

correct_df = df.loc[m, ['Person ID','word', 'rt', 'color']]

correct_df['accuracy'] = 'C'

incorrect_df = df.loc[~m, ['word', 'rt', 'color']]
incorrect_df['accuracy'] = 'I'

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames the same way
df_cor_inc = pd.concat([correct_df, incorrect_df])

What I need instead is for the new column to be added alongside the existing ones, saying whether each response was correct, with the rows kept in the order the data is already in.

This is a sample of the data:

Person ID  value  word    color  correct  rt
0           R     FLOWER  red     r       1223
0           B     CAR     blue    b       33    
1           G     KNIFE   blue    b       333
1           R     CAT     red     r       2332  
2           B     CHILD   green   g       232

This is how I want it to look:

Person ID  value  word    color  correct  rt    accuracy
0           R     FLOWER  red     r       1223  C
0           B     CAR     blue    b       33    C
1           G     KNIFE   blue    b       333   I
1           R     CAT     red     r       2332  C
2           B     CHILD   green   g       232   I

CodePudding user response:

Reusing your boolean mask m, we can use np.where() as follows:

import numpy as np
df['accuracy'] = np.where(m, 'C', 'I')

np.where() acts like an element-wise if-then-else statement. Where the condition in the first parameter is True, it takes the value from the second parameter ('C' here); otherwise, it takes the value from the third parameter ('I' here). Because the result is assigned as a new column of df, the rows keep their original order.

Result:

print(df)

   Person ID value    word  color correct    rt accuracy
0          0     R  FLOWER    red       r  1223        C
1          0     B     CAR   blue       b    33        I
2          1     G   KNIFE   blue       b   333        I
3          1     R     CAT    red       r  2332        C
4          2     B   CHILD  green       g   232        I
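
Note that with the mask from the question (blue paired with 'A'), the blue/B row comes out as I, as in the result above, while the expected output in the question marks it C. That suggests the pairing for the sample data is blue/B and green/G. Below is a minimal self-contained sketch under that assumption: the expected_letter dictionary is my own guess read off the sample table, not something stated in the question, and the mask is built with Series.map() instead of the three separate conditions.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Person ID": [0, 0, 1, 1, 2],
    "value": ["R", "B", "G", "R", "B"],
    "word": ["FLOWER", "CAR", "KNIFE", "CAT", "CHILD"],
    "color": ["red", "blue", "blue", "red", "green"],
    "correct": ["r", "b", "b", "r", "g"],
    "rt": [1223, 33, 333, 2332, 232],
})

# ASSUMPTION: color/letter pairing inferred from the sample table
# (the question's own code uses V and A for green and blue).
expected_letter = {"red": "R", "blue": "B", "green": "G"}

# Look up the expected letter for each row and compare it with the actual value.
# Colors missing from the dictionary map to NaN and therefore compare as False.
m = df["color"].map(expected_letter).eq(df["value"])

# Element-wise if/else: 'C' where the mask is True, 'I' elsewhere.
# Assigning to a new column leaves the row order untouched.
df["accuracy"] = np.where(m, "C", "I")

print(df)
```

If your real data uses other letters, only the dictionary needs to change; the rest stays the same.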