I have these two dataframes relating to the same data. One of them contains the total of the data and looks like this:
Person ID word rt accuracy emotional_w
0 CHOQUE 1353 C True
0 SILLA 434 C False
0 BRAZO 480 C False
0 LLUVIA 1091 C False
1 SOLEDAD 637 C True
1 INFIERNO 437 I True
1 MOMENTO 754 C False
The other one contains the mean rt and the standard deviation rt and then 'desvios_mayores' and 'desvios_menores' which are the numbers I'm interested in:
Person ID rt_stdev rt_mean desvios_mayores desvios_menores
0 0 311.200383049439 655.975609756098 1278.37637585498 33.5748436572201
1 1 280.592497402182 971.416666666667 1532.60166147103 410.231671862303
2 2 325.848282375085 928.630952380953 1580.32751713112 276.934387630783
I need to check if the person's rt in each of their words is bigger than desvios_mayores or smaller than devios_menores and, if so, replace that number with their rt_mean.
I wrote this so far but it raises error "ValueError: Can only compare identically-labeled Series objects":
if df_outliers_total['Person ID'] == df['Person ID']:
if df['rt'] > df_outliers_total['desvios_mayores']:
df_outliers_total['rt_mean']
elif df['rt'] < df_outliers_total['desvios_menores']:
df_outliers_total['rt_mean']
What's a better way of achieving this? Thank you.
CodePudding user response:
For compare values is necessary left join DataFrame.merge
and then set new values in Series.mask
by chained both masks by |
for bitwise OR
:
df1 = df.merge(df_outliers_total, on='Person ID', how='left')
m = (df1['rt'] > df1['desvios_mayores']) | (df1['rt'] < df1['desvios_menores'])
df1['rt'] = df1['rt'].mask(m, df1['rt_mean'])
#for original columns names
df1 = df1.reindex(df.columns, axis=1)
print (df1)
Person ID word rt accuracy emotional_w
0 0 CHOQUE 655.97561 C True
1 0 SILLA 434.00000 C False
2 0 BRAZO 480.00000 C False
3 0 LLUVIA 1091.00000 C False
4 1 SOLEDAD 637.00000 C True
5 1 INFIERNO 437.00000 I True
6 1 MOMENTO 754.00000 C False