Home > Software engineering >  Pandas: replacing values if they are too big or too small given a condition
Pandas: replacing values if they are too big or too small given a condition

Time:10-08

I have these two dataframes relating to the same data. One of them contains the total of the data and looks like this:

Person ID   word    rt  accuracy    emotional_w
0          CHOQUE   1353    C   True
0          SILLA    434     C   False
0          BRAZO    480     C   False
0          LLUVIA   1091    C   False
1          SOLEDAD  637     C   True
1          INFIERNO 437     I   True
1          MOMENTO  754     C   False

The other one contains the mean rt and the standard deviation rt and then 'desvios_mayores' and 'desvios_menores' which are the numbers I'm interested in:

   Person ID   rt_stdev         rt_mean            desvios_mayores      desvios_menores
0   0       311.200383049439    655.975609756098    1278.37637585498    33.5748436572201
1   1       280.592497402182    971.416666666667    1532.60166147103    410.231671862303
2   2       325.848282375085    928.630952380953    1580.32751713112    276.934387630783

I need to check if the person's rt in each of their words is bigger than desvios_mayores or smaller than devios_menores and, if so, replace that number with their rt_mean.

I wrote this so far but it raises error "ValueError: Can only compare identically-labeled Series objects":

if df_outliers_total['Person ID'] == df['Person ID']:
    if df['rt'] > df_outliers_total['desvios_mayores']:
        df_outliers_total['rt_mean']
    elif df['rt'] < df_outliers_total['desvios_menores']:
        df_outliers_total['rt_mean']

What's a better way of achieving this? Thank you.

CodePudding user response:

For compare values is necessary left join DataFrame.merge and then set new values in Series.mask by chained both masks by | for bitwise OR:

df1 = df.merge(df_outliers_total, on='Person ID', how='left')

m = (df1['rt'] > df1['desvios_mayores']) | (df1['rt'] < df1['desvios_menores'])
df1['rt'] = df1['rt'].mask(m, df1['rt_mean'])

#for original columns names
df1 = df1.reindex(df.columns, axis=1)
print (df1)
   Person ID      word          rt accuracy  emotional_w
0          0    CHOQUE   655.97561        C         True
1          0     SILLA   434.00000        C        False
2          0     BRAZO   480.00000        C        False
3          0    LLUVIA  1091.00000        C        False
4          1   SOLEDAD   637.00000        C         True
5          1  INFIERNO   437.00000        I         True
6          1   MOMENTO   754.00000        C        False
  • Related