Python: How to subtract a value from a subselection of cells in a pandas DataFrame in place?

I am trying to subtract a value (50) from a subselection of cells in a pandas DataFrame. I want to subtract the value from 'rt' where subj == 1 and cond == 'std'. I would like to perform this calculation in place so that the other values stay untouched.

Let's say I have the following DataFrame:

import pandas as pd

data = {'subj': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        'rt': [100, 102, 101, 100, 101, 101, 105, 105, 106, 104, 104, 106],
        'cond': ['nov', 'std', 'std', 'emo', 'std', 'emo', 'nov', 'std', 'std',
                 'emo', 'std', 'emo']}
df = pd.DataFrame(data)

    subj   rt cond
0      1  100  nov
1      1  102  std
2      1  101  std
3      1  100  emo
4      1  101  std
5      1  101  emo
6      2  105  nov
7      2  105  std
8      2  106  std
9      2  104  emo
10     2  104  std
11     2  106  emo

Now I want to subtract 50 from the 'rt' values that meet the criteria subj == 1 and cond == 'std'. I am using the following code to subtract the value:

df['rt'] = df[(df['subj'] == 1) & (df['cond'] == 'std')]['rt'].subtract(50)

This is what I expect:

    subj   rt cond
0      1  100  nov
1      1   52  std
2      1   51  std
3      1  100  emo
4      1   51  std
5      1  101  emo
6      2  105  nov
7      2  105  std
8      2  106  std
9      2  104  emo
10     2  104  std
11     2  106  emo

Instead, this is what I get:

    subj    rt cond
0      1   NaN  nov
1      1  52.0  std
2      1  51.0  std
3      1   NaN  emo
4      1  51.0  std
5      1   NaN  emo
6      2   NaN  nov
7      2   NaN  std
8      2   NaN  std
9      2   NaN  emo
10     2   NaN  std
11     2   NaN  emo

How can I keep the remaining values of the rt column instead of getting NaN? I also want to subtract different values from the rt values of other subj/cond combinations in the same manner, without creating a separate DataFrame for each condition.

CodePudding user response：

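This can be accomplished using np.where(), which takes the subtracted value where the condition is True and keeps the original rt everywhere else: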

import pandas as pd
import numpy as np
data = {'subj': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,],
        'rt': [100, 102, 101, 100, 101, 101, 105, 105, 106, 104, 104, 106],
        'cond':['nov', 'std', 'std', 'emo', 'std', 'emo', 'nov', 'std', 'std',
                'emo', 'std', 'emo']} 
df = pd.DataFrame(data)
df['rt'] = np.where((df['subj'] == 1) & (df['cond'] == 'std'), df['rt'].sub(50), df['rt'])
df
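The question also mentions subtracting different values for other subj/cond combinations. One way to extend the np.where() idea is np.select(), which picks an offset per row from a list of conditions. This is only a sketch: the second rule (subj == 2, cond == 'emo', offset 30) is a made-up example and not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'subj': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'rt': [100, 102, 101, 100, 101, 101, 105, 105, 106, 104, 104, 106],
    'cond': ['nov', 'std', 'std', 'emo', 'std', 'emo',
             'nov', 'std', 'std', 'emo', 'std', 'emo'],
})

# One boolean mask per rule; the second rule is hypothetical.
conditions = [
    (df['subj'] == 1) & (df['cond'] == 'std'),
    (df['subj'] == 2) & (df['cond'] == 'emo'),
]
# Offset to subtract for each rule, in the same order as the masks.
offsets = [50, 30]

# Rows that match no rule get an offset of 0, so their rt is unchanged.
offset_per_row = np.select(conditions, offsets, default=0)
df['rt'] = df['rt'] - offset_per_row

print(df)
```

Because every row receives an offset (possibly 0), no NaN is introduced and the column keeps its integer dtype.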

CodePudding user response：

Use boolean indexing with loc to do it in place efficiently (only the 3 matching values are computed):

df.loc[df['subj'].eq(1)&df['cond'].eq('std'), 'rt'] -= 50

Output:

    subj   rt cond
0      1  100  nov
1      1   52  std
2      1   51  std
3      1  100  emo
4      1   51  std
5      1  101  emo
6      2  105  nov
7      2  105  std
8      2  106  std
9      2  104  emo
10     2  104  std
11     2  106  emo
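If several subj/cond combinations need their own offset, as the question hints, the same loc pattern can be repeated in a loop. The sketch below assumes the offsets are kept in a dictionary keyed by (subj, cond); the dictionary name and the second entry are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'subj': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'rt': [100, 102, 101, 100, 101, 101, 105, 105, 106, 104, 104, 106],
    'cond': ['nov', 'std', 'std', 'emo', 'std', 'emo',
             'nov', 'std', 'std', 'emo', 'std', 'emo'],
})

# Hypothetical mapping: (subj, cond) -> value to subtract from rt.
offsets = {
    (1, 'std'): 50,
    (2, 'emo'): 30,  # made-up second rule
}

for (subj, cond), offset in offsets.items():
    mask = df['subj'].eq(subj) & df['cond'].eq(cond)
    # Only the rows selected by the mask are modified.
    df.loc[mask, 'rt'] -= offset

print(df)
```

Each pass touches only the masked rows, so rows that match no rule keep their original rt.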

CodePudding user response：

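I tried using df.loc as below and it worked. The right-hand side is computed for the whole column, but pandas aligns it on the index and only writes the rows selected by the mask: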

df.loc[(df['subj']==1) & (df['cond']=='std'),'rt'] = df['rt'].subtract(50)
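To see why the code in the question produces NaN while the loc assignment does not, it may help to look at index alignment directly. The following is a small sketch on the question's data; the names mask, subset, broken and fixed are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'subj': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'rt': [100, 102, 101, 100, 101, 101, 105, 105, 106, 104, 104, 106],
    'cond': ['nov', 'std', 'std', 'emo', 'std', 'emo',
             'nov', 'std', 'std', 'emo', 'std', 'emo'],
})
mask = (df['subj'] == 1) & (df['cond'] == 'std')

# The filtered Series only carries the index labels of the matching rows.
subset = df[mask]['rt'].subtract(50)
print(subset.index.tolist())

# Assigning it to the whole column aligns on the index:
# every label missing from subset is filled with NaN.
broken = df.copy()
broken['rt'] = subset
print(broken['rt'].isna().sum())

# With loc, the target is restricted to the masked rows,
# so the unmatched rows are never overwritten.
fixed = df.copy()
fixed.loc[mask, 'rt'] = df['rt'].subtract(50)
print(fixed['rt'].tolist())
```

In both cases pandas matches values by index label rather than by position; the difference is only which rows are being written to.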