Home > Back-end >  Replacing the value in a column based on a single vectorized condition
Replacing the value in a column based on a single vectorized condition

Time:02-14

I tried the following code with multiple conditions and worked perfectly, somehow whatever I am trying with a single condition it won't work. It will simply return the values "1" & "SD" for every row no matter if the value of Col_A is True. Please let me know any suggestions, the weird part is that I am not getting any error whatsoever.

condition = [
    df1['Col_A'] == 'TRUE'
]

result1 = ['1']
result2 = ['SD']

df1['Col_B'] = np.select(condition, result1, 'NA')
df1['Col_C'] = np.select(condition, result2, 'NA')
df1```

Input

        Col_A    Col_B     Col_C
    0   TRUE       NA         NA     
    1   FALSE      1001       Valid  
    2   TRUE       NA         NA    
    3   TRUE       NA         NA

Output

        Col_A    Col_B     Col_C
    0   TRUE       NA         NA     
    1   FALSE      NA         NA  
    2   TRUE       NA         NA    
    3   TRUE       NA         NA
  
Desired Output

        Col_A    Col_B     Col_C
    0   TRUE       1         SD     
    1   FALSE      1001      Valid  
    2   TRUE       1         SD    
    3   TRUE       1         SD

CodePudding user response:

You could use a simple loc assignment with a condition:

string "TRUE":

df.loc[df['Col_A'].eq('TRUE'), ['Col_B', 'Col_C']] = [1, 'SD']

boolean True:

df.loc[df['Col_A'], ['Col_B', 'Col_C']] = [1, 'SD']

output:

   Col_A   Col_B  Col_C
0   TRUE     1.0     SD
1  FALSE  1001.0  Valid
2   TRUE     1.0     SD
3   TRUE     1.0     SD

CodePudding user response:

Try np.where():

This will replace the value where the condition is True with the x (second parameter) value and uses y (third parameter) elsewise.

Example:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'Col_A':['TRUE', 'FALSE', 'TRUE', 'TRUE'],
    'Col_B':['NA', '1001', 'NA', 'NA'],
    'Col_C':['NA', 'Valid', 'NA', 'NA'],
    
})
>>> df1
   Col_A Col_B  Col_C
0   TRUE    NA     NA
1  FALSE  1001  Valid
2   TRUE    NA     NA
3   TRUE    NA     NA

condition = df1['Col_A'] == 'TRUE'

df1['Col_B'] = np.where(condition, '1', df1['Col_B'])
df1['Col_C'] = np.where(condition, 'SD', df1['Col_C'])

>>> df1
   Col_A Col_B  Col_C
0   TRUE     1     SD
1  FALSE  1001  Valid
2   TRUE     1     SD
3   TRUE     1     SD
  • Related