Python Pandas replace value based on multiple column conditions-CodePudding

I have a dataframe

data_in = {'A':['A1', '', '', 'A4',''],
        'B':['', 'B2', 'B3', '',''],
        'C':['C1','C2','','','C5']}
df_in = pd.DataFrame(data)

print(df_in)

    A   B   C
0  A1      C1
1      B2  C2
2      B3    
3  A4        
4          C5

I'm trying to replace A or B column if C column is not empty and A or B are not empty. After replacing, I need to clear value in C column.

I expect this output

    A   B   C
0   C1      
1       C2  
2       B3  
3   A4      
4           C5

I tried several things, the closest is

df_in['A'] = np.where(
   (df_in['A'] !='') & (df_in['C'] != '') , df_in['A'], df_in['C']
   )

df_in['B'] = np.where(
   (df_in['B'] !='') & (df_in['C'] != '') , df_in['B'], df_in['C']
   )

But this clear also the other value and I l'm loosing A4 and B3 and I don't clear C1 and C2

What I got

    A   B   C
0   C1      C1
1       C2  C2
2           
3           
4           C5

Thank you

CodePudding user response：

I'm not sure if there is an issue setting a columns value that is also in the where condition off hand but you could always create a temp column and rename/drop other outputs based on that.

An alternative is to use the apply function.

def update_data(row):
    a = row['A']
    b = row['B']
    c = row['C']

    if not c.isna():
        if a.isna():
            row['A'] = c

        if b.isna():
            row['B'] = c

    return row

df_new = df.apply(update_data, axis=1)

Apply will definitely get you the correct result, however, I'm not certain as to what your desired outcome is so you may need to adjust the logic. The above will set columns A and/or B = C if A is a na type object ("" is a na type) and C is not a na type object. Otherwise it will not update anything.

I'm not sure what you want by "clear column C". You can just drop the column if that's what you want. If you want to change the value you can do so in the update_data function or do a string replace.

CodePudding user response：

You are very close, but you have the arguments switched in np.where, the syntax is np.where(cond, if_cond_True, if_cond_False). The columns A and B should have the value of column if the condition is satisfied (if_cond_True), otherwise they keep their original values (if_cond_False).

import pandas as pd
import numpy as np 

data_in = {'A':['A1', '', '', 'A4',''],
        'B':['', 'B2', 'B3', '',''],
        'C':['C1','C2','','','C5']}

df_in = pd.DataFrame(data_in)

maskA = df_in['A'] != ''   # A not empty
maskB = df_in['B'] != ''   # B not empty
maskC = df_in['C'] != ''   # C not empty

# If A and C are not empty, A = C, else A keep its value 
df_in['A'] = np.where(maskA & maskC, df_in['C'], df_in['A'])

# If B and C are not empty, B = C, else B keep its value
df_in['B'] = np.where(maskB & maskC, df_in['C'], df_in['B'])

# If (A and C are not empty) or (B and C are not empty),
# C should be empty, else C keep its value
df_in['C'] = np.where((maskA & maskC) | (maskB & maskC), "", df_in['C'])

Output

>>> df_in 

    A   B   C
0  C1        
1      C2    
2      B3    
3  A4        
4          C5