I have a dataframe
data_in = {'A':['A1', '', '', 'A4',''],
'B':['', 'B2', 'B3', '',''],
'C':['C1','C2','','','C5']}
df_in = pd.DataFrame(data)
print(df_in)
A B C
0 A1 C1
1 B2 C2
2 B3
3 A4
4 C5
I'm trying to replace A or B column if C column is not empty and A or B are not empty. After replacing, I need to clear value in C column.
I expect this output
A B C
0 C1
1 C2
2 B3
3 A4
4 C5
I tried several things, the closest is
df_in['A'] = np.where(
(df_in['A'] !='') & (df_in['C'] != '') , df_in['A'], df_in['C']
)
df_in['B'] = np.where(
(df_in['B'] !='') & (df_in['C'] != '') , df_in['B'], df_in['C']
)
But this clear also the other value and I l'm loosing A4 and B3 and I don't clear C1 and C2
What I got
A B C
0 C1 C1
1 C2 C2
2
3
4 C5
Thank you
CodePudding user response:
I'm not sure if there is an issue setting a columns value that is also in the where condition off hand but you could always create a temp column and rename/drop other outputs based on that.
An alternative is to use the apply function.
def update_data(row):
a = row['A']
b = row['B']
c = row['C']
if not c.isna():
if a.isna():
row['A'] = c
if b.isna():
row['B'] = c
return row
df_new = df.apply(update_data, axis=1)
Apply will definitely get you the correct result, however, I'm not certain as to what your desired outcome is so you may need to adjust the logic. The above will set columns A and/or B = C if A is a na type object ("" is a na type) and C is not a na type object. Otherwise it will not update anything.
I'm not sure what you want by "clear column C". You can just drop the column if that's what you want. If you want to change the value you can do so in the update_data function or do a string replace.
CodePudding user response:
You are very close, but you have the arguments switched in np.where
, the syntax is np.where(cond, if_cond_True, if_cond_False)
. The columns A and B should have the value of column if the condition is satisfied (if_cond_True
), otherwise they keep their original values (if_cond_False
).
import pandas as pd
import numpy as np
data_in = {'A':['A1', '', '', 'A4',''],
'B':['', 'B2', 'B3', '',''],
'C':['C1','C2','','','C5']}
df_in = pd.DataFrame(data_in)
maskA = df_in['A'] != '' # A not empty
maskB = df_in['B'] != '' # B not empty
maskC = df_in['C'] != '' # C not empty
# If A and C are not empty, A = C, else A keep its value
df_in['A'] = np.where(maskA & maskC, df_in['C'], df_in['A'])
# If B and C are not empty, B = C, else B keep its value
df_in['B'] = np.where(maskB & maskC, df_in['C'], df_in['B'])
# If (A and C are not empty) or (B and C are not empty),
# C should be empty, else C keep its value
df_in['C'] = np.where((maskA & maskC) | (maskB & maskC), "", df_in['C'])
Output
>>> df_in
A B C
0 C1
1 C2
2 B3
3 A4
4 C5