Replace a value with NaN based on another column's value in Python pandas

Time:09-24

I have the following test dataframe.

contact  phone1_x  phone2_x  phone1_y  phone2_y  Match1  Match2
      1      1234        12      1234       NaN    True   False
      2     12345       123       NaN       123   False    True
     33      4444        22   2343321   3223455   False   False
     66     55555       333     55555       333    True    True

If Match1 is True, I want the phone1_y value to be empty (NaN). If Match2 is True, I want the phone2_y value to be empty.

The output would look like this:

contact  phone1_x  phone2_x  phone1_y  phone2_y  Match1  Match2
      1      1234        12       NaN       NaN    True   False
      2     12345       123       NaN       NaN   False    True
     33      4444        22   2343321   3223455   False   False
     66     55555       333       NaN       NaN    True    True

It would also be nice to get the following output, where a missing phone1_y or phone2_y value is filled in from the corresponding _x column:

contact  phone1_x  phone2_x  phone1_y  phone2_y  Match1  Match2
      1      1234        12       NaN        12    True   False
      2     12345       123     12345       NaN   False    True
     33      4444        22   2343321   3223455   False   False
     66     55555       333       NaN       NaN    True    True

I hope someone can help. I am working with Python pandas.

EDIT:

It's still not working for me, I guess because my code is wrong:

import numpy as np
import pandas as pd

df = pd.read_csv('TM.csv', sep=',') #x
df1 = pd.read_csv('TM_verrijk.csv', sep=',') #y

df['phone1'] = df['phone1'].str.replace('-', '')
df['phone1'] = df['phone1'].str.replace(' ', '')
df1['phone1'] = df1['phone1'].str.replace('-', '')
df1['phone1'] = df1['phone1'].str.replace(' ', '')

df['phone2'] = df['phone2'].str.replace('-', '')
df['phone2'] = df['phone2'].str.replace(' ', '')
df1['phone2'] = df1['phone2'].str.replace('-', '')
df1['phone2'] = df1['phone2'].str.replace(' ', '')

dff = df.merge(df1, on='contact')

dff['phone1_x'] = pd.to_numeric(dff['phone1_x'], errors='coerce')
dff['phone1_y'] = pd.to_numeric(dff['phone1_y'], errors='coerce')

dff['phone2_x'] = pd.to_numeric(dff['phone2_x'], errors='coerce')
dff['phone2_y'] = pd.to_numeric(dff['phone2_y'], errors='coerce')

dff['Match?'] = np.where(dff['phone1_x'] == dff['phone1_y'], 'True', 'False')
dff['Match2?'] = np.where(dff['phone2_x'] == dff['phone2_y'], 'True', 'False')

When I run dff now, I get the same dataframe as in my question:

contact  phone1_x  phone2_x  phone1_y  phone2_y  Match1  Match2
      1      1234        12      1234       NaN    True   False
      2     12345       123       NaN       123   False    True
     33      4444        22   2343321   3223455   False   False
     66     55555       333     55555       333    True    True

dff[['phone1_y','phone2_y']] = np.where(dff[['Match?','Match2?']], 
                                       np.nan, 
                                       dff[['phone1_x','phone2_x']])

This code sets every value to NaN. The other answers also give NaN everywhere:

contact  phone1_x  phone2_x  phone1_y  phone2_y  Match1  Match2
      1      1234        12       NaN       NaN    True   False
      2     12345       123       NaN       NaN   False    True
     33      4444        22       NaN       NaN   False   False
     66     55555       333       NaN       NaN    True    True

Any idea why it won't work with my code?

EDIT:

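The Match? and Match2? columns need to be boolean. np.where had filled them with the strings 'True' and 'False', and any non-empty string counts as true in a condition, so every row was replaced: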

dff['Match?'] = dff['Match?'].map({'False':False, 'True':True})
dff['Match2?'] = dff['Match2?'].map({'False':False, 'True':True})
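
A minimal sketch of why the string labels caused trouble, and of one way to avoid the mapping step altogether. The frame `demo` and its values are made up for illustration and are not the asker's files. Comparing the two columns directly already yields a boolean column, so np.where is not needed to build the flag.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the merged frame `dff`
demo = pd.DataFrame({
    "phone1_x": [1234.0, 12345.0, 4444.0],
    "phone1_y": [1234.0, np.nan, 2343321.0],
})

# String labels: 'True' and 'False' are both non-empty strings, hence truthy
as_text = np.where(demo["phone1_x"] == demo["phone1_y"], "True", "False")
truthy = [bool(label) for label in as_text]  # every entry is True

# Direct comparison gives a boolean Series; NaN never compares equal
demo["Match1"] = demo["phone1_x"] == demo["phone1_y"]

# With real booleans, only the matching row is blanked
demo["phone1_y"] = np.where(demo["Match1"], np.nan, demo["phone1_y"])
```

With this approach the flag column can be used as a condition right away, which is the situation the answers below assume.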

Thanks everyone for the answers!!

CodePudding user response:

Let me show you the apply function in pandas:

def get_new_value(first_val,sec_val):
    if sec_val:
        return None
    return first_val

df['phone1_y'] = df.apply(lambda x: get_new_value(x['phone1_y'],x['Match1']),axis=1)
df['phone2_y'] = df.apply(lambda x: get_new_value(x['phone2_y'],x['Match2']),axis=1)

Make sure you pass axis=1 so that the function is applied row by row. I am sure you will manage the last part yourself; see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html for more options.
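
The same row-wise rule can also be written without apply. Below is a minimal sketch using Series.mask, which replaces a value with NaN wherever the condition is True. The frame is re-typed from the question's sample rows, and the name `sample` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; blanks are entered as NaN
sample = pd.DataFrame({
    "contact":  [1, 2, 33, 66],
    "phone1_x": [1234, 12345, 4444, 55555],
    "phone2_x": [12, 123, 22, 333],
    "phone1_y": [1234, np.nan, 2343321, 55555],
    "phone2_y": [np.nan, 123, 3223455, 333],
})
sample["Match1"] = sample["phone1_x"] == sample["phone1_y"]
sample["Match2"] = sample["phone2_x"] == sample["phone2_y"]

# mask() sets a value to NaN wherever the boolean condition is True
sample["phone1_y"] = sample["phone1_y"].mask(sample["Match1"])
sample["phone2_y"] = sample["phone2_y"].mask(sample["Match2"])
```

This works on whole columns at once, which is usually faster than calling a Python function for every row.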

CodePudding user response:

Use .loc

#Set phone1_y as NaN if Match1 == True
df.loc[df["Match1"] == True, "phone1_y"] = np.nan

#Set phone2_y as NaN if Match2 == True
df.loc[df["Match2"] == True, "phone2_y"] = np.nan

#Set phone1_y as phone1_x if phone1_y is NaN, phone1_x is not NaN and Match1 is False
#(without the Match1 check, the rows blanked above would be filled in again)
df.loc[df["phone1_y"].isna() & df["phone1_x"].notna() & ~df["Match1"], "phone1_y"] = df["phone1_x"]

#Set phone2_y as phone2_x if phone2_y is NaN, phone2_x is not NaN and Match2 is False
df.loc[df["phone2_y"].isna() & df["phone2_x"].notna() & ~df["Match2"], "phone2_y"] = df["phone2_x"]

CodePudding user response:

Use numpy.where: where the match column is True, replace the value with np.nan; where it is False, keep the original value:

df[['phone1_y','phone2_y']] = np.where(df[['Match1','Match2']], 
                                       np.nan, 
                                       df[['phone1_y','phone2_y']])
print (df)
   contact  phone1_x  phone2_x   phone1_y   phone2_y  Match1  Match2
0        1      1234        12        NaN        NaN    True   False
1        2     12345       123        NaN        NaN   False    True
2       33      4444        22  2343321.0  3223455.0   False   False
3       66     55555       333        NaN        NaN    True    True

For the second output, change the first solution so that rows where the match is False take their values from 'phone1_x' and 'phone2_x':

df[['phone1_y','phone2_y']] = np.where(df[['Match1','Match2']], 
                                       np.nan, 
                                       df[['phone1_x','phone2_x']])
print (df)
   contact  phone1_x  phone2_x  phone1_y  phone2_y  Match1  Match2
0        1      1234        12       NaN      12.0    True   False
1        2     12345       123   12345.0       NaN   False    True
2       33      4444        22    4444.0      22.0   False   False
3       66     55555       333       NaN       NaN    True    True
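
Note that this second output replaces the numbers contact 33 already had (2343321 and 3223455) with the _x values, whereas the second table in the question keeps them. Below is a hedged sketch of one way to reproduce that table, assuming only missing _y values should be filled from _x: fill first with fillna, then blank the matches with mask. The frame is re-typed from the question, and the name `sample` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; blanks are entered as NaN
sample = pd.DataFrame({
    "contact":  [1, 2, 33, 66],
    "phone1_x": [1234, 12345, 4444, 55555],
    "phone2_x": [12, 123, 22, 333],
    "phone1_y": [1234, np.nan, 2343321, 55555],
    "phone2_y": [np.nan, 123, 3223455, 333],
})
sample["Match1"] = sample["phone1_x"] == sample["phone1_y"]
sample["Match2"] = sample["phone2_x"] == sample["phone2_y"]

for n in ("1", "2"):
    # Step 1: fill only the missing _y values from the matching _x column
    filled = sample[f"phone{n}_y"].fillna(sample[f"phone{n}_x"])
    # Step 2: blank the rows that were flagged as a match
    sample[f"phone{n}_y"] = filled.mask(sample[f"Match{n}"])
```

The order matters here: filling after blanking would put the _x values straight back into the matched rows.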