I have the following test dataframe:
contact | phone1_x | phone2_x | phone1_y | phone2_y | Match1 | Match2 |
---|---|---|---|---|---|---|
1 | 1234 | 12 | 1234 | | True | False |
2 | 12345 | 123 | | 123 | False | True |
33 | 4444 | 22 | 2343321 | 3223455 | False | False |
66 | 55555 | 333 | 55555 | 333 | True | True |
If Match1 is True, I want the phone1_y value to be empty (NaN). If Match2 is True, I want the phone2_y value to be empty.
The output would look like this:
contact | phone1_x | phone2_x | phone1_y | phone2_y | Match1 | Match2 |
---|---|---|---|---|---|---|
1 | 1234 | 12 | | | True | False |
2 | 12345 | 123 | | | False | True |
33 | 4444 | 22 | 2343321 | 3223455 | False | False |
66 | 55555 | 333 | | | True | True |
It would also be nice to get the following output, where an empty phone_y value is filled from the matching phone_x column when there is no match:
contact | phone1_x | phone2_x | phone1_y | phone2_y | Match1 | Match2 |
---|---|---|---|---|---|---|
1 | 1234 | 12 | | 12 | True | False |
2 | 12345 | 123 | 12345 | | False | True |
33 | 4444 | 22 | 2343321 | 3223455 | False | False |
66 | 55555 | 333 | | | True | True |
I hope someone can help. I am working with Python and pandas.
EDIT:
It's still not working for me, I guess because my code is wrong:
import numpy as np
import pandas as pd
df = pd.read_csv('TM.csv', sep=',') #x
df1 = pd.read_csv('TM_verrijk.csv', sep=',') #y
df['phone1'] = df['phone1'].str.replace('-', '')
df['phone1'] = df['phone1'].str.replace(' ', '')
df1['phone1'] = df1['phone1'].str.replace('-', '')
df1['phone1'] = df1['phone1'].str.replace(' ', '')
df['phone2'] = df['phone2'].str.replace('-', '')
df['phone2'] = df['phone2'].str.replace(' ', '')
df1['phone2'] = df1['phone2'].str.replace('-', '')
df1['phone2'] = df1['phone2'].str.replace(' ', '')
dff = df.merge(df1, on='contact')
dff['phone1_x'] = pd.to_numeric(dff['phone1_x'], errors='coerce')
dff['phone1_y'] = pd.to_numeric(dff['phone1_y'], errors='coerce')
dff['phone2_x'] = pd.to_numeric(dff['phone2_x'], errors='coerce')
dff['phone2_y'] = pd.to_numeric(dff['phone2_y'], errors='coerce')
dff['Match?'] = np.where(dff['phone1_x'] == dff['phone1_y'], 'True', 'False')
dff['Match2?'] = np.where(dff['phone2_x'] == dff['phone2_y'], 'True', 'False')
When I run dff now, I get the same dataframe as in my question:
contact | phone1_x | phone2_x | phone1_y | phone2_y | Match1 | Match2 |
---|---|---|---|---|---|---|
1 | 1234 | 12 | 1234 | | True | False |
2 | 12345 | 123 | | 123 | False | True |
33 | 4444 | 22 | 2343321 | 3223455 | False | False |
66 | 55555 | 333 | 55555 | 333 | True | True |
dff[['phone1_y','phone2_y']] = np.where(dff[['Match?','Match2?']],
np.nan,
dff[['phone1_x','phone2_x']])
This code sets every value to NaN. The other answers also give NaN everywhere:
contact | phone1_x | phone2_x | phone1_y | phone2_y | Match1 | Match2 |
---|---|---|---|---|---|---|
1 | 1234 | 12 | | | True | False |
2 | 12345 | 123 | | | False | True |
33 | 4444 | 22 | | | False | False |
66 | 55555 | 333 | | | True | True |
Any idea why it won't work with my code?
EDIT:
The columns Match? and Match2? need to be boolean, not the strings 'True' and 'False':
dff['Match?'] = dff['Match?'].map({'False':False, 'True':True})
dff['Match2?'] = dff['Match2?'].map({'False':False, 'True':True})
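The mapping helps because a non-empty string such as 'False' is still truthy in Python, so a condition built from those strings selects every row. A minimal sketch with made-up values (not the asker's CSV data) that shows this, and that the comparison itself already yields booleans which could be stored directly:

```python
import numpy as np
import pandas as pd

# Hypothetical toy values standing in for the merged dataframe.
dff = pd.DataFrame({
    "phone1_x": [1234.0, 12345.0],
    "phone1_y": [1234.0, np.nan],
})

# String flags, built the same way as in the question.
flags = np.where(dff["phone1_x"] == dff["phone1_y"], "True", "False")

# Every flag is a non-empty string, so each one counts as true.
all_truthy = all(bool(flag) for flag in flags)

# The comparison is already a boolean Series; no strings are needed.
dff["Match?"] = dff["phone1_x"] == dff["phone1_y"]
```

With the comparison stored directly, the later `.map({'False': False, 'True': True})` step should no longer be necessary.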
Thanks everyone for the answers!!
CodePudding user response:
Let me teach you the magic of the apply function in pandas
def get_new_value(first_val, sec_val):
    # Return None (stored as NaN) when the match flag is set, otherwise keep the value
    if sec_val:
        return None
    return first_val

df['phone1_y'] = df.apply(lambda x: get_new_value(x['phone1_y'], x['Match1']), axis=1)
df['phone2_y'] = df.apply(lambda x: get_new_value(x['phone2_y'], x['Match2']), axis=1)
Make sure you pass axis=1 so the function is applied row by row. I am sure you will manage the last part yourself; see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html for more options.
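As a vectorized alternative to the row-wise apply, `Series.mask` replaces values wherever a boolean condition is True. A minimal sketch, assuming Match1 and Match2 hold real booleans (not strings) and using values reconstructed from the question's test dataframe:

```python
import numpy as np
import pandas as pd

# Values reconstructed from the question's test dataframe (an assumption).
df = pd.DataFrame({
    "phone1_y": [1234.0, np.nan, 2343321.0, 55555.0],
    "phone2_y": [np.nan, 123.0, 3223455.0, 333.0],
    "Match1": [True, False, False, True],
    "Match2": [False, True, False, True],
})

# mask() puts NaN wherever the condition is True and keeps the other values.
df["phone1_y"] = df["phone1_y"].mask(df["Match1"])
df["phone2_y"] = df["phone2_y"].mask(df["Match2"])
```

This avoids calling a Python function once per row, which tends to matter on larger frames.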
CodePudding user response:
Use .loc
#Set phone1_y to NaN if Match1 == True
df.loc[df["Match1"] == True, "phone1_y"] = np.nan
#Set phone2_y to NaN if Match2 == True
df.loc[df["Match2"] == True, "phone2_y"] = np.nan
#For the second output: set phone1_y to phone1_x if phone1_y is NaN, phone1_x is not NaN and Match1 is not True
#(without the Match1 check, the rows blanked above would be filled in again)
df.loc[(df["phone1_y"].isna()) & (df["phone1_x"].notna()) & (df["Match1"] != True), "phone1_y"] = df["phone1_x"]
#Set phone2_y to phone2_x if phone2_y is NaN, phone2_x is not NaN and Match2 is not True
df.loc[(df["phone2_y"].isna()) & (df["phone2_x"].notna()) & (df["Match2"] != True), "phone2_y"] = df["phone2_x"]
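To check a .loc approach against the question's second desired table, here is a small reproducible sketch. The dataframe is reconstructed from the question's test data and the match columns are assumed to be boolean; the loop and the helper names `x`, `y`, `match` and `fill` are only for illustration. Filling is restricted to rows that did not match, so blanked values are not filled back in.

```python
import numpy as np
import pandas as pd

# Test data reconstructed from the question (an assumption).
df = pd.DataFrame({
    "contact": [1, 2, 33, 66],
    "phone1_x": [1234, 12345, 4444, 55555],
    "phone2_x": [12, 123, 22, 333],
    "phone1_y": [1234, np.nan, 2343321, 55555],
    "phone2_y": [np.nan, 123, 3223455, 333],
    "Match1": [True, False, False, True],
    "Match2": [False, True, False, True],
})

for n in (1, 2):
    x, y, match = f"phone{n}_x", f"phone{n}_y", f"Match{n}"
    # Fill from the x column only where there is no match and y is missing.
    fill = ~df[match] & df[y].isna()
    df.loc[fill, y] = df.loc[fill, x]
    # Blank the y value on every row that matched.
    df.loc[df[match], y] = np.nan
```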
CodePudding user response:
Use numpy.where
- first, replace the value with np.nan where the match column is True; where it is False, nothing is replaced:
df[['phone1_y','phone2_y']] = np.where(df[['Match1','Match2']],
np.nan,
df[['phone1_y','phone2_y']])
print (df)
contact phone1_x phone2_x phone1_y phone2_y Match1 Match2
0 1 1234 12 NaN NaN True False
1 2 12345 123 NaN NaN False True
2 33 4444 22 2343321.0 3223455.0 False False
3 66 55555 333 NaN NaN True True
For the second output, change the first solution so that where the match column is False, the value is taken from 'phone1_x','phone2_x':
df[['phone1_y','phone2_y']] = np.where(df[['Match1','Match2']],
np.nan,
df[['phone1_x','phone2_x']])
print (df)
contact phone1_x phone2_x phone1_y phone2_y Match1 Match2
0 1 1234 12 NaN 12.0 True False
1 2 12345 123 12345.0 NaN False True
2 33 4444 22 4444.0 22.0 False False
3 66 55555 333 NaN NaN True True
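Note that in this printed result the row for contact 33 takes its values from the _x columns, whereas the second table in the question keeps 2343321 and 3223455 there. If existing _y values should survive when there is no match, one possible variant is to fill only the missing _y entries first and then blank the matched ones. This is a sketch under the assumption that the match columns are boolean; the data is reconstructed from the question, and `x_cols`, `y_cols`, `match_cols` and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

# Test data reconstructed from the question (an assumption).
df = pd.DataFrame({
    "contact": [1, 2, 33, 66],
    "phone1_x": [1234, 12345, 4444, 55555],
    "phone2_x": [12, 123, 22, 333],
    "phone1_y": [1234, np.nan, 2343321, 55555],
    "phone2_y": [np.nan, 123, 3223455, 333],
    "Match1": [True, False, False, True],
    "Match2": [False, True, False, True],
})

x_cols = ["phone1_x", "phone2_x"]
y_cols = ["phone1_y", "phone2_y"]
match_cols = ["Match1", "Match2"]

# Take the x value only where the y value is missing.
filled = np.where(df[y_cols].isna(), df[x_cols], df[y_cols])
# Then blank every position whose match flag is True.
df[y_cols] = np.where(df[match_cols], np.nan, filled)
```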