I have two df columns addr_num1
and addr_num2
like below:
addr_num1 addr_num2
10 10
20 20
33 35
40 40
50 53
I want to create a new column such that if the two values are the same, I will use one of them. If not I will combine them like below:
addr_num3
10
20
33-35
40
50-53
How can I do this? Please advise.
CodePudding user response:
Simple way using conditional statements:
s1 = df['addr_num1'].astype(str)
s2 = df['addr_num2'].astype(str)
import numpy as np
df['addr_num3'] = np.where(s1==s2, s1, s1 '-' s2)
Alternative using reshaping:
df['addr_num3'] = (df[['addr_num1', 'addr_num2']]
.astype(str)
.reset_index()
.melt(id_vars='index')
.drop_duplicates(['index', 'value'])
.groupby('index')['value'].agg('-'.join)
)
output:
addr_num1 addr_num2 addr_num3
0 10 10 10
1 20 20 20
2 33 35 33-35
3 40 40 40
4 50 53 50-53
CodePudding user response:
You could use two steps
Set all values to your first and second column as str
separated by -
, this will be for non-"matches"
Use .loc
to filter for matches and set the value to column one (as string for consistency)
df['addr_num3'] = df['addr_num1'].apply(str) '-' df['addr_num2'].apply(str)
df.loc[df['addr_num1']==df['addr_num2'],'addr_num3']=df['addr_num1'].apply(str)
loc
allows to set column values based on a condition
Pandas docs on loc
Pandas docs on apply