Home > front end >  Comparing 2 pandas dataframe columns and creating new column based on if the values are same or not
Comparing 2 pandas dataframe columns and creating new column based on if the values are same or not

Time:05-25

I have two df columns addr_num1 and addr_num2 like below:

addr_num1  addr_num2
   10          10
   20          20
   33          35
   40          40
   50          53

I want to create a new column such that if the two values are the same, I will use one of them. If not I will combine them like below:

addr_num3
   10
   20
  33-35
   40
  50-53

How can I do this? Please advise.

CodePudding user response:

Simple way using conditional statements:

s1 = df['addr_num1'].astype(str)
s2 = df['addr_num2'].astype(str)

import numpy as np
df['addr_num3'] = np.where(s1==s2, s1, s1 '-' s2)

Alternative using reshaping:

df['addr_num3'] = (df[['addr_num1', 'addr_num2']]
 .astype(str)
 .reset_index()
 .melt(id_vars='index')
 .drop_duplicates(['index', 'value'])
 .groupby('index')['value'].agg('-'.join)
)

output:

   addr_num1  addr_num2 addr_num3
0         10         10        10
1         20         20        20
2         33         35     33-35
3         40         40        40
4         50         53     50-53

CodePudding user response:

You could use two steps

Set all values to your first and second column as str separated by -, this will be for non-"matches"

Use .loc to filter for matches and set the value to column one (as string for consistency)

df['addr_num3'] = df['addr_num1'].apply(str) '-' df['addr_num2'].apply(str)
df.loc[df['addr_num1']==df['addr_num2'],'addr_num3']=df['addr_num1'].apply(str)

loc allows to set column values based on a condition

Pandas docs on loc

Pandas docs on apply

  • Related