Home > OS >  Сompare columns in dataframes and update them
Сompare columns in dataframes and update them

Time:10-24

I have two DataFrames:

a = [{'name': 'AAA'}, {'name': 'BBB'}, {'name': 'BBB'}, {'name': 'CCC'}, {'name': 'DDD'}, {'name': 'EEE'}]
df = pd.DataFrame(a)
b = [{'name': 'AAA='}, {'name': 'BBB='}, {'name': 'DDD='}]
df = pd.DataFrame(a)
df1 = pd.DataFrame(b)
print(df)
print()
print(df1)

enter image description here

I need to compare values in dataframes in columns name and name2. If the values are equal and the difference is only in the = sign, then add it if the value does not contain it.

I need the following result as in the screenshot:

enter image description here

CodePudding user response:

If the = is always in the end, a simple method could be:

# which values of df['name'] are also in df1['name']?
m = df['name'].isin(df1['name'].str.rstrip('='))

# select those and add a "="
df.loc[m, 'name']  = '='

Another approach that works with any position of the =:

s = df1['name'].set_axis(df1['name'].str.replace('=', ''))
df['name'] = df['name'].map(lambda x: s.get(s, x))

updated df:

   name
0  AAA=
1  BBB=
2  BBB=
3   CCC
4  DDD=
5   EEE
  • Related