I have two DataFrames:
a = [{'name': 'AAA'}, {'name': 'BBB'}, {'name': 'BBB'}, {'name': 'CCC'}, {'name': 'DDD'}, {'name': 'EEE'}]
df = pd.DataFrame(a)
b = [{'name': 'AAA='}, {'name': 'BBB='}, {'name': 'DDD='}]
df = pd.DataFrame(a)
df1 = pd.DataFrame(b)
print(df)
print()
print(df1)
I need to compare values in dataframes in columns name and name2. If the values are equal and the difference is only in the = sign, then add it if the value does not contain it.
I need the following result as in the screenshot:
CodePudding user response:
If the =
is always in the end, a simple method could be:
# which values of df['name'] are also in df1['name']?
m = df['name'].isin(df1['name'].str.rstrip('='))
# select those and add a "="
df.loc[m, 'name'] = '='
Another approach that works with any position of the =
:
s = df1['name'].set_axis(df1['name'].str.replace('=', ''))
df['name'] = df['name'].map(lambda x: s.get(s, x))
updated df
:
name
0 AAA=
1 BBB=
2 BBB=
3 CCC
4 DDD=
5 EEE