I've encountered a problem which I didn't succeed to solve, for now. Your assistance on this one would be highly appreciated.
I have 2 dataframes:
first_df:
A B C D
1 1 a q zz
2 2 b w xx
3 3 c e yy
4 4 d r vv
second_df:
C1 C2
1 10 a
2 20 b
3 70 g
4 80 h
What I want to achieve is mark rows from first_df, based on values from second_df. Marking those rows should be based on comparing column values:
(first_df['A'] == second_df['C1'] * 10) & (first_df['B'] == second_df['C2'])
Expected output should be like this:
compared_df:
A B C D Match
1 1 a q zz True
2 2 b w xx True
3 3 c e yy False
4 4 d r vv False
Could you please point me in a right direction? Which pandas methods should I use to achieve my goal? I've tried many things, but I'm pandas beginner, so it's hard to assess if those attempts were correct.
CodePudding user response:
First create Multiindex on both the dataframes then use MultiIndex.isin
to test for the occurrence of the index values of first dataframe in the index of second dataframe in order the create boolean flag:
i1 = first_df.set_index([first_df['A'] * 10, 'B']).index
i2 = second_df.set_index(['C1', 'C2']).index
first_df['Match'] = i1.isin(i2)
Result
print(first_df)
A B C D Match
1 1 a q zz True
2 2 b w xx True
3 3 c e yy False
4 4 d r vv False
CodePudding user response:
The code you wrote is almost completely correct; I think you just multiplied the wrong column by 10
:
first_df['Match'] = (first_df['A'] * 10 == second_df['C1']) & (first_df['B'] == second_df['C2'])
Output:
>>> first_df
A B C D Match
1 1 a q zz True
2 2 b w xx True
3 3 c e yy False
4 4 d r vv False
CodePudding user response:
You can use Series.isin
:
In [2516]: first_df['Match'] = first_df['A'].mul(10).isin(second_df['C1'])
In [2517]: df1
Out[2517]:
A B C D Match
1 1 a q zz True
2 2 b w xx True
3 3 c e yy False
4 4 d r vv False
CodePudding user response:
If you don't mind writing it as sql code and you can import libraries to your project, you might use pandasql library.
import pandasql as psql
compared_df = psql.sqldf("""
select f.A, f.B, f.C, f.D,
CASE WHEN s.C1 is not null then 1 else 0 end as Match
FROM first_df f
LEFT JOIN second_df s ON (f.A *10 = s.C1 AND f.B = s.C2)
""")
CodePudding user response:
I think you could use merge
here where you use the indicator
parameter to see if a match is made or not.
out = df1.assign(A=df1['A']*10).merge(df2, left_on=['A','B'], right_on=['C1','C2'],
how='left', indicator=True)
out['Match'] = out['_merge'].replace({'both':True, 'left_only':False})
out = out.drop(columns=['C1','C2','_merge'])
Output:
A B C D Match
0 10 a q zz True
1 20 b w xx True
2 30 c e yy False
3 40 d r vv False