I have two DataFrames, and both have a column named "Domains" listing domains from different sources. The domains are clean and look like
As well as merge and isin, you can also use set.intersection:
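out = [*set(df1['Domains']) & set(df2['Domains'])]
For comparison, the merge version is below. It keeps duplicate matches, whereas the set version returns each common domain once, in arbitrary order:
out = pd.merge(df1['Domains'], df2['Domains'])['Domains'].tolist()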
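A minimal, self-contained sketch of all three approaches side by side, including the isin variant that is mentioned but not shown. The sample domains below are hypothetical (the question's actual data is not shown); "example.com" appears twice in df1 to illustrate that merge and isin keep duplicates while the set version does not. Results are sorted before printing because the set version has no defined order.

```python
import pandas as pd

# Hypothetical sample data; the question's real domains are not shown.
df1 = pd.DataFrame({"Domains": ["example.com", "foo.org", "bar.net", "example.com"]})
df2 = pd.DataFrame({"Domains": ["bar.net", "example.com", "baz.io"]})

# merge: inner join of two named Series on their shared "Domains" name
via_merge = pd.merge(df1["Domains"], df2["Domains"])["Domains"].tolist()

# isin: boolean mask over df1, then select the matching values
via_isin = df1.loc[df1["Domains"].isin(df2["Domains"]), "Domains"].tolist()

# set intersection: each common domain once, in arbitrary order
via_set = [*set(df1["Domains"]) & set(df2["Domains"])]

print(sorted(via_merge))  # ['bar.net', 'example.com', 'example.com']
print(sorted(via_isin))   # ['bar.net', 'example.com', 'example.com']
print(sorted(via_set))    # ['bar.net', 'example.com']
```

If you need duplicates or df1's row order, the isin version is the natural choice; if you only need the distinct common domains, the set version is enough.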
set.intersection is probably the fastest way to do this. Here is the runtime comparison on your data:
merge: 2.85 ms ± 354 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
isin: 347 µs ± 26.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
set.intersection: 16.9 µs ± 1.99 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
set.intersection is ~168x faster than merge (2.85 ms / 16.9 µs) and ~20x faster than isin.
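The timings above look like IPython %timeit output. A rough way to rerun a similar comparison in plain Python is the standard timeit module. This is a sketch on hypothetical synthetic data (2000 domains per frame, half of them shared), so absolute numbers and ratios will differ from the ones above and depend on your data size and machine.

```python
import timeit

import pandas as pd

# Hypothetical synthetic data: site1000.com .. site1999.com appear in both frames.
df1 = pd.DataFrame({"Domains": [f"site{i}.com" for i in range(0, 2000)]})
df2 = pd.DataFrame({"Domains": [f"site{i}.com" for i in range(1000, 3000)]})

candidates = {
    "merge": lambda: pd.merge(df1["Domains"], df2["Domains"])["Domains"].tolist(),
    "isin": lambda: df1.loc[df1["Domains"].isin(df2["Domains"]), "Domains"].tolist(),
    "set.intersection": lambda: [*set(df1["Domains"]) & set(df2["Domains"])],
}

results = {}
for name, fn in candidates.items():
    # Take the best of 5 repeats of 50 calls each, then convert to seconds per call.
    per_call = min(timeit.repeat(fn, number=50, repeat=5)) / 50
    results[name] = per_call
    print(f"{name}: {per_call * 1e6:.1f} µs per call")
```

Taking the minimum of the repeats reduces noise from other processes; %timeit instead reports mean ± std. dev., so the two are not directly comparable.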