Matching two different columns from two different dataframes

Time:02-15

So I have two different dataframes. Both have a column named "Domains" that lists domains from different sources. The domains are clean and look like this: [image]

CodePudding user response:

As well as merge and isin, you can also use set.intersection:

out = [*set(df1['Domains']) & set(df2['Domains'])]

The merge equivalent, for comparison:

out = pd.merge(df1['Domains'], df2['Domains'])['Domains'].tolist()

set.intersection is probably the fastest way to do your task. Here is the runtime comparison for your data:

merge:

2.85 ms ± 354 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

isin:

347 µs ± 26.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

set.intersection:

16.9 µs ± 1.99 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

set.intersection is ~168x faster than merge and ~20x faster than isin.
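For reference, here is a minimal, self-contained sketch of the three approaches side by side. The contents of df1 and df2 are made-up sample data (the asker's real frames are not shown), and the variable names via_merge, via_isin and via_set are only illustrative.

```python
import pandas as pd

# Hypothetical sample data; the real frames from the question are not available.
df1 = pd.DataFrame({"Domains": ["a.com", "b.org", "c.net", "d.io"]})
df2 = pd.DataFrame({"Domains": ["c.net", "x.com", "a.com"]})

# 1) merge: inner join on the shared "Domains" column
via_merge = pd.merge(df1[["Domains"]], df2[["Domains"]])["Domains"].tolist()

# 2) isin: keep the rows of df1 whose domain also appears in df2
via_isin = df1.loc[df1["Domains"].isin(df2["Domains"]), "Domains"].tolist()

# 3) set intersection: result is unordered and contains no duplicates
via_set = [*set(df1["Domains"]) & set(df2["Domains"])]

# Sort before comparing, since set ordering is not guaranteed
print(sorted(via_merge), sorted(via_isin), sorted(via_set))
```

Note that the three are only interchangeable when each column holds unique values: the set version drops duplicates and ordering, isin keeps df1's duplicates and order, and merge repeats a domain once for every matching pair of rows.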
