How to find the intersection between two columns from two different dataframes

Time:02-15

I'm trying to compare two different columns from two different DataFrames.

test_website1
             Domain
0    www.google.com
1  www.facebook.com
2     www.yahoo.com

test_website2
              Domain
0       www.bing.com
1  www.instagram.com
2     www.google.com

testlist = []

def match_domain(col1, col2):
    for a in col1:
        v1 = a
        for b in col2:
            v2 = b
            if v2 == v1:
                testlist.append(v1)

test_website1 = df_website_test1['Domain']
test_website2 = df_website_test2['Domain']

When I call:

match_domain(test_website1, test_website2)

The output should be just "www.google.com"

but instead this is the output I get.

['www.google.com', 'www.facebook.com', 'www.yahoo.com']

Completely stuck! Thank you in advance for your help!

CodePudding user response:

The standard way to do this in pandas is an inner merge (how='inner' is the default). Here df1 and df2 stand for df_website_test1 and df_website_test2 from the question:

pd.merge(df1['Domain'], df2['Domain'])

#            Domain
# 0  www.google.com

If you really want a list, chain tolist:

pd.merge(df1['Domain'], df2['Domain'])['Domain'].tolist()

# ['www.google.com']
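
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same inner merge. The two frames are rebuilt from the sample data in the question, and `common` and `common_domains` are names made up for illustration. The `drop_duplicates()` call is an extra precaution, since a merge repeats a value if it appears more than once in either column.

```python
import pandas as pd

# Sample frames rebuilt from the question's data.
df_website_test1 = pd.DataFrame(
    {"Domain": ["www.google.com", "www.facebook.com", "www.yahoo.com"]}
)
df_website_test2 = pd.DataFrame(
    {"Domain": ["www.bing.com", "www.instagram.com", "www.google.com"]}
)

# An inner merge on the shared "Domain" column keeps only values present in both frames.
common = pd.merge(
    df_website_test1[["Domain"]],
    df_website_test2[["Domain"]],
    on="Domain",
    how="inner",
)

# Drop repeats (a merge multiplies duplicated keys) and convert to a plain list.
common_domains = common["Domain"].drop_duplicates().tolist()
print(common_domains)  # ['www.google.com']
```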

CodePudding user response:

Instead of the for-loop, you could use a set intersection (the & operator on two sets):

def match_domain(col1, col2):
    return list(set(col1) & set(col2))

Output:

>>> match_domain(df_website_test1['Domain'], df_website_test2['Domain'])
['www.google.com']
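
One caveat with the set approach: sets are unordered, so when there are several matches the order of the returned list is not guaranteed. If the original order matters, a possible alternative (a different technique from the one in this answer) is to filter with `Series.isin`. The sketch below is only an illustration. `match_domain_ordered` is a hypothetical helper name, and the sample data adds one extra row ("www.bing.com") to the question's first column so that there are two matches.

```python
import pandas as pd


def match_domain_ordered(col1, col2):
    """Hypothetical helper: values of col1 that also occur in col2, in col1's order."""
    col1 = pd.Series(col1)
    # isin builds a boolean mask: True where the value is found in col2.
    mask = col1.isin(set(col2))
    # Keep the first occurrence of each match and return a plain list.
    return col1[mask].drop_duplicates().tolist()


# Question's data, with one extra row in site1 (an assumption for illustration).
site1 = pd.Series(
    ["www.google.com", "www.facebook.com", "www.yahoo.com", "www.bing.com"],
    name="Domain",
)
site2 = pd.Series(
    ["www.bing.com", "www.instagram.com", "www.google.com"],
    name="Domain",
)

result = match_domain_ordered(site1, site2)
print(result)  # ['www.google.com', 'www.bing.com']
```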