I'm trying to compare two different columns from two different DataFrames.
test_website1
             Domain
0    www.google.com
1  www.facebook.com
2     www.yahoo.com

test_website2
              Domain
0       www.bing.com
1  www.instagram.com
2     www.google.com
testlist = []

def match_domain(col1, col2):
    for a in col1:
        v1 = a
        for b in col2:
            v2 = b
            if v2 == v1:
                testlist.append(v1)

test_website1 = df_website_test1['Domain']
test_website2 = df_website_test2['Domain']
When I call:
match_domain(test_website1, test_website2)
The output should be just "www.google.com", but instead I get:
['www.google.com', 'www.facebook.com', 'www.yahoo.com']
Completely stuck! Thank you in advance for your help!
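For reference, a minimal self-contained sketch of the nested-loop comparison, using the sample data from the two tables above. It differs from the code in the question in two assumed details: the matches go into a local list that is returned (instead of the global testlist), and the inner loop stops at the first match so a value is not added twice.

```python
import pandas as pd

# Sample data reproduced from the two tables in the question.
df_website_test1 = pd.DataFrame({"Domain": ["www.google.com", "www.facebook.com", "www.yahoo.com"]})
df_website_test2 = pd.DataFrame({"Domain": ["www.bing.com", "www.instagram.com", "www.google.com"]})


def match_domain(col1, col2):
    """Return every value of col1 that also appears in col2 (nested-loop version)."""
    matches = []  # local list, so repeated calls don't accumulate old results
    for v1 in col1:  # iterating a Series yields its values
        for v2 in col2:
            if v1 == v2:
                matches.append(v1)
                break  # stop scanning col2 once v1 has matched
    return matches


print(match_domain(df_website_test1["Domain"], df_website_test2["Domain"]))
# ['www.google.com']
```

This is quadratic in the column lengths, which is why the answers below use a merge or a set instead.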
Answer:
The standard way to do this in pandas is an inner merge (the default is how='inner'). With df1 and df2 as your two DataFrames:
pd.merge(df1['Domain'], df2['Domain'])
#            Domain
# 0  www.google.com
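A self-contained version of the merge above, assuming df1 and df2 hold the sample data from the question. pd.merge accepts named Series; because both are named "Domain", the join key defaults to that shared name.

```python
import pandas as pd

# Sample data from the question (df1/df2 stand in for the two DataFrames).
df1 = pd.DataFrame({"Domain": ["www.google.com", "www.facebook.com", "www.yahoo.com"]})
df2 = pd.DataFrame({"Domain": ["www.bing.com", "www.instagram.com", "www.google.com"]})

# Both Series are named "Domain", so merge joins on that common name;
# how="inner" (the default, written out here) keeps only values present in both.
common = pd.merge(df1["Domain"], df2["Domain"], how="inner")
print(common)
```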
If you really want a list, chain .tolist():
pd.merge(df1['Domain'], df2['Domain'])['Domain'].tolist()
# ['www.google.com']
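A related alternative, not part of the answer above: Series.isin builds a boolean mask over df1, so the result keeps df1's row order and is never multiplied by duplicates in df2 (an inner merge returns one row per matching pair). The sample data is assumed from the question.

```python
import pandas as pd

df1 = pd.DataFrame({"Domain": ["www.google.com", "www.facebook.com", "www.yahoo.com"]})
df2 = pd.DataFrame({"Domain": ["www.bing.com", "www.instagram.com", "www.google.com"]})

# True where df1's domain also occurs anywhere in df2["Domain"].
mask = df1["Domain"].isin(df2["Domain"])
matches = df1.loc[mask, "Domain"].tolist()
print(matches)  # ['www.google.com']
```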
Answer:
Instead of the for-loop, you could use a set intersection (the & operator, equivalent to set.intersection):
def match_domain(col1, col2):
    return list(set(col1) & set(col2))
Output:
>>> match_domain(df_website_test1['Domain'], df_website_test2['Domain'])
['www.google.com']
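A runnable sketch of the set-based answer with the question's sample data. Sets are unordered, so with several matches the returned list comes back in arbitrary order; the second function (a hypothetical helper, match_domain_ordered) keeps col1's order while still using a set for fast membership tests.

```python
import pandas as pd

df_website_test1 = pd.DataFrame({"Domain": ["www.google.com", "www.facebook.com", "www.yahoo.com"]})
df_website_test2 = pd.DataFrame({"Domain": ["www.bing.com", "www.instagram.com", "www.google.com"]})


def match_domain(col1, col2):
    # set & set keeps only elements present in both; result order is arbitrary.
    return list(set(col1) & set(col2))


def match_domain_ordered(col1, col2):
    # Hypothetical helper: same matches, but in col1's order (duplicates in col1 kept).
    lookup = set(col2)  # average O(1) membership tests
    return [value for value in col1 if value in lookup]


print(match_domain(df_website_test1["Domain"], df_website_test2["Domain"]))
print(match_domain_ordered(df_website_test1["Domain"], df_website_test2["Domain"]))
```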