Check if a column contains data from another column in python pandas

Time:04-21

I have a dataframe in pandas like this

name     url
pau lola www.paulola.com
pou gine www.cheeseham.com
pete raj www.pataraj.com

I want to check whether any of the space-separated strings in the column name appears in the column url. So something like this:

name     url                result
pau lola www.paulola.com    True
pou gine www.cheeseham.com  False
pete raj www.pataraj.com    True

Is there any way to do it? I tried the lambda function below, but it only returns True when the url contains both parts of the name:

name     url               namewospaces
pau lola www.paulola.com   paulola
pou gine www.cheeseham.com pougine
pete raj www.pataraj.com   peteraj

df['result'] = df.apply(lambda x: str(x.namewospaces) in str(x.url), axis=1)

name     url               namewospaces  result
pau lola www.paulola.com   paulola       True
pou gine www.cheeseham.com pougine       False
pete raj www.pataraj.com   peteraj       False
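For reference, the attempt above can be reproduced with a short sketch. The question does not show how namewospaces was built, so removing the spaces with str.replace is an assumption here.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["pau lola", "pou gine", "pete raj"],
    "url": ["www.paulola.com", "www.cheeseham.com", "www.pataraj.com"],
})

# Assumption: namewospaces is simply the name with its spaces removed.
df["namewospaces"] = df["name"].str.replace(" ", "", regex=False)

# Whole-name check: True only when the entire joined name occurs in the url,
# which is why "pete raj" fails even though "raj" is part of the url.
df["result"] = df.apply(lambda x: str(x.namewospaces) in str(x.url), axis=1)

print(df)
```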

Thank you all :)

CodePudding user response:

Split the name into substrings and use a list comprehension with any to get True if any substring matches:

df['result'] = [any(s in url for s in lst)
                for lst, url in zip(df['name'].str.split(), df['url'])]

The (slower) equivalent with apply would be:

df['result'] = df.apply(lambda x: any(s in x['url']
                        for s in x['name'].split()), axis=1)

output:

       name                url  result
0  pau lola    www.paulola.com    True
1  pou gine  www.cheeseham.com   False
2  pete raj    www.pataraj.com    True
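
A self-contained sketch of both variants, using the sample data from the question, might look like the following. The column name result_apply is only illustrative and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["pau lola", "pou gine", "pete raj"],
    "url": ["www.paulola.com", "www.cheeseham.com", "www.pataraj.com"],
})

# List comprehension: split each name on whitespace and test every part
# against the url from the same row.
df["result"] = [
    any(part in url for part in parts)
    for parts, url in zip(df["name"].str.split(), df["url"])
]

# Row-wise apply: same logic, usually slower on large frames.
# "result_apply" is a hypothetical column name used for comparison only.
df["result_apply"] = df.apply(
    lambda row: any(part in row["url"] for part in row["name"].split()),
    axis=1,
)

print(df)
```

Both checks are case-sensitive substring matches. If the names or urls may differ in case, lowercasing both columns first (for example with .str.lower()) is one way to handle that.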