How to use str.contains() in a conditional statement to apply a function to some elements of a dataframe

Time:10-21

I have a column in a pandas dataframe that holds various URLs to websites:

df:
    ID   URL
0   1    https://www.Facebook.com/fr
1   2    https://Twitter.com/de
2   3    https://www.Youtube.com
3   4    www.Microsoft.com
4   5    https://www.Stackovervlow.com

I am using urlparse().netloc to clean the URLs so that only the domain names remain (e.g., https://www.Facebook.com/fr becomes www.Facebook.com). Some of the URLs are already in a clean format (www.Microsoft.com above), and applying urlparse().netloc to these clean URLs results in an empty cell. Therefore, I am trying to apply urlparse().netloc only to elements of the URL column that contain the string 'http', and otherwise return the original URL. Here is the code I have been trying to use:

df['URL'] = df['URL'].apply(
    lambda x: urlparse(x).netloc if x.str.contains("http", na=False) else x
)

However, I get this error message: AttributeError: 'str' object has no attribute 'str'. Any help on how I can overcome this to complete the task would be much appreciated!
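Both parts of the problem can be checked in isolation. Inside Series.apply, each x is a plain Python str, which has no .str accessor (that accessor belongs to the Series). And urlparse only recognizes a network location when it follows "//", so a scheme-less string such as www.Microsoft.com is parsed entirely as a path. A minimal check, using two of the sample URLs:

```python
from urllib.parse import urlparse

import pandas as pd

# A scheme-less URL is parsed as a path, so netloc comes back empty
parts = urlparse("www.Microsoft.com")
print(repr(parts.netloc))  # ''
print(repr(parts.path))    # 'www.Microsoft.com'

# Series.apply hands each element to the function as a plain str
s = pd.Series(["https://Twitter.com/de", "www.Microsoft.com"])
print(s.apply(type).tolist())  # [<class 'str'>, <class 'str'>]

# The .str accessor exists on the Series, not on its elements
print(hasattr(s, "str"), hasattr("www.Microsoft.com", "str"))  # True False
```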

Answer:

You are using pandas.Series.apply, so your function (the lambda) receives each element (a str) itself, not a Series. You can therefore simply use the in operator:

from urllib.parse import urlparse

df['URL'] = df['URL'].apply(
    lambda x: urlparse(x).netloc if "http" in x else x
)
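If you want to keep str.contains from the question, it belongs at the column level rather than inside apply: called on the whole Series it returns a boolean mask, which can then select the rows to transform with .loc. A sketch on the sample data (the DataFrame construction below just reproduces the question's table):

```python
from urllib.parse import urlparse

import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "URL": [
        "https://www.Facebook.com/fr",
        "https://Twitter.com/de",
        "https://www.Youtube.com",
        "www.Microsoft.com",
        "https://www.Stackovervlow.com",
    ],
})

# Boolean mask over the whole column; na=False treats missing values as "no match"
mask = df["URL"].str.contains("http", na=False)

# Run urlparse only on the matching rows; all other rows keep their original value
df.loc[mask, "URL"] = df.loc[mask, "URL"].apply(lambda u: urlparse(u).netloc)
print(df)
```

This gives the same URL column as the answers here; the mask is computed once in vectorized form, while urlparse itself still runs row by row.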

Answer:

x is already a string, not a Series, so use the string method x.find:

df['URL'] = df['URL'].apply(
    lambda x: urlparse(x).netloc if x.find("http") != -1 else x
)
print(df)

# Output:
   ID                    URL
0   1       www.Facebook.com
1   2            Twitter.com
2   3        www.Youtube.com
3   4      www.Microsoft.com
4   5  www.Stackovervlow.com
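The in and find checks look for "http" anywhere in the value, so a scheme-less entry such as www.httpbin.org (a hypothetical value, not in the question's data) would still go through urlparse and come back empty, and a scheme-less entry with a path such as www.Microsoft.com/en-us would be returned unchanged. One possible variant, assuming a value has a scheme exactly when it contains "://", prefixes "//" to the other values so that urlparse puts the host into netloc. The helper name domain is hypothetical:

```python
from urllib.parse import urlparse

import pandas as pd

def domain(url: str) -> str:
    """Return the network location of a URL, with or without a scheme (hypothetical helper)."""
    # Assumption: a value has a scheme exactly when it contains "://"
    if "://" not in url:
        # urlparse only fills netloc when it is introduced by "//"
        url = "//" + url
    return urlparse(url).netloc

urls = pd.Series(["https://www.Facebook.com/fr", "www.Microsoft.com/en-us", "www.httpbin.org"])
print(urls.apply(domain).tolist())
# ['www.Facebook.com', 'www.Microsoft.com', 'www.httpbin.org']
```

Note that netloc also keeps any port or user info (e.g. host:8080); the .hostname attribute drops those but returns the host lower-cased.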

Alternatively, you can use str.extract with a regular expression to get the netloc without urlparse:

df['URL'] = df['URL'].str.extract(r'(?:^https?://)?([^/]+)', expand=False)
print(df)

# Output:
   ID                    URL
0   1       www.Facebook.com
1   2            Twitter.com
2   3        www.Youtube.com
3   4      www.Microsoft.com
4   5  www.Stackovervlow.com
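str.extract searches each element for the pattern and returns its first capture group (as a Series, because of expand=False). The pattern itself can be checked outside pandas with the standard re module: the optional non-capturing group strips a leading http:// or https://, and [^/]+ then captures everything up to the first slash.

```python
import re

pattern = re.compile(r"(?:^https?://)?([^/]+)")

for url in ["https://www.Facebook.com/fr", "https://Twitter.com/de", "www.Microsoft.com"]:
    # search() finds the first match, like str.extract does for each element
    match = pattern.search(url)
    print(match.group(1))
# www.Facebook.com
# Twitter.com
# www.Microsoft.com
```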
