How to use str.contains() in a conditional statement to apply a function to some elements of a dataframe

I have a column in a pandas dataframe that holds various URLs to websites:

df:
    ID   URL
0   1    https://www.Facebook.com/fr
1   2    https://Twitter.com/de
2   3    https://www.Youtube.com
3   4    www.Microsoft.com
4   5    https://www.Stackovervlow.com

I am using urlparse().netloc to clean the URLs so that only the domain names remain (e.g., from https://www.Facebook.com/fr to www.Facebook.com). Some of the URLs are already in a clean format (www.Microsoft.com above), and applying urlparse().netloc to these clean URLs results in an empty cell. Therefore, I am trying to apply urlparse().netloc only to elements of the URL column that contain the string 'http'; otherwise the original URL should be returned. Here is the code I have been trying to use:

df['URL'] = df['URL'].apply(
    lambda x: urlparse(x).netloc if x.str.contains("http", na=False) else x
)

However, I get this error message: AttributeError: 'str' object has no attribute 'str'. Any help on how I can overcome this to complete the task would be much appreciated!

CodePudding user response：

You are using pandas.Series.apply, so your function (the lambda) receives each element (a str) itself, not a Series. You can simply use the in operator as follows:

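from urllib.parse import urlparse

# apply passes each element (a plain str) to the lambda, so "in" works directly
df['URL'] = df['URL'].apply(
    lambda x: urlparse(x).netloc if "http" in x else x
)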
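
If you would rather keep str.contains, it has to be called on the whole Series rather than on each element inside apply. Here is a minimal sketch of that idea, assuming the sample data from the question; the name mask is only illustrative. It builds a boolean mask first and then runs urlparse only on the matching rows:

```python
import pandas as pd
from urllib.parse import urlparse

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "URL": [
        "https://www.Facebook.com/fr",
        "https://Twitter.com/de",
        "https://www.Youtube.com",
        "www.Microsoft.com",
        "https://www.Stackovervlow.com",
    ],
})

# str.contains is a Series method and returns a boolean Series;
# na=False treats missing values as "no match"
mask = df["URL"].str.contains("http", na=False)

# Only the rows flagged by the mask are passed through urlparse
df.loc[mask, "URL"] = df.loc[mask, "URL"].apply(lambda x: urlparse(x).netloc)
print(df)
```

Rows where the mask is False (such as www.Microsoft.com) are left untouched, which is the behaviour the question asks for.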

CodePudding user response：

Inside apply, x is already a string, not a Series, so use x.find instead:

df['URL'] = df['URL'].apply(
    lambda x: urlparse(x).netloc if x.find("http") != -1 else x
)
print(df)

# Output:
   ID                    URL
0   1       www.Facebook.com
1   2            Twitter.com
2   3        www.Youtube.com
3   4      www.Microsoft.com
4   5  www.Stackovervlow.com

Alternatively, you can skip apply and use str.extract to get the netloc directly:

df['URL'] = df['URL'].str.extract(r'(?:^https?://)?([^/]+)', expand=False)
print(df)

# Output:
   ID                    URL
0   1       www.Facebook.com
1   2            Twitter.com
2   3        www.Youtube.com
3   4      www.Microsoft.com
4   5  www.Stackovervlow.com
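
One thing to watch for with the apply-based versions: if the column contains missing values, "http" in x or x.find(...) will raise an error because the element is not a string. The sketch below is one possible way to guard against that, not the only one. The helper name to_domain and the small example Series are hypothetical and are not part of the original question.

```python
import pandas as pd
from urllib.parse import urlparse


def to_domain(value):
    """Return the netloc for http(s) URLs; leave anything else unchanged."""
    # Non-string values (e.g. None or NaN) skip the string check entirely
    if isinstance(value, str) and value.startswith(("http://", "https://")):
        return urlparse(value).netloc
    return value


# Hypothetical example data including a missing value
urls = pd.Series(["https://www.Facebook.com/fr", "www.Microsoft.com", None])
cleaned = urls.apply(to_domain)
print(cleaned)
```

Checking for the scheme prefix with startswith is slightly stricter than testing for "http" anywhere in the string, so a value such as www.httpexample.com would be left as it is.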