How to remove the domain from websites in a pandas DataFrame

Here's the dataset

Id  Websites
1   facebook.com
2   linked.in
3   stackoverflow.com
4   harvard.edu
5   ugm.ac.id

Here's my expected output

Id  Name
1   facebook
2   linked
3   stackoverflow
4   harvard
5   ugm

Answer:

You can use a regex to capture the part before the first dot, combined with pop to remove the Websites column:

df['Name'] = df.pop('Websites').str.extract(r'([^.]+)', expand=False)

Output:

   Id           Name
0   1       facebook
1   2         linked
2   3  stackoverflow
3   4        harvard
4   5            ugm
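A self-contained sketch of the regex approach, rebuilding the sample data from the question (the DataFrame construction is an assumption; the question only shows the table). The pattern [^.]+ matches the longest leading run of non-dot characters, and expand=False makes str.extract return a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Websites": ["facebook.com", "linked.in", "stackoverflow.com",
                 "harvard.edu", "ugm.ac.id"],
})

# pop() removes the Websites column and returns it as a Series;
# the capture group grabs everything before the first dot.
df["Name"] = df.pop("Websites").str.extract(r"([^.]+)", expand=False)

print(df)
```

Because pop is used, the resulting frame has only the Id and Name columns.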

Answer:

You can split each website on "." and take the part before the first dot:

df['Names'] = df['Websites'].str.split('.').str[0]

Output:

Id  Websites              Names
1   facebook.com          facebook
2   linked.in             linked 
3   stackoverflow.com     stackoverflow
4   harvard.edu           harvard
5   ugm.ac.id             ugm
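A runnable version of the split approach on the same sample data (rebuilt here as an assumption). Unlike the pop-based answer, this keeps the original Websites column alongside the new one; a single-character pattern like "." is treated as a literal string by str.split, not as a regex wildcard.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Websites": ["facebook.com", "linked.in", "stackoverflow.com",
                 "harvard.edu", "ugm.ac.id"],
})

# str.split('.') yields a list per row; .str[0] picks the first element
df["Names"] = df["Websites"].str.split(".").str[0]

print(df)
```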

Answer:

You can use rsplit to split on the last occurrence of "." instead, so that only the final label (the top-level domain) is removed. For a value like abc.cde.com, this returns abc.cde. Note that it also turns ugm.ac.id into ugm.ac rather than the expected ugm.

df['Name'] = df['Websites'].str.rsplit('.', n=1).str[0]
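A small sketch contrasting rsplit with a left split, using two rows from the question plus the abc.cde.com example above (the three-row DataFrame is an illustrative assumption). rsplit with n=1 strips only what follows the last dot, so multi-part suffixes such as .ac.id are only partly removed, whereas split keeps just the first label.

```python
import pandas as pd

# Hypothetical mix of the question's data and the answer's example
df = pd.DataFrame({
    "Websites": ["facebook.com", "ugm.ac.id", "abc.cde.com"],
})

# Split once from the right: drops only the last dot-separated label
df["LastSplit"] = df["Websites"].str.rsplit(".", n=1).str[0]

# Split from the left: keeps only the first dot-separated label
df["FirstSplit"] = df["Websites"].str.split(".").str[0]

print(df)
```

Which one to use depends on whether subdomains like abc in abc.cde.com should be kept.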