How to remove the domain from websites in a pandas DataFrame

Time:05-21

Here's the dataset

Id  Websites
1   facebook.com
2   linked.in
3   stackoverflow.com
4   harvard.edu
5   ugm.ac.id

Here's my expected output

Id  Name
1   facebook
2   linked
3   stackoverflow
4   harvard
5   ugm

CodePudding user response:

You can use a regex to capture the part before the first dot, combined with pop to remove the Websites column:

df['Name'] = df.pop('Websites').str.extract('([^.]+)')

Output:

   Id           Name
0   1       facebook
1   2         linked
2   3  stackoverflow
3   4        harvard
4   5            ugm
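
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question, and expand=False is my own addition (not part of the one-liner above) so that str.extract returns a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Websites": ["facebook.com", "linked.in", "stackoverflow.com",
                 "harvard.edu", "ugm.ac.id"],
})

# [^.]+ matches one or more characters that are not a dot, so the capture
# group holds everything before the first ".".
# pop() removes the Websites column and hands it back as a Series.
# expand=False (an assumption of this sketch) returns a Series directly.
df["Name"] = df.pop("Websites").str.extract(r"([^.]+)", expand=False)

print(df)
```

Because pop mutates df in place, the Websites column is gone afterwards; use df['Websites'] instead of df.pop('Websites') if you want to keep it.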

CodePudding user response:

You can split each name on "." and take the part before the first dot:

df['Names'] = df['Websites'].str.split('.').str[0]

Output:

Id  Websites              Names
1   facebook.com          facebook
2   linked.in             linked
3   stackoverflow.com     stackoverflow
4   harvard.edu           harvard
5   ugm.ac.id             ugm

CodePudding user response:

You can use rsplit to split on the last occurrence of "." and keep everything before it. This strips only the final suffix, so a value like <abc.cde.com> returns <abc.cde>.

df['Websites'].str.rsplit('.', n=1).str[0]
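
To make the difference from splitting on the first dot concrete, here is a small sketch comparing the two. The Series name sites and its three values are made up for illustration (two come from the question, the third from the example above). Note that for a multi-part suffix such as ugm.ac.id, rsplit keeps ugm.ac, which is not the ugm shown in the question's expected output.

```python
import pandas as pd

# Hypothetical sample values, chosen to show where the two approaches differ.
sites = pd.Series(["facebook.com", "ugm.ac.id", "abc.cde.com"])

# Keep only the text before the FIRST dot.
first_label = sites.str.split(".").str[0]

# Drop only the text after the LAST dot (n=1 limits rsplit to one split).
without_last = sites.str.rsplit(".", n=1).str[0]

print(first_label.tolist())   # ['facebook', 'ugm', 'abc']
print(without_last.tolist())  # ['facebook', 'ugm.ac', 'abc.cde']
```

Passing n as a keyword is deliberate: recent pandas versions accept only pat positionally for str.split and str.rsplit.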