Extract substring from string and apply to entire dataframe column

Time:09-18

I have a pandas DataFrame with a bunch of URLs in a column, e.g.:

URL
www.myurl.com/python/us/learnpython
www.myurl.com/python/en/learnpython
www.myurl.com/python/fr/learnpython
.........

I want to extract the country code and add it to a new column called Country containing us, en, fr and so on. I'm able to do this on a single string, e.g.:

url = 'www.myurl.com/python/us/learnpython'
country = url.split("python/")  # ['www.myurl.com/', 'us/learnpython']
country = country[1]            # 'us/learnpython'
country = country.split("/")    # ['us', 'learnpython']
country = country[0]            # 'us'

How do I go about applying this to the entire column, creating a new column with the required data in the process? I've tried variations of this with a for loop without success.
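A minimal sketch of applying the single-string logic above to the whole column, assuming the column is named URL as shown: wrap the logic in a function (get_country is a hypothetical name) and pass it to Series.apply, which calls it once per row and returns a new Series that can be assigned as a column.

```python
import pandas as pd

# Sample data mirroring the question; the column name "URL" comes from the post.
df = pd.DataFrame({
    "URL": [
        "www.myurl.com/python/us/learnpython",
        "www.myurl.com/python/en/learnpython",
        "www.myurl.com/python/fr/learnpython",
    ]
})


def get_country(url):
    """Hypothetical helper: the question's single-string logic in one function."""
    # Take the part after "python/", then keep the segment before the next "/".
    return url.split("python/")[1].split("/")[0]


# Series.apply runs get_country on every row; the result becomes the new column.
df["Country"] = df["URL"].apply(get_country)
print(df)
```

The for-loop style the question mentions would be a list comprehension, `df["Country"] = [get_country(u) for u in df["URL"]]`, which gives the same column. Note that get_country raises an IndexError for a URL without "python/" in it.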

Answer:

Assuming the URLs always have this format (a two-letter lowercase code between two slashes), we can just use str.extract with a regular expression here:

df["cc_code"] = df["URL"].str.extract(r'/([a-z]{2})/')
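A self-contained sketch of the str.extract approach, assuming the sample URLs from the question plus one hypothetical row with no country code. Passing expand=False makes str.extract return a Series (because the pattern has a single capture group) rather than a one-column DataFrame, and rows where the pattern does not match get NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "URL": [
        "www.myurl.com/python/us/learnpython",
        "www.myurl.com/python/en/learnpython",
        "www.myurl.com/python/fr/learnpython",
        "www.myurl.com/python/learnpython",  # hypothetical row without a country code
    ]
})

# Capture the first run of exactly two lowercase letters between two slashes.
# expand=False returns a Series, which assigns cleanly to a single column.
df["Country"] = df["URL"].str.extract(r"/([a-z]{2})/", expand=False)
print(df)
```

The last row comes out as NaN, which makes unexpected URL formats easy to spot with `df["Country"].isna()`.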

Answer:

If the country code always appears after the second slash (/), it's better to just split the string, passing a value for n (the maximum number of splits) and taking only the part you are interested in. You can then assign the result to a new column:

>>> df['URL'].str.split('/', n=2).str[-1].str.split('/', n=1).str[0]

0    us
1    en
2    fr
Name: URL, dtype: object
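A sketch of the same split chain assigned to a new column, assuming the sample data from the question and the column name Country it asks for. With n=2, element -1 of each list is everything after the second slash (e.g. "us/learnpython"); the second split with n=1 then keeps only the segment before the next slash.

```python
import pandas as pd

df = pd.DataFrame({
    "URL": [
        "www.myurl.com/python/us/learnpython",
        "www.myurl.com/python/en/learnpython",
        "www.myurl.com/python/fr/learnpython",
    ]
})

df["Country"] = (
    df["URL"]
    .str.split("/", n=2)   # ['www.myurl.com', 'python', 'us/learnpython']
    .str[-1]               # 'us/learnpython'
    .str.split("/", n=1)   # ['us', 'learnpython']
    .str[0]                # 'us'
)
print(df)
```

Limiting the splits with n keeps the rest of the path intact, so this still works if the tail of the URL contains extra slashes.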