I have a pandas DataFrame with a column of URLs, e.g.
URL
www.myurl.com/python/us/learnpython
www.myurl.com/python/en/learnpython
www.myurl.com/python/fr/learnpython
.........
I want to extract the country code from each URL and put it into a new column called Country, containing us, en, fr and so on. I'm able to do this on a single string, e.g.
url = 'www.myurl.com/python/us/learnpython'
country = url.split("python/")
country = country[1]
country = country.split("/")
country = country[0]
How do I go about applying this to the entire column, creating a new column with the required data in the process? I've tried variations of this with a for loop without success.
CodePudding user response:
Assuming the URLs always have this format, we can just use str.extract here:
df["cc_code"] = df["URL"].str.extract(r'/([a-z]{2})/')
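For anyone who wants to try this end to end, here is a minimal sketch built on the three sample URLs from the question. The column name Country is taken from the question, and passing expand=False is my own addition so that str.extract returns a Series rather than a one-column DataFrame. The pattern captures the first two-letter lowercase path segment it finds, so it relies on the URLs really having this shape.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"URL": [
    "www.myurl.com/python/us/learnpython",
    "www.myurl.com/python/en/learnpython",
    "www.myurl.com/python/fr/learnpython",
]})

# Capture the first two-letter lowercase segment enclosed by slashes.
# expand=False (an addition here) yields a Series for a single capture group.
df["Country"] = df["URL"].str.extract(r"/([a-z]{2})/", expand=False)

print(df)
```

Rows whose URL has no such segment would get NaN in the new column.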
CodePudding user response:
If the country code always appears after the second slash /, it's better to just split the string, passing a value for n (i.e. the maxsplit parameter), and take only the part you are interested in. Of course, you can assign the result to a new column:
>>> df['URL'].str.split('/',n=2).str[-1].str.split('/', n=1).str[0]
0 us
1 en
2 fr
Name: URL, dtype: object
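As a rough sketch of the assignment step, using the sample URLs from the question: the first statement is the chain shown above stored in a column named Country (the name from the question), and the second is a shorter variant of my own that splits at most three times and picks the third piece directly. The column name Country_alt is purely illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"URL": [
    "www.myurl.com/python/us/learnpython",
    "www.myurl.com/python/en/learnpython",
    "www.myurl.com/python/fr/learnpython",
]})

# The chain from the answer: keep everything after the second slash,
# then take the part before the next slash.
df["Country"] = (
    df["URL"].str.split("/", n=2).str[-1].str.split("/", n=1).str[0]
)

# Shorter variant (an assumption, not from the answer): split at most
# three times and take the element at position 2.
df["Country_alt"] = df["URL"].str.split("/", n=3).str[2]

print(df)
```

Both versions assume the country code is always the third path segment; a URL with a scheme such as https:// in front would shift the positions.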