I have a pandas DataFrame with a column of URLs, like this:
api
https://apis.us/image/
https://apis.emea/video/
https://apis.asia/docs/
https://apis.general/
I want to add a new column, region, that gives the region of each URL. If the URL contains no region, it should be marked as global:
api region
https://apis.us/image/ us
https://apis.emea/video/ emea
https://apis.asia/docs/ asia
https://apis.general/ global
How can I do this efficiently? For every URL I need to search for these three regions: us, emea and asia.
Answer:
If you only need to test the value right after the apis. text, use Series.str.extract with a positive lookbehind followed by the possible values joined with |, then replace the unmatched values (NaN) with Series.fillna:
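import pandas as pd

vals = ['us','emea','asia']
# expand=False returns a Series (one capture group), so it fits a single column
df['region'] = (df['api'].str.extract(rf'(?<=https://apis\.)({"|".join(vals)})', expand=False)
                  .fillna('global'))
print(df)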
api region
0 https://apis.us/image/ us
1 https://apis.emea/video/ emea
2 https://apis.asia/docs/ asia
3 https://apis.general/ global
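A self-contained version of the lookbehind approach, as a sketch: it rebuilds the sample data from the question, builds the pattern with plain string concatenation instead of an f-string, and passes expand=False so str.extract returns a Series. Only the column name api and the three region codes come from the question; the variable names are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'api': ['https://apis.us/image/',
                           'https://apis.emea/video/',
                           'https://apis.asia/docs/',
                           'https://apis.general/']})

vals = ['us', 'emea', 'asia']
# Builds (?<=https://apis\.)(us|emea|asia): the lookbehind only accepts a
# region code that directly follows the "https://apis." prefix.
pattern = r'(?<=https://apis\.)(' + '|'.join(vals) + ')'

# With one capture group and expand=False, str.extract returns a Series;
# rows without a match come back as NaN and are replaced by 'global'.
df['region'] = df['api'].str.extract(pattern, expand=False).fillna('global')
print(df)
```

Because the regex runs through the vectorized .str accessor, there is no explicit Python loop over the rows.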
If you need to match the region anywhere in the URL (as a plain substring):
vals = ['us','emea','asia']
df['region'] = df['api'].str.extract(rf'({"|".join(vals)})', expand=False).fillna('global')
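One caveat with the substring pattern: it also matches a region code that is only part of a longer word, for example us inside business. A minimal sketch, assuming a hypothetical extra URL https://apis.general/business/ that is not in the question's data, compares the plain pattern with one wrapped in \b word boundaries (the column names plain and bounded are illustrative):

```python
import pandas as pd

# The second URL is hypothetical: "business" contains "us" as a plain substring
df = pd.DataFrame({'api': ['https://apis.us/image/',
                           'https://apis.general/business/']})

vals = ['us', 'emea', 'asia']
plain = '(' + '|'.join(vals) + ')'          # matches "us" inside "business"
bounded = r'\b(' + '|'.join(vals) + r')\b'  # requires the code to be a whole word

df['plain'] = df['api'].str.extract(plain, expand=False).fillna('global')
df['bounded'] = df['api'].str.extract(bounded, expand=False).fillna('global')
print(df)
```

Here \b treats . and / as boundaries, so us in https://apis.us/image/ still matches. It also treats - as a boundary, so a code inside something like us-east would still be picked up.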