How to extract information from a dataframe column and create a new column based on it

Time:11-30

I have a pandas DataFrame with a column of URLs like this:

api
https://apis.us/image/
https://apis.emea/video/
https://apis.asia/docs/
https://apis.general/

I want to add a new column, region, that gives the region for each URL. If a URL contains no region, it should be marked as global.

api                         region
https://apis.us/image/      us
https://apis.emea/video/    emea
https://apis.asia/docs/     asia
https://apis.general/       global

How can I do this efficiently? For every URL I need to check for these three regions: us, emea and asia.

CodePudding user response:

If you need to test the value that comes right after the apis. text, use Series.str.extract with a positive lookbehind followed by a capture group built by joining the possible values from the list, then replace the rows that did not match using Series.fillna:

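vals = ['us', 'emea', 'asia']
# Capture a region only when it directly follows "https://apis."; expand=False returns a Series
df['region'] = (df['api'].str.extract(rf'(?<=https://apis\.)({"|".join(vals)})', expand=False)
                         .fillna('global'))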

print(df)
                        api  region
0    https://apis.us/image/      us
1  https://apis.emea/video/    emea
2   https://apis.asia/docs/    asia
3     https://apis.general/  global
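For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the DataFrame is built from the four sample URLs in the question; the variable name pattern is only for illustration, and expand=False is passed so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"api": [
    "https://apis.us/image/",
    "https://apis.emea/video/",
    "https://apis.asia/docs/",
    "https://apis.general/",
]})

vals = ["us", "emea", "asia"]

# Lookbehind anchors the match right after "https://apis.",
# the capture group holds one of the known regions.
pattern = rf"(?<=https://apis\.)({'|'.join(vals)})"

# Rows without a match come back as NaN and are filled with "global".
df["region"] = df["api"].str.extract(pattern, expand=False).fillna("global")

print(df)
```

Because the work is done by the vectorized string accessor, there is no need for a Python-level loop or apply over the rows.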

If you need to match a region as a substring anywhere in the URL:

vals = ['us', 'emea', 'asia']
# No anchor: the first occurrence of any region anywhere in the string is captured
df['region'] = df['api'].str.extract(rf'({"|".join(vals)})', expand=False).fillna('global')
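One thing to keep in mind when choosing between the two variants: the unanchored pattern can pick up a region from an unrelated part of the URL. The sketch below compares both on a made-up URL, https://apis.general/status/ (an assumption for illustration, not from the question), where "status" happens to contain "us".

```python
import pandas as pd

vals = ["us", "emea", "asia"]
alt = "|".join(vals)

# The second URL is hypothetical: it has no region, but "status" contains "us".
urls = pd.Series([
    "https://apis.us/image/",
    "https://apis.general/status/",
])

# Match anywhere in the string.
anywhere = urls.str.extract(rf"({alt})", expand=False).fillna("global")

# Match only directly after "https://apis.".
anchored = urls.str.extract(rf"(?<=https://apis\.)({alt})", expand=False).fillna("global")

print(anywhere.tolist())  # ['us', 'us']
print(anchored.tolist())  # ['us', 'global']
```

If the region always sits right after apis., the anchored version is the safer choice.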