Hey guys, my current dataframe is this:
df.head()
Output:

| Country | Data |
| -------- | -------------- |
| America | blahblahblah[@A , @b]blahblahblah |
| Cuba | blahblahblahblahblahblah[@f, @f]blahblahblah |
I would like code that extracts the group tags from the Data column into a new column. An example of the desired output:

Output:

| Country | Data | Group Tag |
| -------- | -------------- | -------------- |
| America | blahblahblah[@A , @b]blahblahblah | [@A , @b] |
| Cuba | blahblahblahblahblahblah[@f, @f]blahblahblah | [@f, @f] |
Thank you. Any help will be appreciated!
CodePudding user response:
Here is a working example:
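    import pandas as pd

    df = pd.DataFrame({'Country': ['America', 'Cuba'],
                       'Data': ['blahblahblah[@A , @b]blahblahblah',
                                'blahblahblahblahblahblah[@f, @f]blahblahblah']})

    # Slice from the first "[" up to and including the first "]".
    df['Group Tag'] = df['Data'].apply(lambda st: st[st.find("["):st.find("]") + 1])

    # Optional: strip the tag out of Data. Skip this line to keep Data
    # unchanged, as in the requested output.
    df['Data'] = df.apply(lambda row: row['Data'].replace(str(row['Group Tag']), ''), axis=1)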
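The same extraction can also be written without a lambda, using the vectorized pandas string method `Series.str.extract`. This is only a sketch of an alternative, assuming each row holds at most one bracketed tag group; rows without a match get a missing value.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['America', 'Cuba'],
                   'Data': ['blahblahblah[@A , @b]blahblahblah',
                            'blahblahblahblahblahblah[@f, @f]blahblahblah']})

# The capture group grabs the first "[...]" in each string.
# ".*?" is non-greedy, so the match stops at the first "]".
# expand=False returns a Series instead of a one-column DataFrame.
df['Group Tag'] = df['Data'].str.extract(r'(\[.*?\])', expand=False)

print(df)
```

Data is left untouched here, which matches the output asked for in the question.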
CodePudding user response:
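First of all, you need a function to extract the tag. You can do this with a regex (Python's re module):

    import re

    def search_for_tag(s: str):
        # re.search returns None when nothing matches, so check the match
        # object before reading from it. ".*?" is non-greedy, so the match
        # ends at the first "]".
        match = re.search(r"(\[.*?\])", s)
        return match.group(1) if match else None

It returns the first tag group it finds, or None if the string contains no tag.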
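One detail worth checking is how the pattern behaves when a string contains more than one bracketed group. The sketch below uses a made-up input string (not from the question) to compare a greedy `.*` with a non-greedy `.*?`, and shows `re.findall` as one way to collect every group.

```python
import re

# Hypothetical input with two tag groups, for illustration only.
s = "a[@x]b[@y]c"

# Greedy: runs from the first "[" to the LAST "]".
greedy = re.search(r"\[.*\]", s).group(0)

# Non-greedy: stops at the first "]".
lazy = re.search(r"\[.*?\]", s).group(0)

# findall returns every non-overlapping match.
all_tags = re.findall(r"\[.*?\]", s)

# search returns None when there is no match at all.
no_match = re.search(r"\[.*?\]", "no tags here")

print(greedy, lazy, all_tags, no_match)
```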
Then use the apply() method to extract the values from the "Data" column:

    values = df['Data'].apply(search_for_tag)

and then create a new column by assigning the values to it:

    df['Group Tag'] = values
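Put together, the steps above might look like the following sketch. The third row ("Nowhere") is a hypothetical addition, not part of the question, included only to show what happens when a string has no tag.

```python
import re
import pandas as pd

TAG_PATTERN = re.compile(r"\[.*?\]")

def search_for_tag(s):
    """Return the first bracketed tag group in s, or None if there is none."""
    match = TAG_PATTERN.search(s)
    return match.group(0) if match else None

df = pd.DataFrame({
    "Country": ["America", "Cuba", "Nowhere"],  # "Nowhere" is a made-up row
    "Data": [
        "blahblahblah[@A , @b]blahblahblah",
        "blahblahblahblahblahblah[@f, @f]blahblahblah",
        "blahblahblah",  # no tag in this string
    ],
})

# Write the result to a new column so the original Data column is kept.
df["Group Tag"] = df["Data"].apply(search_for_tag)

print(df)
```

Rows without a tag end up with a missing value in "Group Tag", which can be tested with `pd.isna`.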