Extracting a substring based on conditions and adding it to another column in a dataframe

Hey guys, my current dataframe is this:

df.head()

Output:

| Country | Data |
| -------- | -------------- |
| America | blahblahblah[@A , @b]blahblahblah |
| Cuba | blahblahblahblahblahblah[@f, @f]blahblahblah |

I would like code that extracts the group tags from the Data column into a new column. An example of the desired output:

Output:

| Country | Data | Group Tag |
| -------- | -------------- | -------------- |
| America | blahblahblah[@A , @b]blahblahblah | [@A , @b] |
| Cuba | blahblahblahblahblahblah[@f, @f]blahblahblah | [@f , @f] |

Thank you. Any help will be appreciated!

CodePudding user response:

Here is a working example:

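import pandas as pd

df = pd.DataFrame({'Country': ['America', 'Cuba'],
                   'Data': ['blahblahblah[@A , @b]blahblahblah',
                            'blahblahblahblahblahblah[@f, @f]blahblahblah']})
# Slice from the first "[" through the first "]" (the + 1 includes the closing bracket).
df['Group Tag'] = df['Data'].apply(lambda st: st[st.find("["):st.find("]") + 1])
# Optional: remove the extracted tag from the original Data column.
df['Data'] = df.apply(lambda row: row['Data'].replace(str(row['Group Tag']), ''), axis=1)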
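One thing to watch with the slicing approach: str.find returns -1 when the character is missing, so rows without a complete "[...]" pair can quietly produce an empty or wrong slice. A minimal sketch of a more defensive helper follows; the name slice_tag is hypothetical and not part of the answer above.

```python
def slice_tag(text: str) -> str:
    """Return the first "[...]" span in text, or "" if there is no complete pair."""
    start = text.find("[")
    if start == -1:
        return ""  # no opening bracket at all
    # Search for the closing bracket only after the opening one.
    end = text.find("]", start)
    if end == -1:
        return ""  # opening bracket without a closing one
    return text[start:end + 1]


print(slice_tag("blahblahblah[@A , @b]blahblahblah"))  # [@A , @b]
print(repr(slice_tag("no tags here")))                 # ''
print(repr(slice_tag("stray ] then [@x]")))            # '[@x]'
```

It could be applied the same way as the lambda, for example df['Group Tag'] = df['Data'].apply(slice_tag).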

CodePudding user response:

First, you need a function to extract the tag. You can do this with a regular expression (Python's re module):

import re

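def search_for_tag(s: str):
    # re.search returns None when there is no match, so check before reading the group.
    # The non-greedy .*? stops at the first "]" instead of running to the last one.
    match = re.search(r"(\[.*?\])", s)
    return match.group(1) if match else None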

It returns the first tag group found, or None if there is none.
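Two regex details matter for that behaviour, shown here as a small standalone sketch (the sample strings are made up for illustration): a greedy .* runs to the last "]" in the string, while .*? stops at the first one, and re.search returns None when nothing matches, so the result has to be checked before calling .group().

```python
import re

text = "aaa[@A , @b]bbb[@c]ccc"

# Greedy: .* extends to the last "]" in the string.
greedy = re.search(r"\[.*\]", text).group(0)

# Non-greedy: .*? stops at the first "]".
lazy = re.search(r"\[.*?\]", text).group(0)

# No brackets at all: re.search returns None rather than a match object.
missing = re.search(r"\[.*?\]", "no tags here")

print(greedy)   # [@A , @b]bbb[@c]
print(lazy)     # [@A , @b]
print(missing)  # None
```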

Then use the apply() method to extract the values from the "Data" column:

values = df['Data'].apply(search_for_tag)

Then create a new column by assigning the values to it:

df['Group Tag'] = values
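As an alternative to apply(), pandas also offers a vectorized string method, Series.str.extract, which takes a regex with a capture group. This is a sketch using the sample data from the question, not the answerer's method; rows without a match would get NaN rather than None.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["America", "Cuba"],
    "Data": ["blahblahblah[@A , @b]blahblahblah",
             "blahblahblahblahblahblah[@f, @f]blahblahblah"],
})

# One capture group plus expand=False yields a Series instead of a DataFrame.
df["Group Tag"] = df["Data"].str.extract(r"(\[.*?\])", expand=False)

print(df["Group Tag"].tolist())  # ['[@A , @b]', '[@f, @f]']
```

The Data column is left unchanged, which matches the output requested in the question.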