I have a dataframe and I want to extract a specific part of each value into a new column. I believe this should be easy to do with a regular expression. The dataframe I have looks like this:
A
0 data-GR1
1 GR2-data
2 data_GR3_data
The desired output puts each GR ID into a new column B:
A B
0 data-GR1 GR1
1 GR2-data GR2
2 data_GR3_data GR3
I think the best way to do this is something like df["B"] = df["A"].str.extract(regular_expression).
Any help on how to do this?
Answer:
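Use str.extract with a pattern that matches the literal GR followed by one or more digits. With expand=False and a single capture group, str.extract returns a Series, so it can be assigned directly as a column: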
df['B'] = df['A'].str.extract(r'(GR\d+)', expand=False)  # \d+ = one or more digits
print(df)
# Output
A B
0 data-GR1 GR1
1 GR2-data GR2
2 data_GR3_data GR3
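For reference, here is a self-contained version of the answer above that rebuilds the sample frame from the question. The fourth row, "data-only", is a hypothetical addition to show that a row without a GR ID gets NaN in column B instead of raising an error.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical row with no GR ID
df = pd.DataFrame({"A": ["data-GR1", "GR2-data", "data_GR3_data", "data-only"]})

# One capture group + expand=False -> a Series that can be assigned as column B
df["B"] = df["A"].str.extract(r"(GR\d+)", expand=False)
print(df)
```

Rows where the pattern does not match simply come back as NaN, so you can filter them afterwards with df["B"].notna() if needed.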
Answer:
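Use a named group. str.extract names the resulting column after the group, so join adds a column called B:
df = df.join(df['A'].str.extract(r'(?P<B>GR\d+)'))
print(df)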
A B
0 data-GR1 GR1
1 GR2-data GR2
2 data_GR3_data GR3
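Note that a character class such as [GR\d]+ matches any run of the characters G, R and digits, not the literal prefix GR. It happens to work on the sample data, but the sketch below uses two hypothetical rows ("data1-GR2" and "RG9-data") to show where it differs from GR\d+:

```python
import pandas as pd

# "data1-GR2" and "RG9-data" are hypothetical rows added to expose the difference
df = pd.DataFrame({"A": ["data-GR1", "data1-GR2", "RG9-data"]})

# [GR\d]+ is a character class: the first run of any mix of G, R and digits
loose = df["A"].str.extract(r"(?P<B>[GR\d]+)")
# GR\d+ requires the literal text "GR" followed by one or more digits
strict = df["A"].str.extract(r"(?P<B>GR\d+)")

# The named group becomes the column name, so join adds a column called "B"
print(df.join(loose))
print(df.join(strict))
```

With the character class, "data1-GR2" yields "1" and "RG9-data" yields "RG9"; with GR\d+ they yield "GR2" and NaN respectively.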