Regular expression

Time:12-27

I have a dataframe and I want to extract a specific part of each value into a new column. I believe this should be easy to do with a regular expression. The dataframe looks like this:

       A
0   data-GR1
1   GR2-data
2   data_GR3_data

The desired output is to get all the GR IDs in another column, like this:

        A          B
0   data-GR1      GR1
1   GR2-data      GR2
2   data_GR3_data GR3

I think the best way to do this is something like df["B"] = df["A"].str.extract(<regular expression>), but I don't know what pattern to use.

Any help on how to do this?

CodePudding user response:

Use str.extract:

df['B'] = df['A'].str.extract(r'(GR\d+)', expand=False)
print(df)

# Output
               A    B
0       data-GR1  GR1
1       GR2-data  GR2
2  data_GR3_data  GR3
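
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same pattern. The fourth row ("no-id-here") is a made-up value that is not in the question; it is only there to show that rows without a match end up as NaN rather than raising an error.

```python
import pandas as pd

# Rebuild the frame from the question; the last row is a hypothetical
# extra value used only to show what happens when nothing matches.
df = pd.DataFrame({"A": ["data-GR1", "GR2-data", "data_GR3_data", "no-id-here"]})

# One capture group: the literal letters "GR" followed by one or more digits.
# expand=False returns a Series, so it can be assigned directly to a column.
df["B"] = df["A"].str.extract(r"(GR\d+)", expand=False)

print(df)
```

If you leave out expand=False, str.extract returns a one-column DataFrame instead of a Series, which is why the answer passes it explicitly.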

CodePudding user response:

Use a named group; the group name becomes the name of the new column:

df = df.join(df['A'].str.extract(r'(?P<B>[GR\d]+)'))
print(df)

# Output
               A    B
0       data-GR1  GR1
1       GR2-data  GR2
2  data_GR3_data  GR3
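
One thing to keep in mind with this answer: a character class built from G, R and \d matches any run of those characters in any order, not specifically "GR" followed by digits. On the sample data the two readings give the same result, but they can diverge on other inputs. The sketch below compares them using the standard re module; the string "R2D2-GR7" is a made-up example and is not from the question.

```python
import re

# Literal "GR" followed by one or more digits.
literal = re.compile(r"GR\d+")
# Character class: any run of the characters 'G', 'R', or a digit.
char_class = re.compile(r"[GR\d]+")

# The first three strings come from the question; the last is hypothetical.
samples = ["data-GR1", "GR2-data", "data_GR3_data", "R2D2-GR7"]

# Take the first match of each pattern in each string.
literal_hits = [literal.search(s).group() for s in samples]
class_hits = [char_class.search(s).group() for s in samples]

print(literal_hits)
print(class_hits)
```

For the made-up string, the character class stops at the first run it finds ("R2"), while the literal pattern skips ahead to "GR7". If your real data can contain stray G, R or digit characters before the ID, the literal form is probably the safer choice.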