get a substring using regex in pandas

Time:01-31

I am new to pandas. I am trying to find any of several substrings in a string, but I only need to check between a particular start and end position.

If one is present, I need to get its position and which substring matched. If none is present, print "No".

example:

Words I need to search = hi/lo; positions to search = 7-10

Input        Output
HelloWorld   NA
worldofhi    7,hi
worldoflove  8,lo

final=final.assign(Result=final.Sequence.str.find('hi'|'lo',7,10))

CodePudding user response:

Use str.replace:

target = 'hi|love'

m = df['sequence'].str.contains(target)

# For matching rows, replace the whole string with "position,word" (1-based position)
df.loc[m, 'output'] = (df.loc[m, 'sequence']
                         .str.replace(fr'.*({target}).*',
                                      lambda match: f'{match.start(1)+1},{match.group(1)}',
                                      regex=True)
                       )

df.loc[~m, 'output'] = 'NA'

Output:

      sequence  output
0   HelloWorld      NA
1    worldofhi    8,hi
2  worldoflove  8,love

Used input:

      sequence
0   HelloWorld
1    worldofhi
2  worldoflove
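
To see what the replacement callable computes, the same pattern can be tried on single strings with Python's re module. This is only a sketch: position_and_word is a hypothetical helper name used for illustration, not part of pandas or of the answer above.

```python
import re

target = 'hi|love'
pattern = re.compile(fr'.*({target}).*')

def position_and_word(text):
    # Hypothetical helper: returns "position,word" with a 1-based position,
    # or 'NA' when none of the target words occurs in the text.
    match = pattern.fullmatch(text)
    if match is None:
        return 'NA'
    # match.start(1) is the 0-based index of the captured word, hence the +1
    return f'{match.start(1) + 1},{match.group(1)}'

for text in ['HelloWorld', 'worldofhi', 'worldoflove']:
    print(text, position_and_word(text))
```

Because the leading .* is greedy, the last occurrence is reported when a word appears more than once in the same string.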

Checking only in the substring at positions 7 to 10:

target = 'hi|love'

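# Keep only characters 7..10 (the slice end is exclusive, hence 10+1)
s = df['sequence'].str[7:10+1]

m = s.str.contains(target)

# Add 7 back so the reported position refers to the full string (1-based)
df.loc[m, 'output'] = (s[m]
                         .str.replace(fr'.*({target}).*',
                                      lambda match: f'{match.start(1)+7+1},{match.group(1)}',
                                      regex=True)
                       )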

df.loc[~m, 'output'] = 'NA'
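
An alternative that avoids slicing and re-adding the offset is to pass the window bounds to a compiled pattern's search method, which accepts optional pos and endpos arguments. The sketch below assumes the same 7 to 10 window; find_in_window and its parameter names are hypothetical and chosen only for illustration.

```python
import re

pattern = re.compile('hi|love')

def find_in_window(text, start=7, end=10 + 1):
    # Hypothetical helper: search only text[start:end]. Match positions
    # stay relative to the full string, so no offset correction is needed.
    match = pattern.search(text, start, end)
    if match is None:
        return 'NA'
    # Convert the 0-based start index to a 1-based position
    return f'{match.start() + 1},{match.group()}'

for text in ['HelloWorld', 'worldofhi', 'worldoflove']:
    print(text, find_in_window(text))
```

With a DataFrame like the one above, this could be applied as df['output'] = df['sequence'].map(find_in_window). Unlike the greedy .* pattern, search reports the first occurrence inside the window.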