RegEx, characters between 2 strings without positive lookbehind

Time:09-21

I have a sentence like this: Give me 4 of ABCD_X and then do something

I need to extract ABCD_X, that is, any set of characters after "Give me 4 of" and before the next space. The number can be of any size.

I am able to do it with this expression (taken from this question):

(?<=^Give me \d of )(.*?)(?=\s)

But the number can be 10 or greater, and (?<=^Give me \d+ of )(.*?)(?=\s) raises an error in Python (pandas column) saying that a positive lookbehind must be fixed width.

Is there a way to extract those characters without a positive lookbehind?

CodePudding user response:

You could try:

^Give me \d+ of (\S+)



  • ^ - Start-of-line anchor.
  • Give me \d+ of - Your search string, matched literally, with 1+ digits.
  • (\S+) - A capture group with 1+ non-whitespace characters.
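
As a quick check outside pandas, here is a minimal sketch using Python's built-in re module. It assumes the variable-width lookbehind from the question was written with \d+, and the variable names (lookbehind_error, pattern, sentences, codes) are only illustrative. It shows that re rejects the lookbehind at compile time, while the capture-group version handles both one-digit and two-digit numbers.

```python
import re

# A lookbehind containing \d+ has no fixed width, so the built-in re module
# refuses to compile it.
try:
    re.compile(r'(?<=^Give me \d+ of )(.*?)(?=\s)')
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)

# The capture-group version needs no lookbehind at all.
pattern = re.compile(r'^Give me \d+ of (\S+)')

sentences = [
    'Give me 4 of ABCD_X and then do something',
    'Give me 10 of ABCD_Y and then do something',
]

# group(1) is the text captured by (\S+).
codes = [pattern.match(s).group(1) for s in sentences]

print(lookbehind_error)
print(codes)
```

The wanted text is read from group 1 rather than from the whole match, which is why the surrounding "Give me ... of " can have any length.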

For example:

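import pandas as pd

# Two sample sentences: one with a single-digit and one with a two-digit number
df = pd.Series(['Give me 4 of ABCD_X and then do something', 'Give me 10 of ABCD_Y and then do something'])
# \d+ matches one or more digits; (\S+) captures the non-whitespace run that follows
df = df.str.extract(r'^Give me \d+ of (\S+)')
print(df)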

Prints:

        0
0  ABCD_X
1  ABCD_Y

Note: If you use a named capture group, the column header will be the name of that group instead of the group's integer index.
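
A small sketch of the named-group variant, assuming the same two sample sentences. The group name "code" and the variable names are made up for illustration; any valid group name would work.

```python
import pandas as pd

s = pd.Series([
    'Give me 4 of ABCD_X and then do something',
    'Give me 10 of ABCD_Y and then do something',
])

# (?P<code>...) names the capture group; str.extract uses that name
# as the column label of the resulting DataFrame.
out = s.str.extract(r'^Give me \d+ of (?P<code>\S+)')

print(out.columns.tolist())
print(out['code'].tolist())
```

This can be convenient when the extracted values are assigned back to a DataFrame, since the column already has a meaningful name.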
