I have a sentence like this: Give me 4 of ABCD_X and then do something
I need to extract ABCD_X, that is, any run of characters after "Give me 4 of " and before the next space.
The number can have any number of digits.
I am able to do it with this expression (taken from this question):
(?<=^Give me \d of )(.*?)(?=\s)
But the number can be 10 or greater, and
(?<=^Give me \d+ of )(.*?)(?=\s)
raises an error in Python (pandas column) saying that the positive lookbehind must be fixed width.
Is there a way to extract those characters without a positive lookbehind?
CodePudding user response:
You could try:
^Give me \d+ of (\S+)
See an online demo
^ - Start-of-line anchor.
Give me \d+ of - Your search string, literally, with 1+ digits.
(\S+) - A capture group with 1+ non-whitespace characters.
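To see why the capture group sidesteps the problem, here is a minimal sketch using Python's built-in re module rather than pandas. The variable names (sentences, lookbehind_error, extracted) are made up for illustration, and the lookbehind pattern is assumed to be the `\d+` variant from the question.

```python
import re

sentences = [
    "Give me 4 of ABCD_X and then do something",
    "Give me 10 of ABCD_Y and then do something",
]

# The lookbehind contains \d+, so its width is not fixed.
# Python's re module rejects such a pattern when compiling it.
try:
    re.compile(r"(?<=^Give me \d+ of )(.*?)(?=\s)")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = exc

# A capture group has no width restriction: match the prefix normally
# and read the wanted part from group 1.
pattern = re.compile(r"^Give me \d+ of (\S+)")
extracted = [pattern.search(s).group(1) for s in sentences]

print(lookbehind_error)
print(extracted)
```

The prefix is still matched, it is just consumed as part of the match instead of being asserted behind it, so only group 1 is read back.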
For example:
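import pandas as pd
# Two sample sentences, one of them with a multi-digit number
df = pd.Series(['Give me 4 of ABCD_X and then do something', 'Give me 10 of ABCD_Y and then do something'])
# \d+ matches one or more digits; (\S+) captures the run of non-whitespace that follows
df = df.str.extract(r'^Give me \d+ of (\S+)')
print(df)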
Prints:
        0
0  ABCD_X
1  ABCD_Y
Note: If you use a named capture group, the column header will be the name of that group instead of the group's integer index.
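As a rough sketch of the named-group variant: the group name item and the variable names below are arbitrary choices for illustration, not something from the question.

```python
import pandas as pd

s = pd.Series([
    'Give me 4 of ABCD_X and then do something',
    'Give me 10 of ABCD_Y and then do something',
])

# (?P<item>...) names the capture group; str.extract uses that name
# as the column label of the resulting DataFrame.
result = s.str.extract(r'^Give me \d+ of (?P<item>\S+)')

print(result.columns.tolist())
print(result['item'].tolist())
```

Selecting result['item'] then gives the extracted values as a Series, which is convenient when assigning them back to a column.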