Regex to capture the entire string in a Python pandas Series

Time:03-05

I have a sample series:

import pandas as pd
s = pd.Series(['Complexity Level 1', 'RandomName', 'I-Invoice Submission test', 'I-test2', 'I-string with multiple words'])

I'm trying to capture only the strings that begin with "I-", using extract:

extract1 = s.str.extract(r'I-(\w+)')

Current Output:

    0
0   NaN
1   NaN
2   Invoice
3   test2
4   string

It's currently extracting only the first word, but I want all the words and whitespace after the identifier. This could be up to 5 words.

Is this a regex adjustment or is there a better method?

What I want is:

    0
0   NaN
1   NaN
2   Invoice Submission test
3   test2
4   string with multiple words

CodePudding user response:

The regex that will do the job is r'I-(.*)'. Meaning: capture every character after "I-" up to the end of the line (by default, . does not match a newline).
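As a quick check, here is a minimal sketch that applies this pattern to the sample series from the question. The `^`-anchored variant is not part of the original answer; it assumes "begin with" means the prefix must sit at the very start of the string.

```python
import pandas as pd

s = pd.Series([
    'Complexity Level 1',
    'RandomName',
    'I-Invoice Submission test',
    'I-test2',
    'I-string with multiple words',
])

# (.*) captures everything after "I-" up to the end of the line.
# extract() returns a DataFrame with one column (named 0) per capture group.
extracted = s.str.extract(r'I-(.*)')

# Assumption: the prefix must be at the start of the string.
# The ^ anchor enforces that; without it, "I-" is matched anywhere in the string.
anchored = s.str.extract(r'^I-(.*)')

print(extracted)
```

Rows without the "I-" prefix come back as NaN, so `extracted[0].dropna()` keeps only the matching strings if that is what you need.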


EDIT (From comments):

To capture everything up to the first comma, use r'I-([^,]*)'. Meaning: capture the run of characters after "I-" that are not a comma (,).
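A small sketch of the comma-delimited variant follows. The sample values are hypothetical (the question's data contains no commas) and are only there to show where the capture stops.

```python
import pandas as pd

# Hypothetical data: the first entry has trailing text after a comma.
s_commas = pd.Series([
    'I-Invoice Submission test, extra field',
    'I-test2',
    'RandomName',
])

# [^,]* matches any run of non-comma characters, so the capture
# ends just before the first comma (or at the end of the string).
up_to_comma = s_commas.str.extract(r'I-([^,]*)')

print(up_to_comma)
```

Note that `[^,]` also matches spaces, so the text before the comma is captured with its internal whitespace intact.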
