Python (Pandas) - Which regex syntax should I use here?

Here I'm using regex (regular expressions) in Pandas.

    NIFTY30DEC219000CE.NFO
    NIFTY30DEC2116000CE.NFO
    NIFTY30DEC2116000CE
    NIFTY30DEC2116111PE
    NIFTY30DEC218100PE

I have strings of this type, in which the number is either 4 digits (e.g. '9000') or 5 digits (e.g. '16000'), and so on. The output should contain that number, and I don't need the 30DEC21 part in the output.

I'm using this syntax (shown in an image), but I'm getting the wrong output.

This is my code (image of my code).

CodePudding user response：

I would use str.extract with the following regex pattern:

\d{2}[A-Z]{3}\d{2}(\d+)

Python script:

df["output"] = df["col"].str.extract(r'\d{2}[A-Z]{3}\d{2}(\d+)')

Here is a demo showing that the extraction logic is working.
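
The same extraction can be tried locally with a small self-contained sketch built from the sample strings in the question. It assumes the capture group uses the + quantifier (one or more digits) and keeps the column name "col" from the snippet above. The expand=False argument is an addition here, so that str.extract returns a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Sample values copied from the question; "col" is the column name used in the answer.
df = pd.DataFrame({"col": [
    "NIFTY30DEC219000CE.NFO",
    "NIFTY30DEC2116000CE.NFO",
    "NIFTY30DEC2116000CE",
    "NIFTY30DEC2116111PE",
    "NIFTY30DEC218100PE",
]})

# Skip the date part (2 digits, 3 capital letters, 2 digits),
# then capture the run of digits that follows it.
# expand=False (an addition in this sketch) yields a Series for a single group.
df["output"] = df["col"].str.extract(r"\d{2}[A-Z]{3}\d{2}(\d+)", expand=False)

print(df["output"].tolist())
# ['9000', '16000', '16000', '16111', '8100']
```

Note that the extracted values are strings. If numbers are needed, something like df["output"].astype(int) should convert them, provided every row matched.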

CodePudding user response：

r"NIFTY30DEC21(\d{4,5})(CE\.NFO|CE|PE)"
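
The answer gives only the pattern, so here is one possible way to apply it, sketched with Python's built-in re module on the sample strings from the question. The variable names (pattern, samples, results) are made up for illustration. Group 1 holds the 4- or 5-digit number and group 2 holds the suffix.

```python
import re

# Pattern from the answer: the literal prefix, a 4-5 digit number, then the suffix.
pattern = re.compile(r"NIFTY30DEC21(\d{4,5})(CE\.NFO|CE|PE)")

# Sample values copied from the question.
samples = [
    "NIFTY30DEC219000CE.NFO",
    "NIFTY30DEC2116000CE.NFO",
    "NIFTY30DEC2116000CE",
    "NIFTY30DEC2116111PE",
    "NIFTY30DEC218100PE",
]

results = []
for s in samples:
    m = pattern.search(s)
    # Store (number, suffix) when the string matches, otherwise None.
    results.append((m.group(1), m.group(2)) if m else None)

for s, r in zip(samples, results):
    print(s, "->", r)
```

Because the prefix NIFTY30DEC21 is hard-coded, this pattern only matches strings with that exact symbol and date. The first answer's pattern is the more general choice when the date varies.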