How can I fix the logical error in my regex?

Time:10-26

Code:

df['Expiry'], df['Symbol'] = None, None
index_Ticker = df.columns.get_loc('Ticker')
index_Expiry = df.columns.get_loc('Expiry')
index_Symbol = df.columns.get_loc('Symbol')
            
Expiry_Pattern = r'-([A-Z]{1,3})'
Symbol_Pattern = r'(.*?)-[A-Z]{1,3}'
            
for row in range(0, len(df)):
    Expiry = re.search(Expiry_Pattern, df.iat[row, index_Ticker]).group()
    df.iat[row, index_Expiry] = Expiry
    Symbol = re.search(Symbol_Pattern, df.iat[row, index_Ticker]).group()
    df.iat[row, index_Symbol] = Symbol

Here I'm using these regex patterns:

Expiry_Pattern = r'-([A-Z]{1,3})'
Symbol_Pattern = r'(.*?)-[A-Z]{1,3}'

My output is: [Output Image]

My actual data is in this format:

ZEEL-III.NFO
RELIANCE-III.NFO
ADANIPORTS-I.NFO
ZEEL-II.
AARTIIND-III.NFO

But I want this output:

ZEEL         III
RELIANCE     III
ADANIPORTS   I
ZEEL         II
AARTIIND     III

I don't understand how I can solve this issue.

CodePudding user response:

You can use the regex '-?(\w+)(?=-|\.)' to get the expected output for the sample data you have:

>>> df['col'].str.findall(r'-?(\w+)(?=-|\.)').apply(pd.Series)

            0    1
0        ZEEL  III
1    RELIANCE  III
2  ADANIPORTS    I
3        ZEEL   II
4    AARTIIND  III

Pattern Explanation:

'-?(\w+)(?=-|\.)'

  • -? matches zero or one hyphen at the start of the match
  • (\w+) captures one or more word characters (the symbol or the expiry)
  • (?=-|\.) is a positive lookahead that requires the capture to be followed by - or .
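
As a quick check of the pattern outside pandas, here is a minimal sketch using the standard-library re module on the sample tickers from the question. The names tickers, PATTERN and parts are illustrative and not part of the original answer.

```python
import re

# Sample tickers copied from the question; the variable names are illustrative.
tickers = [
    "ZEEL-III.NFO",
    "RELIANCE-III.NFO",
    "ADANIPORTS-I.NFO",
    "ZEEL-II.",
    "AARTIIND-III.NFO",
]

# Optional leading hyphen, a captured run of word characters,
# and a lookahead requiring the next character to be "-" or ".".
PATTERN = re.compile(r"-?(\w+)(?=-|\.)")

# With a single capture group, findall returns only the captured text.
parts = [PATTERN.findall(t) for t in tickers]

for symbol, expiry in parts:
    print(f"{symbol:<13}{expiry}")
```

The trailing NFO is never captured because it is not followed by - or ., so the lookahead fails there.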

The Non-regex solution:

First right-split the string on . with n=1 (at most one split), take the first element, and then split that on -:

>>> df['col'].str.rsplit('.', n=1).str[:-1].str[0].str.split('-').apply(pd.Series)

            0    1
0        ZEEL  III
1    RELIANCE  III
2  ADANIPORTS    I
3        ZEEL   II
4    AARTIIND  III
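
The same split logic can be sketched on plain Python strings, which may help when checking a single ticker without a DataFrame. The helper name split_ticker is hypothetical, and the sketch assumes the symbol itself contains no hyphen.

```python
def split_ticker(ticker: str) -> list[str]:
    """Split 'SYMBOL-EXPIRY.SUFFIX' into [symbol, expiry] without regex."""
    # Drop everything after the last "." (the suffix, which may be empty).
    head = ticker.rsplit(".", 1)[0]
    # Split the remainder on "-" to separate symbol and expiry.
    return head.split("-")


print(split_ticker("ZEEL-III.NFO"))  # ['ZEEL', 'III']
print(split_ticker("ZEEL-II."))      # ['ZEEL', 'II']
```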

CodePudding user response:

I extracted the values like this:

df["Symbol"] = df["Ticker"].str.extract('(.*?)-').apply(pd.Series)
df["Expiry"] = df["Ticker"].str.extract('-([A-Z]{1,3})').apply(pd.Series)

This creates the two columns.

Now my output matches what I wanted. [Output Image]
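
For reference, a likely reason the original loop produced the wrong values is that .group() with no argument returns the whole match (including the hyphen and, for the symbol pattern, the expiry), while .group(1) returns only the parenthesised part. The output image is not available here, so this is an assumption about what went wrong. A minimal sketch using the question's own patterns on one sample ticker:

```python
import re

# Patterns copied from the question.
Expiry_Pattern = r'-([A-Z]{1,3})'
Symbol_Pattern = r'(.*?)-[A-Z]{1,3}'

ticker = "ZEEL-III.NFO"

# group() (the same as group(0)) is the entire matched text.
whole_expiry = re.search(Expiry_Pattern, ticker).group()   # '-III'
whole_symbol = re.search(Symbol_Pattern, ticker).group()   # 'ZEEL-III'

# group(1) is only what the first pair of parentheses captured.
expiry = re.search(Expiry_Pattern, ticker).group(1)        # 'III'
symbol = re.search(Symbol_Pattern, ticker).group(1)        # 'ZEEL'

print(whole_expiry, whole_symbol, expiry, symbol)
```

Under that assumption, changing both .group() calls in the loop to .group(1) would be enough to get the desired columns, although the vectorised .str.extract approach avoids the loop entirely.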
