Python Regex: remove a pattern to the right of a string in a pandas column

Time:11-28

I have a dataframe with brokerage note data, and in one column (coluna4) I would like to remove the characters to the right of a pattern:

The pattern is two letters, either ON or PN, that come right after two or three digits.

coluna4
PETRA253PN 27,35PETR
PETRA299ON 27,60PETR
PETRA231PN 27,85PETR
PETRA28PN 28,10PETR
PETRA268ON 28,35PETR

I would like to obtain:

coluna4
PETRA253
PETRA299
PETRA231
PETRA28P
PETRA268

CodePudding user response:

From your example, it looks like you could simply take the first 8 characters of each cell:

df.coluna4 = df.coluna4.apply(lambda x: x[:8])

If the second-to-last row of your expected output is a typo and you want to keep everything up to the letters, you could use a regex instead:

import re
df.coluna4 = df.coluna4.apply(lambda x: re.findall(r'^PETRA\d{2,3}', x)[0])
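If the prefix is not always PETRA, a vectorized alternative is to delete the ON/PN suffix and everything after it. This is only a sketch, assuming ON or PN always comes directly after a digit as in the sample rows; the name `cleaned` is just illustrative. Note that it turns the fourth row into PETRA28 rather than the PETRA28P shown in the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "coluna4": [
        "PETRA253PN 27,35PETR",
        "PETRA299ON 27,60PETR",
        "PETRA231PN 27,85PETR",
        "PETRA28PN 28,10PETR",
        "PETRA268ON 28,35PETR",
    ]
})

# Remove "ON" or "PN" when it directly follows a digit, plus everything after it.
# (?<=\d) is a lookbehind, so the digit itself is kept in the result.
cleaned = df["coluna4"].str.replace(r"(?<=\d)(?:ON|PN).*$", "", regex=True)
print(cleaned.tolist())
```

Because the lookbehind anchors on the digit, letters in the prefix are left alone.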

CodePudding user response:

You can use pandas.Series.str.extract():

df["coluna4"].str.extract(r"^(PETRA\w{3})")

Output:

          0
0  PETRA253
1  PETRA299
2  PETRA231
3  PETRA28P
4  PETRA268
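
By default str.extract() returns a DataFrame, which is why the output has a column named 0. A minimal sketch, assuming you want to write the result back into coluna4: with a single capture group, passing expand=False gives a Series that can be assigned directly. The sample data is copied from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "coluna4": [
        "PETRA253PN 27,35PETR",
        "PETRA299ON 27,60PETR",
        "PETRA231PN 27,85PETR",
        "PETRA28PN 28,10PETR",
        "PETRA268ON 28,35PETR",
    ]
})

# expand=False with one capture group returns a Series instead of a DataFrame,
# so it can overwrite the original column in place.
df["coluna4"] = df["coluna4"].str.extract(r"^(PETRA\w{3})", expand=False)
print(df)
```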