Python Regex: remove characters to the right of a pattern in a pandas column

I have a dataframe with brokerage note data, and in a column [column4] I would like to remove characters to the right of a pattern:

The pattern would be 2 letters after 2 or 3 digits; the letters would be ON or PN.

I would like to obtain:

CodePudding user response:

From your example, it seems you could simply take the first 8 characters of each cell:

df.col4 = df.col4.apply(lambda x: x[:8])
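
As a side note, the same slice can be written with the vectorized .str accessor, which returns a missing value for empty cells instead of raising a TypeError the way the lambda would. A minimal sketch, assuming made-up sample values since the real column contents are not shown in the question:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({"col4": ["PETRA253ON N2", "PETRA28PN EJ", None]})

# Vectorized slice: keep the first 8 characters of each string.
# Missing cells stay missing instead of raising an error.
df["col4"] = df["col4"].str[:8]
print(df)
```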

If the second-to-last example is incorrect and you want to match up to the letters, you could use a regex:

import re
df.col4 = df.col4.apply(lambda x: re.findall(r'^PETRA\d{2,3}\D', x)[0])
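
Note that findall(...)[0] raises an IndexError for any cell that does not match. A possible alternative is to pass the same pattern to str.extract, which yields NaN for non-matching rows. This is only a sketch: the sample values below are invented, and the capture group is added because str.extract requires one.

```python
import pandas as pd

# Hypothetical sample data; the third row deliberately does not match.
df = pd.DataFrame({"col4": ["PETRA253ON N2", "PETRA28PN EJ", "no match here"]})

# Same pattern as above, wrapped in a capture group for str.extract.
# expand=False returns a Series; rows that do not match become NaN.
df["col4"] = df["col4"].str.extract(r"^(PETRA\d{2,3}\D)", expand=False)
print(df)
```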

CodePudding user response:

You can use pandas.Series.str.extract():

df["coluna4"].str.extract(r"^(PETRA\w{3})")

Output:

          0
0  PETRA253
1  PETRA299
2  PETRA231
3  PETRA28P
4  PETRA268
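
If the goal is instead to keep the full pattern described in the question (2 or 3 digits followed by ON or PN) and drop only what comes after it, a stricter capture group might work. This is a sketch under assumptions: the sample values are invented, and leaving rows without the pattern unchanged is a choice made here, not something the question specifies.

```python
import pandas as pd

# Hypothetical values; the question's real data is not shown.
s = pd.Series(["PETRA253ON extra", "PETRA28PN extra", "other"])

# Capture everything up to and including 2-3 digits followed by ON or PN.
# (?:...) is non-capturing, so there is exactly one capture group.
trimmed = s.str.extract(r"^(.*?\d{2,3}(?:ON|PN))", expand=False)

# Rows without the pattern come back as NaN; fall back to the original text.
trimmed = trimmed.fillna(s)
print(trimmed)
```

Whether unmatched rows should be kept, blanked, or dropped depends on the data, so the fillna step can be removed or replaced as needed.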