I have a dataframe with brokerage note data, and in one column (coluna4) I would like to remove the characters to the right of a pattern. The pattern would be 2 letters after 3 or 2 digits; the letters would be ON or PN.
coluna4 |
---|
PETRA253PN 27,35PETR |
PETRA299ON 27,60PETR |
PETRA231PN 27,85PETR |
PETRA28PN 28,10PETR |
PETRA268ON 28,35PETR |
I would like to obtain:
coluna4 |
---|
PETRA253 |
PETRA299 |
PETRA231 |
PETRA28P |
PETRA268 |
CodePudding user response:
From your example, it looks like you could simply take the first 8 characters of each cell:
df["coluna4"] = df["coluna4"].apply(lambda x: x[:8])
If the second-to-last example (PETRA28P) is incorrect and you want to match only up to the letters, you could use a regex:
import re
df["coluna4"] = df["coluna4"].apply(lambda x: re.findall(r'^PETRA\d{2,3}', x)[0])
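As a rough, self-contained sketch of the two options above, run on the sample rows from the question: the helper name `ticker_prefix` is hypothetical, and the regex variant shown here assumes you want to stop right after the 2 or 3 digits. It returns `None` for rows that do not match, instead of raising an `IndexError` as `findall(...)[0]` would.

```python
import re

import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"coluna4": [
    "PETRA253PN 27,35PETR",
    "PETRA299ON 27,60PETR",
    "PETRA231PN 27,85PETR",
    "PETRA28PN 28,10PETR",
    "PETRA268ON 28,35PETR",
]})

# Option 1: keep the first 8 characters of every cell.
first8 = df["coluna4"].str[:8]


def ticker_prefix(text):
    """Hypothetical helper: return 'PETRA' plus 2-3 digits, or None if absent."""
    match = re.match(r"PETRA\d{2,3}", text)
    return match.group(0) if match else None


# Option 2: stop right after the digits, so the ON/PN letters are dropped.
up_to_digits = df["coluna4"].apply(ticker_prefix)

print(first8.tolist())
print(up_to_digits.tolist())
```

The two results differ only on the fourth row: slicing keeps `PETRA28P`, while the regex keeps `PETRA28`.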
CodePudding user response:
You can use pandas.Series.str.extract():
df["coluna4"].str.extract(r"^(PETRA\w{3})")
Output:
0
0 PETRA253
1 PETRA299
2 PETRA231
3 PETRA28P
4 PETRA268
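By default `str.extract()` returns a DataFrame (hence the `0` column header in the output above). A minimal sketch, assuming you want to write the result back into the same column: passing `expand=False` with a single capture group returns a Series instead, which can be assigned directly. The sample data is copied from the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"coluna4": [
    "PETRA253PN 27,35PETR",
    "PETRA299ON 27,60PETR",
    "PETRA231PN 27,85PETR",
    "PETRA28PN 28,10PETR",
    "PETRA268ON 28,35PETR",
]})

# expand=False with one capture group yields a Series, not a DataFrame,
# so it can be assigned straight back to the column.
df["coluna4"] = df["coluna4"].str.extract(r"^(PETRA\w{3})", expand=False)

print(df)
```

Rows that do not match the pattern would become `NaN` rather than raising an error.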