I have a column in a data frame which contains statistical numbers like these:

| percentage |
|---|
| 32/40 (80%) |
| 56/60 (93%) |
How can I keep only the percentages (80% and 93%) and remove the totals? I'm a beginner at pandas.
CodePudding user response:
You could use `str.extract` here:

```python
df["percentage"] = df["percentage"].str.extract(r'(\d+(?:\.\d+)?%)', expand=False)
```
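A minimal, self-contained sketch that rebuilds the sample column from the question and applies the pattern. Passing `expand=False` makes `str.extract` return a Series rather than a one-column DataFrame, so it can be assigned straight back to the column:

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"percentage": ["32/40 (80%)", "56/60 (93%)"]})

# \d+ matches the integer part, (?:\.\d+)? an optional decimal part,
# and % the literal percent sign; the first match in each string is kept
df["percentage"] = df["percentage"].str.extract(r"(\d+(?:\.\d+)?%)", expand=False)

print(df["percentage"].tolist())  # ['80%', '93%']
```

The numbers in `32/40` are skipped because neither is immediately followed by `%`.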
CodePudding user response:
A non-regex approach would be:

```python
df["percentage"] = df["percentage"].str.rsplit("(", n=1).str[1].str.rstrip(")")
```
The logic here is that you split each string on "(", grab the part containing the percentage, and then strip the unwanted ")" from the end. It may not be the fastest way, but it is arguably more readable.
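A small sketch, assuming the same sample values as the question, that runs the non-regex approach alongside the `str.extract` pattern from the first answer to check that both give the same result:

```python
import pandas as pd

# Sample data taken from the question
s = pd.Series(["32/40 (80%)", "56/60 (93%)"], name="percentage")

# Non-regex approach: split once from the right on "(",
# keep the piece after it, then strip the trailing ")"
no_regex = s.str.rsplit("(", n=1).str[1].str.rstrip(")")

# Regex approach from the first answer, for comparison
with_regex = s.str.extract(r"(\d+(?:\.\d+)?%)", expand=False)

print(no_regex.tolist())                         # ['80%', '93%']
print(no_regex.tolist() == with_regex.tolist())  # True
```

Note that if a value contains no "(" at all, `.str[1]` yields NaN for that row instead of raising an error.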