Hi, I am working in Python. I created a DataFrame from a CSV file. One text column, "name", contains the pattern ' (some_number%)' in several places, for example:
"145 wefwignweon (100%), 1rberbebe (50%), vwrbvwrbe (100%), 140 ewggrrwrg"
I need to remove every occurrence such as ' (100%)' or ' (50%)' from this column. Other entries contain different percentage values.
import pandas as pd
path_to_dir="/Users/user/Documents/file/"
name='owner.csv'
df_owner = pd.read_csv(path_to_dir + name, encoding='windows-1252')
#df_owner["name"] = df_owner["name"] drop where says => (' (@some_number%)')
How can I write a regular expression that removes these values, i.e. deletes every ' (some_number%)' from the name column of the df_owner DataFrame?
Regards
CodePudding user response:
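You can use the regular expression \(\d+%\) to match the pattern. Note that the line below drops every row whose name contains it, rather than editing the text:
df = df[~df['name'].str.contains(r' \(\d+%\)', regex=True)]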
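If the goal is to keep the rows and only strip the ' (N%)' fragments from the text, one possible approach is Series.str.replace with regex=True. This is a minimal sketch, assuming the column looks like the example in the question; the small DataFrame below is made-up sample data standing in for owner.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for the "name" column of owner.csv.
df_owner = pd.DataFrame({
    "name": [
        "145 wefwignweon (100%), 1rberbebe (50%), vwrbvwrbe (100%), 140 ewggrrwrg",
        "no percentage here",
    ]
})

# Pattern: optional leading space, "(", one or more digits, "%", ")".
# regex=True is passed explicitly so the pattern is not treated as a literal.
df_owner["name"] = df_owner["name"].str.replace(r" ?\(\d+%\)", "", regex=True)

print(df_owner["name"].tolist())
```

Rows without the pattern are left unchanged, so nothing is dropped from the DataFrame.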
CodePudding user response:
Matching a number of up to three digits gives r'\d{1,3}'.
But you also want the parentheses and the percent sign. The parentheses have to be escaped (the percent sign does not), and the percent sign belongs inside them, so that will be r'\(\d{1,3}%\)'.
You can then replace occurrences of that regex with the empty string, using lambda x: re.sub(r'\(\d{1,3}%\)', '', x).
You might also want to add the leading space to the regex.
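The re.sub idea above can be sketched with only the standard library. The helper name strip_percentages and the compiled-pattern constant are illustrative choices, not something from the question, and the pattern includes the optional leading space.

```python
import re

# Optional leading space, then "(", one to three digits, "%", ")".
PERCENT_IN_PARENS = re.compile(r" ?\(\d{1,3}%\)")

def strip_percentages(text):
    """Return text with every ' (N%)' occurrence removed (N has 1 to 3 digits)."""
    return PERCENT_IN_PARENS.sub("", text)

sample = "145 wefwignweon (100%), 1rberbebe (50%), vwrbvwrbe (100%), 140 ewggrrwrg"
cleaned = strip_percentages(sample)
print(cleaned)
```

On a DataFrame, the same function could be applied to the column with df_owner["name"].apply(strip_percentages), assuming every value in the column is a string.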