Python remove part of the string from column in a dataframe

Time:03-19

Hi, I am working in Python. I created a dataframe from a CSV file. One text column, "name", contains the pattern ' (some_number%)' in various places, for example:

"145 wefwignweon (100%), 1rberbebe (50%), vwrbvwrbe (100%), 140 ewggrrwrg"

I need to delete every occurrence of this pattern from the column, e.g. ' (100%)' and ' (50%)'. In other columns there are different percentage values.

import pandas as pd

path_to_dir="/Users/user/Documents/file/"
name='owner.csv'
df_owner = pd.read_csv(path_to_dir + name, encoding='windows-1252')
#df_owner["name"] =  df_owner["name"] drop where says => (' (@some_number%)')

How can I write a regular expression that removes these values, i.e. deletes every ' (some_number%)' from the name column of the df_owner dataframe?

Regards

CodePudding user response:

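You can use the regular expression \(\d+%\):

df = df[~df['name'].str.contains(r' \(\d+%\)', regex=True)]

Note that this filters out entire rows whose name contains the pattern. To strip only the matched text and keep the rows, the same pattern can be passed to Series.str.replace instead.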
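As a minimal sketch of removing only the matched text with that regular expression: the sample dataframe below is hypothetical and simply mirrors the string from the question, and the optional leading whitespace (\s*) is an assumption about the desired output so that no stray space is left before the comma.

```python
import pandas as pd

# Hypothetical sample data mirroring the example string in the question.
df_owner = pd.DataFrame({
    "name": [
        "145 wefwignweon (100%), 1rberbebe (50%), vwrbvwrbe (100%), 140 ewggrrwrg",
        "no percentage here",
    ]
})

# Optional leading whitespace, then "(", one or more digits, "%", ")".
pattern = r"\s*\(\d+%\)"

# regex=True is passed explicitly because the default changed between pandas versions.
df_owner["name"] = df_owner["name"].str.replace(pattern, "", regex=True)

print(df_owner["name"].tolist())
```

Rows without the pattern are left unchanged, and missing values stay missing because the .str accessor skips them.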

CodePudding user response:

Matching any number of up to three digits gives r'\d{1,3}'.

But you also seem to want the parentheses removed, and they have to be escaped (the percent sign does not need escaping, although escaping it is harmless), so the pattern becomes r'\(\d{1,3}%\)'. You can then replace occurrences of that regex with the empty string using lambda x: re.sub(r'\(\d{1,3}%\)', '', x). You might also want to add the leading space to the regex.
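The re.sub approach can be sketched as follows. The helper name strip_percent_tags and the sample string are illustrative assumptions, and the optional leading space in the pattern is the addition suggested above.

```python
import re

# Optional leading space, "(", one to three digits, "%", ")".
PERCENT_TAG = re.compile(r" ?\(\d{1,3}%\)")

def strip_percent_tags(text):
    """Return text with every ' (NN%)' style tag removed (hypothetical helper)."""
    return PERCENT_TAG.sub("", text)

sample = "145 wefwignweon (100%), 1rberbebe (50%), vwrbvwrbe (100%), 140 ewggrrwrg"
cleaned = strip_percent_tags(sample)
print(cleaned)
```

On the dataframe from the question this could be applied with df_owner["name"].map(strip_percent_tags, na_action="ignore"), assuming the column holds strings or missing values. Because of the {1,3} quantifier, a four-digit value such as (1000%) would be left in place.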
