Home > Software engineering >  Python Pandas Dataframe drop columns if string contains special character
Python Pandas Dataframe drop columns if string contains special character

Time:09-15

I have a dataframe:

Product Storage Price
Azure (2.4% Server £540
AWS Server £640
GCP Server £540

I would like to remove the column which contains the string '(2.4%' however I only want to remove the column in Pandas through regex if regex finds either a bracket or percentage in the string in that column '(%' and then pandas should drop that column entirely.

Please can you help me find a way to use regex to search for special characters within a string and drop the column if that condition is met?

I've searched on stack/google. I've used the following so far:

df = df.drop([col for col in df.columns if df[col].eq('(%').any()], axis=1)

chars = '(%'
regex = f'[{"".join(map(re.escape, chars))}]'

df = df.loc[:, ~df.apply(lambda c: c.str.contains(regex).any())]

however neither of these worked.

Any help would be greatly appreciated. :)

Thank You * Insert Smiley*

CodePudding user response:

you re using eq function it check exactly if the value in the columun match % instead of eq do this

df.drop([col for col in df.columns if df[col].apply(lambda x:'(%' in str(x)).any()], axis=1,inplace=True)

CodePudding user response:

I would do something like this

import pandas as pd
from io import StringIO

text = """
Product,Perc,Storage,Price
Azure,(2.4%,Server,£540
AWS,,Server,£640
GCP,,Server,£540
"""
data = pd.read_csv(StringIO(text))
print(data)

drop_columns = list()
for col_name in data.columns:
    has_special_characters = data[col_name].str.contains("[\(%]")
    if has_special_characters.any():
        drop_columns.append(col_name)

print(f"Dropping {drop_columns}")
data.drop(drop_columns, axis=1, inplace=True)
print(data)

Output of the script is:

  Product   Perc Storage Price
0   Azure  (2.4%  Server  £540
1     AWS    NaN  Server  £640
2     GCP    NaN  Server  £540
Dropping ['Perc']
  Product Storage Price
0   Azure  Server  £540
1     AWS  Server  £640
2     GCP  Server  £540

Process finished with exit code 0

CodePudding user response:

You can try this (I guess the name of the column you want to drop is ""):

import re

change_col = False
for elem in df[""]:
    if re.search(r'[(%]', elem):
        change_col = True

if change_col:
    df = df.drop("", axis=1)
  • Related