I have a dataframe:
Product | Storage | Price | |
---|---|---|---|
Azure | (2.4% | Server | £540 |
AWS | Server | £640 | |
GCP | Server | £540 |
I would like to remove the column which contains the string '(2.4%' however I only want to remove the column in Pandas through regex if regex finds either a bracket or percentage in the string in that column '(%' and then pandas should drop that column entirely.
Please can you help me find a way to use regex to search for special characters within a string and drop the column if that condition is met?
I've searched on stack/google. I've used the following so far:
df = df.drop([col for col in df.columns if df[col].eq('(%').any()], axis=1)
chars = '(%'
regex = f'[{"".join(map(re.escape, chars))}]'
df = df.loc[:, ~df.apply(lambda c: c.str.contains(regex).any())]
however neither of these worked.
Any help would be greatly appreciated. :)
Thank You * Insert Smiley*
CodePudding user response:
you re using eq
function it check exactly if the value in the columun
match %
instead of eq
do this
df.drop([col for col in df.columns if df[col].apply(lambda x:'(%' in str(x)).any()], axis=1,inplace=True)
CodePudding user response:
I would do something like this
import pandas as pd
from io import StringIO
text = """
Product,Perc,Storage,Price
Azure,(2.4%,Server,£540
AWS,,Server,£640
GCP,,Server,£540
"""
data = pd.read_csv(StringIO(text))
print(data)
drop_columns = list()
for col_name in data.columns:
has_special_characters = data[col_name].str.contains("[\(%]")
if has_special_characters.any():
drop_columns.append(col_name)
print(f"Dropping {drop_columns}")
data.drop(drop_columns, axis=1, inplace=True)
print(data)
Output of the script is:
Product Perc Storage Price
0 Azure (2.4% Server £540
1 AWS NaN Server £640
2 GCP NaN Server £540
Dropping ['Perc']
Product Storage Price
0 Azure Server £540
1 AWS Server £640
2 GCP Server £540
Process finished with exit code 0
CodePudding user response:
You can try this (I guess the name of the column you want to drop is ""):
import re
change_col = False
for elem in df[""]:
if re.search(r'[(%]', elem):
change_col = True
if change_col:
df = df.drop("", axis=1)