I have CSVs in a folder, and I want to remove special characters from the headers (and only the headers), then save the updated CSVs to a new folder.
The issue I'm having is that the special characters are removed not only from the headers, but also from the rows below.
My code looks like this:
from pathlib import Path
import pandas as pd
import os
parent_dir = input("Enter CSV directory path:")
newdir = "Processed"
directory = os.path.join(parent_dir, newdir)
os.mkdir(directory)
csv_files = [f for f in Path(parent_dir).glob('*.csv')]
for csv in csv_files:
    data = pd.read_csv(csv, encoding='ISO-8859-1', engine='python', delimiter=',')
    # clean only the header row; regex=True so '[",@]' is read as a character class
    data.columns = data.columns.str.replace('[",@]', '', regex=True)
    data.to_csv(parent_dir + "/Processed/" + csv.name, index=False)
Any suggestions on correcting this?
CodePudding user response:
Just replace the characters one by one, like this:
import pandas as pd

# generate a sample DataFrame
foo = pd.DataFrame(columns=['a@', 'b[', 'c]'])

# select the characters to drop
chars_to_drop = ['@', '[', ']']

for char in chars_to_drop:
    # regex=False so '[' and ']' are treated as literal characters
    foo.columns = foo.columns.str.replace(char, '', regex=False)

print(foo.columns)
>>> Index(['a', 'b', 'c'], dtype='object')
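For context, a minimal sketch of how this could slot into the question's per-file loop (folder names and the character list follow the question; os.makedirs with exist_ok=True is an assumption so re-runs don't fail):
from pathlib import Path
import os
import pandas as pd

parent_dir = input("Enter CSV directory path: ")
directory = os.path.join(parent_dir, "Processed")
os.makedirs(directory, exist_ok=True)  # assumption: tolerate an existing folder

chars_to_drop = ['"', ',', '@']  # the characters the question's pattern targets

for csv_path in Path(parent_dir).glob('*.csv'):
    data = pd.read_csv(csv_path, encoding='ISO-8859-1', engine='python')
    for char in chars_to_drop:
        # literal (non-regex) replacement, applied to the header only
        data.columns = data.columns.str.replace(char, '', regex=False)
    data.to_csv(os.path.join(directory, csv_path.name), index=False)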
CodePudding user response:
I've checked and your method should work.
Try debugging and print the data at every step to see where the special characters are being removed.
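For example, something like this inside the loop would show whether the row values actually change (a sketch based on the question's code, reusing its csv_files list):
for csv in csv_files:
    data = pd.read_csv(csv, encoding='ISO-8859-1', engine='python')
    print(data.head())      # row values before cleaning
    data.columns = data.columns.str.replace('[",@]', '', regex=True)
    print(data.columns)     # cleaned headers
    print(data.head())      # row values should be unchanged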
CodePudding user response:
Can you try the following:
data.columns = data.columns.str.replace(r'\W', '', regex=True)
CodePudding user response:
Try this:
df.columns = df.columns.str.replace(r"[^a-zA-Z\d_]+", "", regex=True)
It will remove every character that is not an English letter, a digit, or an underscore.
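For instance (a small illustration with made-up column names):
import pandas as pd

df = pd.DataFrame(columns=['name@', 'amount ($)', 'id_1'])
df.columns = df.columns.str.replace(r"[^a-zA-Z\d_]+", "", regex=True)
print(df.columns)
>>> Index(['name', 'amount', 'id_1'], dtype='object')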