I have CSVs in a folder, and I want to remove special characters from the headers (and only the headers), then save the updated CSVs to a new folder.
The issue I'm having is that the special characters are removed not only from the headers, but also from the rows below.
My code looks like this:
from pathlib import Path
import pandas as pd
import os
parent_dir = input("Enter CSV directory path:")
newdir = "Processed"
directory = os.path.join(parent_dir, newdir)
os.mkdir(directory)
csv_files = [f for f in Path(parent_dir).glob('*.csv')]
for csv in csv_files:
    data = pd.read_csv(csv, encoding='ISO-8859-1', engine='python', delimiter=',')
    # clean only the header row; regex=True so '[",@]' is read as a character class
    data.columns = data.columns.str.replace('[",@]', '', regex=True)
    data.to_csv(parent_dir + "/Processed/" + csv.name, index=False)
Any suggestions on correcting this?
CodePudding user response:
Just replace the characters one by one, like this:
import pandas as pd

# generate a sample DataFrame
foo = pd.DataFrame(columns=['a@', 'b[', 'c]'])

# select the characters to drop
chars_to_drop = ['@', '[', ']']

for char in chars_to_drop:
    # regex=False so '[' and ']' are treated as literal characters
    foo.columns = foo.columns.str.replace(char, '', regex=False)

print(foo.columns)
>>> Index(['a', 'b', 'c'], dtype='object')
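For context, a minimal sketch of how this could slot into the question's per-file loop (folder names and the character list follow the question; os.makedirs with exist_ok=True is an assumption so re-runs don't fail):
from pathlib import Path
import os
import pandas as pd

parent_dir = input("Enter CSV directory path: ")
directory = os.path.join(parent_dir, "Processed")
os.makedirs(directory, exist_ok=True)  # assumption: tolerate an existing folder

chars_to_drop = ['"', ',', '@']  # the characters the question's pattern targets

for csv_path in Path(parent_dir).glob('*.csv'):
    data = pd.read_csv(csv_path, encoding='ISO-8859-1', engine='python')
    for char in chars_to_drop:
        # literal (non-regex) replacement, applied to the header only
        data.columns = data.columns.str.replace(char, '', regex=False)
    data.to_csv(os.path.join(directory, csv_path.name), index=False)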
CodePudding user response:
I've checked and your method should work.
Try debugging and print the data at every step to see where the special characters are being removed.
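For example, something like this inside the loop would show whether the row values actually change (a sketch based on the question's code, reusing its csv_files list):
for csv in csv_files:
    data = pd.read_csv(csv, encoding='ISO-8859-1', engine='python')
    print(data.head())      # row values before cleaning
    data.columns = data.columns.str.replace('[",@]', '', regex=True)
    print(data.columns)     # cleaned headers
    print(data.head())      # row values should be unchanged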
CodePudding user response:
Can you try the following:
data.columns = data.columns.str.replace(r'\W', '', regex=True)
CodePudding user response:
Try this:
df.columns = df.columns.str.replace(r"[^a-zA-Z\d_]+", "", regex=True)
It will remove every character that is not an English letter, a digit, or an underscore.
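For instance (a small illustration with made-up column names):
import pandas as pd

df = pd.DataFrame(columns=['name@', 'amount ($)', 'id_1'])
df.columns = df.columns.str.replace(r"[^a-zA-Z\d_]+", "", regex=True)
print(df.columns)
>>> Index(['name', 'amount', 'id_1'], dtype='object')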