Renaming all columns within multiple files-CodePudding

I am trying to rename the column headers for approximately 50 files. Each file has 8 columns. The first contains a row number and then the remaining 7 are those that I'd like to rename. I wrote the function below, but it spits out the following error:

Traceback (most recent call last): File "/Users/someone/Python/pandas/studysession/brand_studies/existing_consumer_status/clean_files.py", line 54, in mapping() File "/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/clean_files.py", line 15, in mapping old_names = df.columns[column_indices] ~~~~~~~~~~^^^^^^^^^^^^^^^^ File "/Users/someone/Python/pandas/studysession/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 5380, in getitem result = getitem(key) ^^^^^^^^^^^^ IndexError: index 1 is out of bounds for axis 0 with size 1

import pandas as pd
import glob
import os
os.chdir('/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/survey_data/')

csv_files = glob.glob('/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/survey_data/*.csv')

def mapping ():
    for csv in csv_files:
        df = pd.read_csv(csv)
        column_indices = [1,2,3,4,5,6,7]
        new_names = ['Awareness','Brand_Recall','Consideration','Purchase_Intent','Ad_Recall','Product_Use','User_Id']
        old_names = df.columns[column_indices]
        final_df = df.rename(columns=dict(zip(old_names, new_names)))
        
        out = csv  '_survey_final.csv'
        final_df.to_csv(out)

mapping()

The expected output is to have the 7 columns renamed using an index to the following names:

'Awareness','Brand_Recall','Consideration','Purchase_Intent','Ad_Recall','Product_Use','User_Id'

I want to use an index because the current/original names of the columns differ slightly.

CodePudding user response：

pandas is an overkill

csv_files = glob.glob('/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/survey_data/*.csv')
newcol="Awareness,Brand_Recall,Consideration,Purchase_Intent,Ad_Recall,Product_Use,User_Id\n"
for csv in  csv_files:
    allines=open(csv).readlines()
    indx=allines[0].split(",")[0]
    allines[0]=indx "," newcol
    with open(csv  '_survey_final.csv',"w") as f:
        f.write(''.join(allines))
###end
###adjust delimiters if they are different

far more efficient would be if you know column header of first column if its was row_no

newcol="Row_no,Awareness,Brand_Recall,Consideration,Purchase_Intent,Ad_Recall,Product_Use,User_Id\n"
for csv in  csv_files:
    data=open(csv).read()
    data=data[data.find("\n"):]
    with open(csv  '_survey_final.csv',"w") as f:
        f.write(newcol data)

CodePudding user response：

The error occurs because the index 1 is out of bounds for the column names in your DataFrame, which has only one row (i.e., axis 0 size is 1). This happens because df.columns is a one-dimensional array of column names and you are trying to access its elements with a list of indices column_indices that exceeds the length of df.columns. To avoid this error, you should specify the new column names directly in the rename method, without using the column_indices and old_names intermediate variables.

Here's a corrected version of your code:

import pandas as pd
import glob
import os
os.chdir('/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/survey_data/')

csv_files = glob.glob('/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/survey_data/*.csv')

def mapping ():
    for csv in csv_files:
        df = pd.read_csv(csv)
        new_names = ['Awareness','Brand_Recall','Consideration','Purchase_Intent','Ad_Recall','Product_Use','User_Id']
        final_df = df.rename(columns={df.columns[1]: new_names[0], df.columns[2]: new_names[1], df.columns[3]: new_names[2], df.columns[4]: new_names[3], df.columns[5]: new_names[4], df.columns[6]: new_names[5], df.columns[7]: new_names[6]})
        
        out = csv  '_survey_final.csv'
        final_df.to_csv(out)

mapping()