I am trying to rename the column headers for approximately 50 files. Each file has 8 columns. The first contains a row number and then the remaining 7 are those that I'd like to rename. I wrote the function below, but it spits out the following error:
Traceback (most recent call last): File "/Users/someone/Python/pandas/studysession/brand_studies/existing_consumer_status/clean_files.py", line 54, in mapping() File "/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/clean_files.py", line 15, in mapping old_names = df.columns[column_indices] ~~~~~~~~~~^^^^^^^^^^^^^^^^ File "/Users/someone/Python/pandas/studysession/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 5380, in getitem result = getitem(key) ^^^^^^^^^^^^ IndexError: index 1 is out of bounds for axis 0 with size 1
import pandas as pd
import glob
import os
os.chdir('/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/survey_data/')
csv_files = glob.glob('/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/survey_data/*.csv')
def mapping ():
for csv in csv_files:
df = pd.read_csv(csv)
column_indices = [1,2,3,4,5,6,7]
new_names = ['Awareness','Brand_Recall','Consideration','Purchase_Intent','Ad_Recall','Product_Use','User_Id']
old_names = df.columns[column_indices]
final_df = df.rename(columns=dict(zip(old_names, new_names)))
out = csv '_survey_final.csv'
final_df.to_csv(out)
mapping()
The expected output is to have the 7 columns renamed using an index to the following names:
'Awareness','Brand_Recall','Consideration','Purchase_Intent','Ad_Recall','Product_Use','User_Id'
I want to use an index because the current/original names of the columns differ slightly.
CodePudding user response:
pandas is an overkill
csv_files = glob.glob('/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/survey_data/*.csv')
newcol="Awareness,Brand_Recall,Consideration,Purchase_Intent,Ad_Recall,Product_Use,User_Id\n"
for csv in csv_files:
allines=open(csv).readlines()
indx=allines[0].split(",")[0]
allines[0]=indx "," newcol
with open(csv '_survey_final.csv',"w") as f:
f.write(''.join(allines))
###end
###adjust delimiters if they are different
far more efficient would be if you know column header of first column if its was row_no
newcol="Row_no,Awareness,Brand_Recall,Consideration,Purchase_Intent,Ad_Recall,Product_Use,User_Id\n"
for csv in csv_files:
data=open(csv).read()
data=data[data.find("\n"):]
with open(csv '_survey_final.csv',"w") as f:
f.write(newcol data)
CodePudding user response:
The error occurs because the index 1 is out of bounds for the column names in your DataFrame, which has only one row (i.e., axis 0 size is 1). This happens because df.columns is a one-dimensional array of column names and you are trying to access its elements with a list of indices column_indices that exceeds the length of df.columns. To avoid this error, you should specify the new column names directly in the rename method, without using the column_indices and old_names intermediate variables.
Here's a corrected version of your code:
import pandas as pd
import glob
import os
os.chdir('/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/survey_data/')
csv_files = glob.glob('/Users/someone/Python/pandas/studysession/studies/existing_consumer_status/survey_data/*.csv')
def mapping ():
for csv in csv_files:
df = pd.read_csv(csv)
new_names = ['Awareness','Brand_Recall','Consideration','Purchase_Intent','Ad_Recall','Product_Use','User_Id']
final_df = df.rename(columns={df.columns[1]: new_names[0], df.columns[2]: new_names[1], df.columns[3]: new_names[2], df.columns[4]: new_names[3], df.columns[5]: new_names[4], df.columns[6]: new_names[5], df.columns[7]: new_names[6]})
out = csv '_survey_final.csv'
final_df.to_csv(out)
mapping()