Is there a way to name only certain columns in Pandas read_csv?

Time:03-11

I know it's possible to name columns when using pandas.read_csv() by passing the optional names=['X', 'Y', 'Z', ...] parameter. However, is it possible to name only the first X columns and let the rest be auto-named?

Basically, I have a CSV with 23 columns that I want to name, and a further 1023 columns that I need to keep in the DataFrame but whose names I don't care about. Here's an image to illustrate the requirement:

[Image: DataFrame showing columns requiring renaming]

CodePudding user response:

I don't see a setting in pandas to do this, so I just generated a list of column names and renamed the columns in the DataFrame.

This will work even if you don't know how many columns to expect at the end.

Dynamically Rename Columns in Data Frame

import pandas

#Read file. header=None treats the first row as data, so pandas assigns integer column labels
myFile = pandas.read_csv("C:\\python_work_area\\TestFile.csv", header=None)

#Set known column names
arr_colName = ["MyColName1","MyColName2","MyColName3"]

numOfUnknownCols = len(myFile.columns) - len(arr_colName)
#Generate a list of numbers, one for each unknown column. Could hard code numOfUnknownCols if the column count is known
arr_nums = list(range(1, numOfUnknownCols + 1))

#Add numbered placeholder column names to arr_colName
for i in arr_nums:
    arr_colName.append("UnnamedColumn" + str(i))

#Rename the columns. set_axis returns a new DataFrame (its inplace argument was removed in pandas 2.0), so reassign the result
myFile = myFile.set_axis(arr_colName, axis=1)
print(myFile)
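
If you would rather hand the complete list to read_csv directly, one possible variation is to peek at the first row to count the columns and then build the names list before the real read. This is only a sketch: build_names, the "UnnamedColumn" prefix, and the in-memory sample data are made up for illustration, and it assumes the file has no header row.

```python
import io

import pandas as pd


def build_names(known_names, total_columns, prefix="UnnamedColumn"):
    """Return known_names padded with numbered placeholders (hypothetical helper)."""
    extra = total_columns - len(known_names)
    if extra < 0:
        raise ValueError("more names supplied than columns in the file")
    return list(known_names) + [f"{prefix}{i}" for i in range(1, extra + 1)]


# Stand-in for a real file: 5 columns, no header row (illustrative data only)
csv_text = "1,2,3,4,5\n6,7,8,9,10\n"

# Read just the first row to find out how many columns there are
total = pd.read_csv(io.StringIO(csv_text), header=None, nrows=1).shape[1]

# Name the first three columns and auto-name the rest
names = build_names(["X", "Y", "Z"], total)
df = pd.read_csv(io.StringIO(csv_text), header=None, names=names)

print(df.columns.tolist())
```

With a real file you would pass the path in both read_csv calls instead of io.StringIO. The first call only parses a single row, so the extra read should stay cheap even for a wide file.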