KeyError after renaming the first n column names of python pandas dataframe

Time:06-28

We have a simple CSV input file, as shown in the picture. We load it into a pandas DataFrame and want to rename the first n columns; in this example, the first three.

[screenshot of the input file]

The code

import pandas as pd
file_path = r"C:\Codes\test\test_data.csv"
df1 = pd.read_csv(file_path)
print (df1, "\n", type(df1), "\n", df1.columns, "\n", type(df1.columns), "\n", df1.columns.values, "\n", type(df1.columns.values))

df2 = df1.copy()
print (df2, "\n", type(df2), "\n", df2.columns, "\n", type(df2.columns), "\n", df2.columns.values, "\n", type(df2.columns.values))

df2.columns.values[0:3] = ["symbol","field","abc"]  
print("\n after renaming the columns: ", df2)
print(df2["symbol"])

The result is as follows:

[screenshot of the result]

It seems that the code df2.columns.values[0:3] = ["symbol","field","abc"] is not stable. Sometimes it works and sometimes it does not: for example, it may raise a KeyError, or the code may freeze when trying to display df2. I do not understand why it does not work. I am working on Windows 10 with Python 3.10.4.

Of course, I can also write the following code, which works:

df2.rename(columns={df2.columns[0]: "symbol"}, inplace=True)
df2.rename(columns={df2.columns[1]: "field"}, inplace=True)
df2.rename(columns={df2.columns[2]: "abc"}, inplace=True)

But my goal is to rename the first n columns with simple code.

CodePudding user response:

You can try updating the column labels like this:

df2 = df2.rename(columns=dict(zip(list(df2.columns)[0:3], ["symbol","field","abc"])))

... or like this:

df2.columns = ["symbol","field","abc"] + list(df2.columns)[3:]

Output:

   COLA  COL_B  testC
0     1      2      3
1    10     11     12
 <class 'pandas.core.frame.DataFrame'>
 Index(['COLA', 'COL_B', 'testC'], dtype='object')
 <class 'pandas.core.indexes.base.Index'>
 ['COLA' 'COL_B' 'testC']
 <class 'numpy.ndarray'>
   COLA  COL_B  testC
0     1      2      3
1    10     11     12
 <class 'pandas.core.frame.DataFrame'>
 Index(['COLA', 'COL_B', 'testC'], dtype='object')
 <class 'pandas.core.indexes.base.Index'>
 ['COLA' 'COL_B' 'testC']
 <class 'numpy.ndarray'>

 after renaming the columns:
   symbol  field  abc
0       1      2    3
1      10     11   12
df2["symbol"]
0     1
1    10
Name: symbol, dtype: int64
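
If the number of columns to rename varies, the list-concatenation approach can be wrapped in a small helper. This is only a sketch: the function name rename_first_n and the extra column in the sample frame are made up for illustration, and the sample data just mimics the output shown above.

```python
import pandas as pd


def rename_first_n(df, new_names):
    """Return a copy of df whose first len(new_names) columns are renamed.

    Hypothetical helper, not part of pandas. Renaming is positional,
    so it does not depend on the current labels being unique.
    """
    n = len(new_names)
    if n > len(df.columns):
        raise ValueError("more new names than columns")
    out = df.copy()
    # Assign a complete new list of labels instead of editing
    # df.columns.values in place.
    out.columns = list(new_names) + list(df.columns[n:])
    return out


# Sample frame resembling the one in the question, plus one extra column.
df1 = pd.DataFrame({"COLA": [1, 10], "COL_B": [2, 11], "testC": [3, 12], "extra": [4, 13]})
df2 = rename_first_n(df1, ["symbol", "field", "abc"])
print(df2.columns.tolist())
print(df2["symbol"])
```

Because the helper assigns a whole new list to columns, pandas builds a fresh Index, and df1 keeps its original labels.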

Note that the docs for Index.values have a warning which reads:

We recommend using Index.array or Index.to_numpy(), depending on whether you need a reference to the underlying data or a NumPy array.
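
A likely reason the in-place edit is unreliable is that an Index is meant to be immutable, so pandas may keep using lookup structures built from the old labels after the underlying array has been changed. If slice assignment is still the preferred style, one option is to modify a private copy of the labels and assign it back. The sketch below assumes a frame like the one in the output above; the variable name labels is only illustrative.

```python
import pandas as pd

# Sample frame resembling the one in the question.
df2 = pd.DataFrame({"COLA": [1, 10], "COL_B": [2, 11], "testC": [3, 12]})

# Take an independent NumPy copy of the labels, so the Index itself
# is never mutated behind pandas' back.
labels = df2.columns.to_numpy(copy=True)
labels[0:3] = ["symbol", "field", "abc"]

# Assigning to .columns creates a new Index from the edited labels.
df2.columns = labels

print(df2.columns.tolist())
print(df2["symbol"])
```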
