Pandas: how to use the initially generated column names without renaming them


I was curious whether there is any way to use the column names that pandas generates initially when reading a CSV/text file, as follows:

df = pd.read_csv("some_text_file.txt", header = None)

which will give something like

          0         1         2
0     data1     data2     data3
1  r2 data1  r2 data2  r2 data3

With header=None, pandas generated the column names 0, 1, 2 by default.

When I try to access them like

df['0'] = sometask

it throws an error:

raise KeyError(key) from err
KeyError: '0'

Aren't they column names at all? I've seen some people call them levels, like

level0 - column 0
level1 - column 1
level2 - column 2 

I've also tried

df[level0] = sometask

which threw

NameError: name 'level0' is not defined

I know we can rename the columns and then use the new names, like

df.columns = ['col1', 'col2', ...]

But I'm wondering whether there is any way to use these pandas-generated column names without renaming them as shown above.

CodePudding user response:

Inside pd.read_csv, you can pass a list to the names parameter. E.g.:

df = pd.read_csv('some_text_file.txt', header=None, 
                 names=[f'col_{i}' for i in range(1,4)])


      col_1     col_2     col_3
0     data1     data2     data3
1  r2 data1  r2 data2  r2 data3

Note that the list of names cannot contain any duplicates (e.g. ['col', 'col', 'col2'] will cause an error).
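As a minimal sketch of the names parameter and the duplicate restriction, the snippet below reads the same two rows from an in-memory string instead of some_text_file.txt (the sample text and the variable names are assumptions made for illustration). It relies on pandas raising a ValueError for duplicate names, which is the behaviour in recent pandas versions.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of some_text_file.txt
text = "data1,data2,data3\nr2 data1,r2 data2,r2 data3\n"

# Unique names are applied as the column labels
df = pd.read_csv(io.StringIO(text), header=None,
                 names=["col_1", "col_2", "col_3"])

# Duplicate names are rejected; capture the error message for inspection
try:
    pd.read_csv(io.StringIO(text), header=None,
                names=["col", "col", "col2"])
    duplicate_error = None
except ValueError as err:
    duplicate_error = str(err)

print(list(df.columns))
print(duplicate_error)
```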

The default column "names" 0, 1, 2, etc. are integers rather than strings. You can check this as follows:

df.columns

Int64Index([0, 1, 2], dtype='int64')

E.g. to access column 0, you should use df[0] or df.loc[:,0], not df['0'] etc.
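To make this concrete, here is a small sketch that uses the default integer labels directly, without renaming anything. The in-memory sample text is an assumption standing in for some_text_file.txt, and the upper-casing step is only a placeholder for whatever "sometask" is in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of some_text_file.txt
raw = io.StringIO("data1,data2,data3\nr2 data1,r2 data2,r2 data3\n")
df = pd.read_csv(raw, header=None)

# The generated labels are the integers 0, 1, 2
labels = list(df.columns)

# Three equivalent ways to read the first column
first_by_label = df[0]         # label-based, integer key
first_by_loc = df.loc[:, 0]    # label-based via .loc
first_by_pos = df.iloc[:, 0]   # position-based via .iloc
same = first_by_label.equals(first_by_loc) and first_by_label.equals(first_by_pos)

# Assignment works with the integer label as well
df[0] = df[0].str.upper()

print(labels, same)
print(df)
```

Note that .loc looks columns up by label while .iloc looks them up by position; the two coincide here only because the default labels happen to equal the positions.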

CodePudding user response:

By default, the column names are integers, not strings. Hence, when you try to access df['0'], you get a KeyError, but if you use df[0], you get the first column.
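The difference between the two keys can be checked before indexing. This is only a sketch, with an in-memory sample (an assumption) in place of the original file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of some_text_file.txt
raw = io.StringIO("data1,data2,data3\nr2 data1,r2 data2,r2 data3\n")
df = pd.read_csv(raw, header=None)

# Membership tests show which key type matches a column label
has_int_label = 0 in df.columns
has_str_label = "0" in df.columns

# The string key reproduces the KeyError from the question
try:
    df["0"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(has_int_label, has_str_label, lookup_failed)
```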
