I was curious whether there is any way to use the column names that pandas generates by default when reading a CSV/text file like this:
df = pd.read_csv("some_text_file.txt", header = None)
This gives something like:
          0         1         2
0     data1     data2     data3
1  r2 data1  r2 data2  r2 data3
With header=None, pandas generated the column names 0, 1, 2 by default.
When I try to access them like this:
df['0'] = sometask
it throws an error:
raise KeyError(key) from err
KeyError: '0'
Aren't they column names at all? I've seen some people call them levels, like:
level0 - column 0
level1 - column 1
level2 - column 2
I've also tried
df[level0] = sometask
but it threw
NameError: name 'level0' is not defined
I know I can rename the columns and then use the new names, like:
df.columns = ['col1', 'col2', ...]
But I'm wondering whether there is any way to use these pandas-generated column names without renaming them as shown above.
Answer:
Inside pd.read_csv, you can pass a list to the names parameter, e.g.:
import pandas as pd

df = pd.read_csv('some_text_file.txt', header=None,
                 names=[f'col_{i}' for i in range(1, 4)])
print(df)
      col_1     col_2     col_3
0     data1     data2     data3
1  r2 data1  r2 data2  r2 data3
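The snippet above reads from a file on disk. Below is a self-contained sketch of the same names approach, assuming the file is comma-separated; the io.StringIO contents are a made-up stand-in for some_text_file.txt, not the asker's real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for some_text_file.txt: two rows, three columns, no header line.
raw = io.StringIO("data1,data2,data3\nr2 data1,r2 data2,r2 data3\n")

# names= supplies the labels, so header=None no longer yields 0, 1, 2.
df = pd.read_csv(raw, header=None, names=[f"col_{i}" for i in range(1, 4)])

print(df.columns.tolist())  # ['col_1', 'col_2', 'col_3']
print(df["col_2"].tolist())  # ['data2', 'r2 data2']
```

Because the labels are now strings, string access such as df['col_2'] works as expected.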
Note that the list of names cannot contain any duplicates (e.g. ['col', 'col', 'col2'] will cause an error).
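A quick way to see the duplicate-name check, assuming a recent pandas release (these raise a ValueError; older releases may have behaved differently). The two-line CSV is made up purely for illustration:

```python
import io

import pandas as pd

# Hypothetical tiny CSV with three columns and no header row.
raw = io.StringIO("a,b,c\n1,2,3\n")

raised = False
try:
    # Repeated entries in names= are rejected when read_csv validates its arguments.
    pd.read_csv(raw, header=None, names=["col", "col", "col2"])
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")

print(raised)  # True
```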
The default column "names" 0, 1, 2, etc. are integers rather than strings. You can check this as follows:
print(df.columns)
Int64Index([0, 1, 2], dtype='int64')
So to access column 0, you should use df[0] or df.loc[:, 0], not df['0'].
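A self-contained sketch of label-based access on the generated columns, again assuming a comma-separated file; the in-memory data is a hypothetical stand-in for some_text_file.txt. It also shows .iloc, which selects by position rather than by label (with default column names the two happen to coincide):

```python
import io

import pandas as pd

# Hypothetical comma-separated stand-in for some_text_file.txt (no header row).
raw = io.StringIO("data1,data2,data3\nr2 data1,r2 data2,r2 data3\n")
df = pd.read_csv(raw, header=None)

print(df.columns.tolist())  # [0, 1, 2]

first = df[0]           # select by the integer label 0
same = df.loc[:, 0]     # the equivalent .loc form
by_pos = df.iloc[:, 0]  # select by position; position 0 and label 0 coincide here

got_key_error = False
try:
    df["0"]  # the string '0' is a different label from the integer 0
except KeyError:
    got_key_error = True

print(got_key_error)  # True
```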
Answer:
By default, the column names are integers. Hence, when you try to access df['0'], you get a KeyError, but if you use df[0], you get the first column.
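One more detail relevant to the question's df['0'] = sometask: assignment behaves differently from reading. A missing key on the left-hand side does not raise; it creates a new column labeled with the string '0', separate from the integer label 0. A sketch, where str.upper() and the constant "new" are hypothetical stand-ins for the unspecified sometask and the data is a made-up comma-separated sample:

```python
import io

import pandas as pd

# Hypothetical comma-separated stand-in for some_text_file.txt (no header row).
raw = io.StringIO("data1,data2,data3\nr2 data1,r2 data2,r2 data3\n")
df = pd.read_csv(raw, header=None)

# Overwrite the existing first column by using its integer label.
df[0] = df[0].str.upper()

# Assigning with the string '0' does not raise; it appends a new column
# whose label is the string '0', alongside the integer label 0.
df["0"] = "new"

print(df.columns.tolist())
```

So to modify one of the generated columns in place, use the integer label on the left-hand side as well.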