persistent column label in pandas dataframe

Time:08-15

I have an issue with pandas indexing. It first happened on a larger dataset, and I was able to recreate it in this dummy dataframe. Apologies if my table formatting is terrible, I don't know how to make it better visually.

    Unnamed: 0  col1  col2  col3
0   Name        Sun   Mon   Tue
1   one         1     2     1
2   two         4     4     3
3   three       2     1     1
4   four        1     5     5
5   five        1     5     5
6   six         5     1     1
7   seven       5     5     6
8   eight       5     3     4
9   nine        5     3     3

What I am trying to do is rename the first column label ('Unnamed: 0') to something meaningful. But when I finally call reset_index, the index "column" has the name "test" for some reason, while the first actual column gets the label "index".

df.rename({df.columns[0]: 'test'}, axis=1, inplace=True)
df.set_index('test', inplace=True)
dft = df.transpose()
dft

test  Name  one  two  three  four  five  six  seven  eight  nine
col1  Sun   1    4    2      1     1     5    5      5      5
col2  Mon   2    4    1      5     5     1    5      3      3
col3  Tue   1    3    1      5     5     1    6      4      3

First of all, if my understanding is correct, the index is not even an actual column in the dataframe. Why does it get a label when I reset the index?

And more importantly, why are the labels "test" and "index" reversed?

dft.reset_index(inplace=True)
dft

test  index  Name  one  two  three  four  five  six  seven  eight  nine
0     col1   Sun   1    4    2      1     1     5    5      5      5
1     col2   Mon   2    4    1      5     5     1    5      3      3
2     col3   Tue   1    3    1      5     5     1    6      4      3

I have tried every combination of set_index / reset_index I can think of, including drop=True and inplace=True, but I cannot find a way to create a proper index like the one I started with.

Answer:

Yes, both axes (the index and the columns) can have names. This is useful for multi-indexing.

When you call .reset_index, the index is extracted into a new column. That column takes the name of the index, or 'index' if the index has no name.
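To see both cases side by side, here is a minimal sketch. The frame and the names 'value' and 'key' are made up for illustration and are not from your data.

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration.
unnamed = pd.DataFrame({"value": [10, 20]}, index=["a", "b"])
named = unnamed.rename_axis("key")  # same data, index now named 'key'

# With no index name, the extracted column falls back to 'index'.
unnamed_cols = list(unnamed.reset_index().columns)

# With an index name, the extracted column takes that name.
named_cols = list(named.reset_index().columns)

print(unnamed_cols)  # ['index', 'value']
print(named_cols)    # ['key', 'value']
```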

If you want, you can reset and rename index in one line:

df.rename_axis('Name').reset_index()
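A small sketch of that one-liner, assuming a frame shaped like yours after set_index('test') but cut down to two made-up rows:

```python
import pandas as pd

# Small stand-in for the question's data (values are illustrative).
df = pd.DataFrame(
    {"col1": [1, 4], "col2": [2, 4]},
    index=pd.Index(["one", "two"], name="test"),
)

# rename_axis returns a new frame whose index is named 'Name';
# reset_index then turns that index into a column with the same name.
flat = df.rename_axis("Name").reset_index()

print(list(flat.columns))  # ['Name', 'col1', 'col2']
print(flat.index.name)     # None
```

One caveat: reset_index raises a ValueError if a column with the index's name already exists, so pick a name that is not already among the columns.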

Why is 'test' not printed where I expect?

After your code, if you print(dft.columns), you will see:

Index(['index', 'Name', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'], 
      dtype='object', 
      name='test')

There are 11 columns. The column axis is called 'test' (see name='test' in the output above).

Also: print(dft.columns.name) prints test.

So what you actually see when you print your dataframe are the column names, to the left of which is the name of the column axis: 'test'.

It is NOT the name of the index axis. You can check: print(type(dft.index.name)) prints <class 'NoneType'>.
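You can run the same checks on a cut-down frame. This is a sketch with two made-up rows, not your full data:

```python
import pandas as pd

# Reduced stand-in for the question's frame (illustrative values).
df = pd.DataFrame({"test": ["one", "two"], "col1": [1, 4], "col2": [2, 4]})
dft = df.set_index("test").transpose().reset_index()

# The label shown in the top-left corner of the printout is the
# name of the column axis, not the name of the index.
print(dft.columns.name)   # test
print(dft.index.name)     # None
print(list(dft.columns))  # ['index', 'one', 'two']
```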

Now, why is the column axis named 'test'?

Let's see how it got there step by step.

df.rename({df.columns[0]: 'test'}, axis=1, inplace=True)

The first column is now named 'test'.

df.set_index('test', inplace=True)

The first column has moved from being a column to being the index, so the index name is now 'test'. The old default index (0 to 9) has been discarded.

dft = df.transpose()

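Transposing swaps the two axes together with their names. The column axis is now named 'test', and the index takes whatever name the column axis had before transposing. Here it had none, so the index is unnamed. That is why the later reset_index creates a column called 'index' while 'test' stays on as the name of the column axis.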
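The steps above can be traced on a reduced copy of your frame by recording both axis names after each call. This is only a sketch: the rows are cut down, and the last step (clearing the column-axis name with rename_axis(None, axis=1) before resetting the index) is one possible cleanup that I am assuming matches what you want, not the only way to do it.

```python
import pandas as pd

# Reduced version of the question's frame (fewer rows, same layout).
df = pd.DataFrame(
    {
        "Unnamed: 0": ["Name", "one", "two"],
        "col1": ["Sun", 1, 4],
        "col2": ["Mon", 2, 4],
    }
)

df = df.rename({df.columns[0]: "test"}, axis=1)
step1 = (df.index.name, df.columns.name)        # (None, None)

df = df.set_index("test")
step2 = (df.index.name, df.columns.name)        # ('test', None)

dft = df.transpose()
step3 = (dft.index.name, dft.columns.name)      # (None, 'test')

# One possible cleanup: drop the column-axis name, then reset the index.
clean = dft.rename_axis(None, axis=1).reset_index()
step4 = (clean.index.name, clean.columns.name)  # (None, None)

print(step1, step2, step3, step4)
print(list(clean.columns))  # ['index', 'Name', 'one', 'two']
```

Setting dft.columns.name = None should have the same effect as the rename_axis call, but it modifies dft in place.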
