Sorting pandas dataframe by column index instead of column name

Time:01-18

Given this sample dataframe named df:

import pandas as pd
df = pd.DataFrame({'name': ['Mary', 'Joe', 'Jessie'], 'score': [10, 3, 13]})


name     score
Mary     10
Joe      3
Jessie   13

I am trying to sort the dataframe by a column's position rather than by its name.

The typical way to sort a dataframe by column name:

df = df.sort_values('score')

I tried to sort it like this, which does not work:

df = df.sort_values[df.iloc[:, 1]]

This raises a KeyError with no explanation of what it refers to.

I need to do this because the function containing this code will receive a different name for the second column each time it runs. I cannot hard-code a column name for sorting, so I need to sort by whatever the second column is, regardless of its name.

Thanks for taking a moment to check this out.

CodePudding user response:

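sort_values is a method, not an indexer, so it has to be called with () rather than []. That is probably not the source of your KeyError, though: with [] Python raises a TypeError, whereas a KeyError is what sort_values raises when it is called with () and given the column's values (df.iloc[:, 1]) instead of a column label.

If you want to sort your dataframe by the second column, whatever its name, look up the label by position with df.columns[1]: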

>>> df.sort_values(df.columns[1])
     name  score
1     Joe      3
0    Mary     10
2  Jessie     13
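
Since the question mentions a function that receives a differently named second column on each run, here is a minimal sketch of how the same idea might be wrapped up. The helper name sort_by_position and the second sample frame (ages) are hypothetical and only there for illustration:

```python
import pandas as pd


def sort_by_position(frame: pd.DataFrame, position: int) -> pd.DataFrame:
    """Sort by the column at integer position `position`, whatever its label."""
    label = frame.columns[position]  # translate the position into a label
    return frame.sort_values(label)


# Two frames whose second column has a different name (hypothetical data).
scores = pd.DataFrame({'name': ['Mary', 'Joe', 'Jessie'], 'score': [10, 3, 13]})
ages = pd.DataFrame({'name': ['Mary', 'Joe', 'Jessie'], 'age': [41, 35, 29]})

sorted_scores = sort_by_position(scores, 1)
sorted_ages = sort_by_position(ages, 1)
print(sorted_scores)
print(sorted_ages)
```

The original row labels are kept, as in the output above; chain .reset_index(drop=True) if a fresh 0..n-1 index is wanted.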

CodePudding user response:

One way could be to use set_index with the values of the desired column, call sort_index, and then reset the index to a default integer index:

df = df.set_index(df.iloc[:,1]).sort_index().reset_index(drop=True)

As @Neither suggests, we could pass ignore_index=True to sort_index and skip the separate reset_index call:

df = df.set_index(df.iloc[:,1]).sort_index(ignore_index=True)

Output:

     name  score
0     Joe      3
1    Mary     10
2  Jessie     13
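
As a quick cross-check, the index-based approach can be compared with a single sort_values call on the label found by position. This is only a sketch on the sample data from the question, and it assumes a pandas version where both sort_index and sort_values accept ignore_index (1.0 or later, as far as I know); the variable names via_index and via_values are made up for the comparison:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Mary', 'Joe', 'Jessie'], 'score': [10, 3, 13]})

# Approach from this answer: use the second column's values as a temporary
# index, sort on it, and discard it again via ignore_index=True.
via_index = df.set_index(df.iloc[:, 1]).sort_index(ignore_index=True)

# Alternative: look up the second column's label and sort by it directly.
via_values = df.sort_values(df.columns[1], ignore_index=True)

print(via_index)
print(via_index.equals(via_values))
```

Both frames end up with the rows in the same order and a fresh integer index, so on this data the choice between them is mostly a matter of taste.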