Home > Back-end >  Pandas: Sort by sum of 2 columns
Pandas: Sort by sum of 2 columns

Time:04-07

I have a DataFrame:

COL1 COL2
   1    1
   3    1
   1    3

I need to sort by COL1 COL2.

key=lambda col: f(col) argument-function of sort_values(...) lets you sort by a changed column but in the described case I need to sort on the basis of 2 columns. So, it would be nice if there were an opportunity to provide a key argument-function for 2 or more columns but I don't know whether such a one exists.

So, how can I sort its rows by sum COL1 COL2?

Thank you for your time!

CodePudding user response:

This does the trick:

data = {"Column 1": [1, 3, 1], "Column 2": [1, 2, 3]}
df = pd.DataFrame(data)

sorted_indices = (df["Column 1"]   df["Column 2"]).sort_values().index

df.loc[sorted_indices, :]

I just created a series that has the sum of both the columns, sorted it, got the sorted indices, and printed those indices out for the dataframe.

(I changed the data a little so you can see the sorting in action. Using the data you provided, you wouldn't have been able to see the sorted data as it would have been the same as the original one.)

CodePudding user response:

Assuming a unique index, you can also conveniently use the key parameter of sort_values to pass a callable to apply to the by column. Here we can add the other column:

df.sort_values(by='COL1', key=df['COL2'].add)

We can even generalize to any number of columns using sort_index:

df.sort_index(key=df.sum(1).get)

Output:

   COL1  COL2
0     1     1
2     1     3
1     3     2

Used input:

data = {"COL1": [1, 3, 1], "COL2": [1, 2, 3]}
df = pd.DataFrame(data)
  • Related