Home > Software engineering >  how to sort dataframe by sum of columns without creating additional column
how to sort dataframe by sum of columns without creating additional column

Time:06-26

grouped1=exp.groupby("country")["value"].sum().reset_index()
grouped2=imp.groupby("country")["value"].sum().reset_index()
grouped=grouped1.merge(grouped2,on="country")
grouped.rename(columns={"value_x":"export_to","value_y":"import_from"},inplace=True)
grouped

output:

enter image description here

I want to sort dataframe by the sum of export_to and import_from

I tried this:

grouped.sort_values(grouped.export_to grouped.import_from)

CodePudding user response:

In my opinion, there's nothing wrong to add a new column to sort by. But of course, we can replace the index with the series of sums as an option:

(
    grouped
    .set_index(grouped.export_to   grouped.import_from)
    .sort_index()
    .reset_index(drop=True)
)

We can hide all this machinery deeper inside sort_index:

grouped.sort_index(
    ignore_index=True, 
    key=lambda index: grouped.loc[index, ['export_to','import_from']].sum('columns')
)

The process is the same: an index has been replaced here by a key-function and then sorted. After that reordered index is dropped due to ignore_index=True.

CodePudding user response:

You can create a temporary column for total and drop it after sorting by it.

df.assign(total=df['export_to'] df['import_from']).sort_values('total').drop(columns='total')
    country     export_to   import_from
3   ANDORA         6.28         5.82
1   ALBANIA      196.51       524.18
0   AFG         4790.19      2682.23
2   ALGERIA     8232.24     10185.12

Even if you don't drop the total_column, it will not be added to the DF permanently, and the result will still be the same. though shows in the result

    country     export_to   import_from      total
3   ANDORA         6.28         5.82         12.10
1   ALBANIA      196.51       524.18        720.69
0   AFG         4790.19      2682.23       7472.42
2   ALGERIA     8232.24     10185.12      18417.36
df.assign(total=df['export_to'] df['import_from']).sort_values('total')
  • Related