Groupby and sort multiple columns' values raising an AttributeError: 'DataFrameGroupBy

Time:12-26

For the toy dataset below, I'm trying to group by target_name and sort the values by multiple columns (valid_mse ascending, valid_r2_score descending) using: df.groupby('target_name').sort_values(by=['valid_mse', 'valid_r2_score'], ascending=[True, False])

  target_name  train_mse  valid_mse  train_r2_score  valid_r2_score
0         CPI   1.102079   1.842212        0.947458       -0.624665
1         CPI   1.301734   1.890085        0.928005       -0.777463
2         CPI   0.471222   1.078413        0.990599        0.311849
3         PPI   0.113998   0.135523        0.662532        0.262387
4         PPI   0.095434   0.176431        0.752242       -0.422994
5         PPI   0.097648   0.174544        0.744522       -0.203880

But it raises an error: AttributeError: 'DataFrameGroupBy' object has no attribute 'sort_values'. I also tried sorting by a single column with df.groupby('target_name').sort_values(by='valid_mse', ascending=True), and it raises the same error.

Does anyone know how I can solve this problem correctly? Thanks.

Data in dictionary format:

{'target_name': {0: 'CPI', 1: 'CPI', 2: 'CPI', 3: 'PPI', 4: 'PPI', 5: 'PPI'},
 'train_mse': {0: 1.102079409,
  1: 1.301734392,
  2: 0.471221642,
  3: 0.11399796,
  4: 0.09543417,
  5: 0.097647639},
 'valid_mse': {0: 1.842212034,
  1: 1.890085418,
  2: 1.078413107,
  3: 0.135523283,
  4: 0.176431247,
  5: 0.174543796},
 'train_r2_score': {0: 0.947458162,
  1: 0.928005473,
  2: 0.990599137,
  3: 0.662532128,
  4: 0.752241595,
  5: 0.744522334},
 'valid_r2_score': {0: -0.624665246,
  1: -0.777462993,
  2: 0.311849214,
  3: 0.262387135,
  4: -0.422993602,
  5: -0.203880075}}

Reference link:

How to sort a dataFrame in python pandas by two or more columns?

CodePudding user response:

There is no sort_values method on the DataFrameGroupBy object that groupby returns.
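A quick way to confirm this is sketched below on a cut-down version of the question's data (the variable names grouped, frame_has_sort and grouped_has_sort are illustrative only):

```python
import pandas as pd

# Cut-down version of the question's data; only two columns are needed here.
df = pd.DataFrame({
    "target_name": ["CPI", "CPI", "PPI"],
    "valid_mse": [1.842212, 1.078413, 0.135523],
})

grouped = df.groupby("target_name")

# A DataFrame exposes sort_values; the DataFrameGroupBy object does not,
# which is why the chained call in the question raises AttributeError.
frame_has_sort = hasattr(df, "sort_values")
grouped_has_sort = hasattr(grouped, "sort_values")
print(frame_has_sort, grouped_has_sort)
```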

Wouldn't it be possible to get the desired data by simply sorting on three columns? Something like:

df.sort_values(by=['target_name', 'valid_mse', 'valid_r2_score'],
               ascending=[True, True, False])

This sorts first by the target_name column, then by valid_mse, and then by valid_r2_score. Rows with the same target_name stay together and are ordered within their group, so it is arguably what you are after:

  target_name  train_mse  valid_mse  train_r2_score  valid_r2_score
2         CPI   0.471222   1.078413        0.990599        0.311849
0         CPI   1.102079   1.842212        0.947458       -0.624665
1         CPI   1.301734   1.890085        0.928005       -0.777463
3         PPI   0.113998   0.135523        0.662532        0.262387
5         PPI   0.097648   0.174544        0.744522       -0.203880
4         PPI   0.095434   0.176431        0.752242       -0.422994
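To check that the single sort really matches a per-group sort, here is a minimal sketch using the question's values for the relevant columns. The names sort_cols, sort_order, sorted_df and per_group are hypothetical, and the concat-over-groups version is only one possible way to express "sort within each group":

```python
import pandas as pd

# Relevant columns from the question's dictionary.
df = pd.DataFrame({
    "target_name": ["CPI", "CPI", "CPI", "PPI", "PPI", "PPI"],
    "valid_mse": [1.842212034, 1.890085418, 1.078413107,
                  0.135523283, 0.176431247, 0.174543796],
    "valid_r2_score": [-0.624665246, -0.777462993, 0.311849214,
                       0.262387135, -0.422993602, -0.203880075],
})

sort_cols = ["valid_mse", "valid_r2_score"]
sort_order = [True, False]

# One global sort with the group key as the leading sort column.
sorted_df = df.sort_values(
    by=["target_name"] + sort_cols,
    ascending=[True] + sort_order,
)

# Explicit alternative: sort each group separately, then stitch the
# pieces back together (groupby iterates the keys in sorted order).
per_group = pd.concat(
    [
        group.sort_values(by=sort_cols, ascending=sort_order)
        for _, group in df.groupby("target_name")
    ]
)

print(sorted_df.index.tolist())
print(sorted_df.equals(per_group))
```

On this data both approaches give the same row order, so the plain sort_values call is the simpler choice unless each group needs additional processing.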