I'm trying to sort a dataframe grouped by a column but sorting by the sum of each group:
I have a dataframe with the columns ["make", "model", "year", "fuel", "variant", "num registered cars"] (cars info). My goal is to sort for each year every make, model by the sum of the number of registered cars but grouping by 'year', 'make', 'model', 'fuel'. Here is the code I already implemented:
sorted_table = simix_table.groupby(["make", "model", "year"])['matriculas'].sum().reset_index(name="matriculas").sort_values(["year", "matriculas"], ascending=[True, False]).reset_index(drop=True)
How can I sort my original dataframe using the order in my new sorted_table grouping by 'year', 'make', 'model', 'fuel'?
the output has to be as folows:
This example I used the grupby operation to ilustrate an example. In this case Opel Corsa comes first because the sum of opels corsa in the 2013 (gasolina diesel Gas Licuado..) is the biggest this year.
Example:
cars_data = [[2013, 'Corsa', 'Opel', 'Gas', 'variant corsa', 'version corsa', 6282],
[2014, 'Corsa', 'Opel', 'Gas', 'variant corsa', 'version corsa', 2210],
[2013, 'Corsa', 'Opel', 'Diesel', 'variant corsa 2', 'version corsa 2', 1432],
[2014, 'Corsa', 'Opel', 'Diesel', 'variant corsa 2', 'version corsa 2', 1432],
[2013, 'Polo', 'Volkswagen', 'Gas', 'variant polo', 'version polo', 5316],
[2014, 'Polo', 'Volkswagen', 'Gas', 'variant polo', 'version polo', 5224],
[2013, 'Polo', 'Volkswagen', 'Electric', 'variant polo 2', 'version polo 2', 3126],
[2014, 'Polo', 'Volkswagen', 'Electric', 'variant polo 2', 'version polo 2', 1513],
[2013, 'Fiesta', 'Ford', 'Gas', 'variant fiesta', 'version fiesta', 1351],
[2014, 'Fiesta', 'Ford', 'Gas', 'variant fiesta', 'version fiesta', 4351],
[2013, 'Fiesta', 'Ford', 'Diesel', 'variant fiesta 2', 'version fiesta 2', 4523],
[2014, 'Fiesta', 'Ford', 'Diesel', 'variant fiesta 2', 'version fiesta 2', 3523]]
columns = ['year', 'model', 'make', 'fuel', 'variant', 'version', 'num_registered_cars']
cars_df = pd.DataFrame(cars_data, columns=columns)
Output of the group by operation:
cars_df.groupby(["make", "model", "year"])['num_registered_cars'].sum().reset_index(name="num_registered_cars").sort_values(["year", "num_registered_cars"], ascending=[True, False]).reset_index(drop=True)
So the final result of the sorted dataframe has to be a dataframe with this rows:
[2013, 'Polo', 'Volkswagen', 'Gas', 'variant polo', 'version polo', 5316],
[2013, 'Polo', 'Volkswagen', 'Electric', 'variant polo 2', 'version polo 2', 3126],
[2013, 'Corsa', 'Opel', 'Gas', 'variant corsa', 'version corsa', 6282],
[2013, 'Corsa', 'Opel', 'Diesel', 'variant corsa 2', 'version corsa 2', 1432],
[2013, 'Fiesta', 'Ford', 'Diesel', 'variant fiesta 2', 'version fiesta 2', 4523],
[2013, 'Fiesta', 'Ford', 'Gas', 'variant fiesta', 'version fiesta', 1351],
[2014, 'Fiesta', 'Ford', 'Gas', 'variant fiesta', 'version fiesta', 4351],
[2014, 'Fiesta', 'Ford', 'Diesel', 'variant fiesta 2', 'version fiesta 2', 3523],
[2014, 'Polo', 'Volkswagen', 'Gas', 'variant polo', 'version polo', 5224],
[2014, 'Polo', 'Volkswagen', 'Electric', 'variant polo 2', 'version polo 2', 1513]
[2014, 'Corsa', 'Opel', 'Gas', 'variant corsa', 'version corsa', 2210],
[2014, 'Corsa', 'Opel', 'Diesel', 'variant corsa 2', 'version corsa 2', 1432]
The 2013 starts with the Volkswagen Polo because 5316 3126 = 8442 greater than opel corsa 6282 1432 = 7714 ...
The 2014 starts with the Ford Fiesta because 4351 3523 = 7874 greater than Volkswagen Polo 5224 1513 = 6737 ...
How can I do this type of sorting?
CodePudding user response:
You can use groupby.transform
to add a temporary column, sort, then drop:
out = (cars_df
.assign(total=cars_df.groupby(['make', 'model', 'year'])
['num_registered_cars'].transform('sum'))
.sort_values(by=['year', 'total'], ascending=[True, False])
.drop(columns='total')
)
output:
year model make fuel variant version num_registered_cars
4 2013 Polo Volkswagen Gas variant polo version polo 5316
6 2013 Polo Volkswagen Electric variant polo 2 version polo 2 3126
0 2013 Corsa Opel Gas variant corsa version corsa 6282
2 2013 Corsa Opel Diesel variant corsa 2 version corsa 2 1432
8 2013 Fiesta Ford Gas variant fiesta version fiesta 1351
10 2013 Fiesta Ford Diesel variant fiesta 2 version fiesta 2 4523
9 2014 Fiesta Ford Gas variant fiesta version fiesta 4351
11 2014 Fiesta Ford Diesel variant fiesta 2 version fiesta 2 3523
5 2014 Polo Volkswagen Gas variant polo version polo 5224
7 2014 Polo Volkswagen Electric variant polo 2 version polo 2 1513
1 2014 Corsa Opel Gas variant corsa version corsa 2210
3 2014 Corsa Opel Diesel variant corsa 2 version corsa 2 1432