pandas groupby mean sorting by ascending order

Time:09-01

I have a large data set : https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-07-19/technology.csv

here is the head of the dataset: head

I have grouped this dataset by variable and country code, and taken the average of each country's technology adoption with this:

df.groupby(['variable','iso3c'])[['value']].mean()

here is the output

                           value
variable     iso3c              
BCG          AFG       45.763158
             AGO       56.648649
             ALB       93.875000
             ARE       86.650000
             ARG       93.700000
...                          ...
visitorrooms VNM    46920.636364
             YEM     5527.280000
             ZAF    48431.850000
             ZMB     3518.000000
             ZWE     4696.440000

Now, I want to sort within the variables by largest values to smallest. I thought of doing this:

df.groupby(['variable','iso3c'])[['value']].mean().sort_values(['variable','value'])

but this is the output

                           value
variable     iso3c              
BCG          SWE    1.722500e+01
             SOM    3.812500e+01
             AFG    4.576316e+01
             TCD    4.586111e+01
             ETH    5.141026e+01
...                          ...
visitorrooms ESP    5.755948e+05
             JPN    6.531027e+05
             DEU    7.400641e+05
             ITA    9.286496e+05
             USA    3.040499e+06

[16933 rows x 1 columns]

I have no idea what happens to the values here. How do I fix this?

CodePudding user response:

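It looks like the values just span a very wide range of magnitudes, so pandas is displaying them in scientific (exponent) notation. The numbers themselves are unchanged; only the way they are printed is different.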

Option 1: You can chain your sort_values() with a round(x), where x is the number of decimal places you want.

Option 2: Set a pandas display option, such as display.precision or display.float_format, to a form you find more comfortable to work with.
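As a rough sketch of Option 2, the snippet below sets display.float_format temporarily through pd.option_context. The small frame is made up for illustration (it is not the real grouped result), and the two-decimal format is just one possible choice.

```python
import pandas as pd

# Made-up stand-in for the grouped means; the values are illustrative only
means = pd.DataFrame(
    {"value": [17.225, 45.763158, 3040499.0]},
    index=pd.MultiIndex.from_tuples(
        [("BCG", "SWE"), ("BCG", "AFG"), ("visitorrooms", "USA")],
        names=["variable", "iso3c"],
    ),
)

# Render floats with two fixed decimals inside this block only;
# the stored values are not modified
with pd.option_context("display.float_format", "{:.2f}".format):
    rendered = means.to_string()

print(rendered)
```

To make the setting permanent for the session, the same formatter can be assigned with pd.set_option("display.float_format", "{:.2f}".format).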

CodePudding user response:

import pandas as pd

# Load the dataset
df = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-07-19/technology.csv')

# Average value per (variable, country), with the group keys turned back into columns
df_v2 = df.groupby(['variable', 'iso3c'])['value'].mean().reset_index()

# Sort variables A to Z and, within each variable, values from largest to smallest
df_v2 = df_v2.sort_values(['variable', 'value'], ascending=[True, False])

# Show the output
df_v2

Hi Brother,

I have attached the code for you. If you have any questions, please let me know.

Thanks, Leon
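If you would rather keep the (variable, iso3c) MultiIndex from the question instead of calling reset_index(), passing a per-key ascending list to sort_values should also work, since sort_values accepts index level names alongside column names. A minimal sketch on made-up rows (same column names as the dataset, invented values):

```python
import pandas as pd

# Made-up rows standing in for the real CSV
df = pd.DataFrame({
    "variable": ["BCG", "BCG", "BCG", "BCG", "visitorrooms", "visitorrooms"],
    "iso3c":    ["AFG", "AFG", "SWE", "ALB", "USA", "ZMB"],
    "value":    [40.0, 50.0, 17.0, 94.0, 3000000.0, 3500.0],
})

# Mean per (variable, country); the result keeps a two-level index
means = df.groupby(["variable", "iso3c"])[["value"]].mean()

# 'variable' is an index level and 'value' is a column;
# sort the first ascending and the second descending
sorted_means = means.sort_values(["variable", "value"], ascending=[True, False])

print(sorted_means)
```

The only change from the attempt in the question is the ascending=[True, False] argument, which flips the order of value within each variable.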
