I have a large dataset: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-07-19/technology.csv
Here is the head of the dataset: head
I grouped this dataset by variable and took the average of each country's technology adoption with this:
df.groupby(['variable','iso3c'])[['value']].mean()
Here is the output:
                         value
variable     iso3c
BCG          AFG        45.763158
             AGO        56.648649
             ALB        93.875000
             ARE        86.650000
             ARG        93.700000
...                           ...
visitorrooms VNM     46920.636364
             YEM      5527.280000
             ZAF     48431.850000
             ZMB      3518.000000
             ZWE      4696.440000
Now I want to sort within each variable from the largest value to the smallest. I thought of doing this:
df.groupby(['variable','iso3c'])[['value']].mean().sort_values(['variable','value'])
But this is the output:
                           value
variable     iso3c
BCG          SWE    1.722500e+01
             SOM    3.812500e+01
             AFG    4.576316e+01
             TCD    4.586111e+01
             ETH    5.141026e+01
...                          ...
visitorrooms ESP    5.755948e+05
             JPN    6.531027e+05
             DEU    7.400641e+05
             ITA    9.286496e+05
             USA    3.040499e+06
[16933 rows x 1 columns]
I have no idea what happens to the values here. How do I fix this?
CodePudding user response:
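It looks like the values span a very wide range of magnitudes, so pandas is displaying them in scientific notation (for example, 4.576316e+01 is 45.76316). The values themselves have not changed.
Option 1: Chain round(x) after sort_values(), where x is the number of decimal places you want.
Option 2: Set the pandas display options (for example display.precision or display.float_format) to a format you find more comfortable to work with.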
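Both options can be sketched roughly as follows. The small frame here is made-up sample data standing in for the grouped means (it is not the real dataset), and the choice of two decimal places is only an assumption for illustration.

```python
import pandas as pd

# Made-up stand-in for the grouped means (not the real data).
means = pd.DataFrame(
    {"value": [17.225, 45.763158, 3040499.0]},
    index=["SWE", "AFG", "USA"],
)

# Option 1: round to a fixed number of decimal places.
rounded = means.round(2)

# Option 2: display floats in fixed-point notation with 2 decimals.
# option_context limits the setting to this block; pd.set_option would
# apply it for the rest of the session instead.
with pd.option_context("display.float_format", "{:.2f}".format):
    fixed_text = means.to_string()

print(fixed_text)
```

Note that the display option only changes how the numbers are printed, while round() changes the stored values.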
CodePudding user response:
# Import library
import pandas as pd
# Load the data
df = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-07-19/technology.csv')
# Data pre-processing: mean value per variable and country
df_v2 = df.groupby(['variable','iso3c'])['value'].mean().reset_index()
# Sort by variable (A to Z), then by value from largest to smallest
df_v2.sort_values(['variable','value'], ascending=[True, False], inplace=True)
# Show the output
df_v2
Hi Brother,
I have attached the code for you. If you have any questions, please let me know.
Thanks, Leon
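If you would rather keep the (variable, iso3c) index from the question instead of calling reset_index(), a minimal sketch along the same lines is below. It relies on sort_values() accepting index level names in its by argument, and it uses a small made-up frame with the same column names rather than the real CSV.

```python
import pandas as pd

# Made-up sample with the same column names as the dataset.
df = pd.DataFrame({
    "variable": ["BCG", "BCG", "BCG", "BCG", "visitorrooms", "visitorrooms"],
    "iso3c":    ["AFG", "AFG", "SWE", "ALB", "USA", "ZMB"],
    "value":    [40.0, 50.0, 17.0, 94.0, 3000000.0, 3500.0],
})

# Mean per variable and country, keeping the MultiIndex.
means = df.groupby(["variable", "iso3c"])[["value"]].mean()

# 'variable' is an index level and 'value' is a column; sort the first
# ascending and the second descending.
ranked = means.sort_values(["variable", "value"], ascending=[True, False])

print(ranked)
```

The key difference from the attempt in the question is the ascending=[True, False] argument: without it, both keys are sorted in ascending order.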