I'm learning some new pandas techniques and trying to fine-tune the output.
Here's my code.
import pandas as pd
import numpy as np
dogs = np.random.choice(['labrador', 'poodle', 'pug', 'beagle', 'dachshund'], size=50_000)
smell = np.random.randint(1,100, size=50_000)
df = pd.DataFrame(data= np.array([dogs, smell]).T, columns= ['dog', 'smell'])
So far so simple.
dog smell
0 poodle 83
1 labrador 3
2 poodle 86
3 dachshund 31
4 labrador 16
... ... ...
Then I created a one-liner to list the share of each breed using .value_counts.
I normalised with the normalize parameter, multiplied by 100 to get percentages, and then chained .to_frame() and .round().
print(f"{(df.value_counts('dog', normalize=True, )*100).to_frame().round(2)}")
0
dog
beagle 20.04
poodle 20.03
labrador 19.98
dachshund 19.98
pug 19.97
It's almost there, but is there a simple way to extend the formatting of this one-liner so that each value ends with a percentage symbol, like this?
0
dog
beagle 20.04%
poodle 20.03%
labrador 19.98%
dachshund 19.98%
pug 19.97%
CodePudding user response:
You can use pandas.DataFrame.style.format, as pointed out by @Galo do Leste. The following code shows how you could use it to format how the dog counts are shown:
import numpy as np
import pandas as pd
# Generate random data for dog breeds and smell levels
dogs = np.random.choice(['labrador', 'poodle', 'pug', 'beagle', 'dachshund'], size=50_000)
smell = np.random.randint(1,100, size=50_000)
# Create a dataframe with the data
df = pd.DataFrame(data= np.array([dogs, smell]).T, columns= ['dog', 'smell'])
# Count the occurrences of each dog breed and normalize them to fractions of the total
df_fmt = df.value_counts('dog', normalize=True).to_frame('% dog breed').reset_index()
# Use the `style.format` method to format the '% dog breed' column to show percentages
df_fmt = df_fmt.style.format({'% dog breed': '{:.2%}'})
df_fmt
# dog % dog breed
# 0 poodle 20.30%
# 1 pug 20.19%
# 2 dachshund 20.11%
# 3 beagle 20.03%
# 4 labrador 19.37%
Remarks
Using pandas.DataFrame.style.format won't change the underlying values of a column. In other words, if you access df_fmt.data you'll see that '% dog breed' is still stored as a fraction. To persist the styled dataframe values you can use the following code:
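from io import StringIO  # newer pandas versions deprecate passing literal HTML to read_html
df_persist = pd.read_html(StringIO(df_fmt.to_html()))[0][df_fmt.columns]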
df_persist
# dog % dog breed
# 0 poodle 20.30%
# 1 pug 20.19%
# 2 dachshund 20.11%
# 3 beagle 20.03%
# 4 labrador 19.37%
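If the goal is only to store the percentages as text, the HTML round trip can probably be skipped. The following is a minimal sketch, not part of the original answer, that formats the fractions directly with Series.map. The seeded generator, the variable name shares, and the rename_axis call are assumptions added here so the example is reproducible and gives the same column names across pandas versions.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
dogs = rng.choice(['labrador', 'poodle', 'pug', 'beagle', 'dachshund'], size=50_000)
df = pd.DataFrame({'dog': dogs})

# Fraction of rows per breed, as a Series indexed by breed name
shares = df['dog'].value_counts(normalize=True).rename_axis('dog')

# Turn each fraction into a string such as '20.04%' and store it in a new frame
df_persist = shares.map('{:.2%}'.format).rename('% dog breed').reset_index()
print(df_persist)
```

Note that the '% dog breed' column now holds strings, so it can no longer be sorted or summed numerically; keep the original fractions around if you still need to compute with them.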
For more information on formatting dataframes, check out the Pandas documentation on formatting display.
CodePudding user response:
The following one-liners work!
- Using a lambda function:
print(f"{((df.value_counts('dog', normalize=True, )*100).to_frame().round(2)).iloc[:,0].apply(lambda x: str(x) + '%')}")
- Change type:
print(f"{((df.value_counts('dog', normalize=True, )*100).to_frame().round(2)).iloc[:,0].astype(str) + '%'}")
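One caveat with converting rounded floats to strings: a value such as 20.0 becomes '20.0%' rather than '20.00%', because str() drops trailing zeros. If the percent sign is only needed when printing, a possible alternative is to leave the values numeric and pass a formatter to to_string. This is a sketch on a small made-up frame (the four-row df below is hypothetical, standing in for the 50,000-row one).

```python
import pandas as pd

# Small hypothetical sample standing in for the 50,000-row frame
df = pd.DataFrame({'dog': ['pug', 'pug', 'poodle', 'beagle']})

# Keep the fractions numeric; only the printed text gets the percent sign
shares = df['dog'].value_counts(normalize=True).to_frame()

# float_format takes a one-argument function applied to each float when rendering
out = shares.to_string(float_format='{:.2%}'.format)
print(out)
```

Because the formatting happens only at render time, shares can still be sorted, filtered, or summed afterwards.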
Hope this helps!