How to fine tune formatting in Pandas


Just learning some new pandas techniques and working on trying to fine tune the outputs.

Here's my code.

import pandas as pd
import numpy as np

dogs = np.random.choice(['labrador', 'poodle', 'pug', 'beagle', 'dachshund'], size=50_000)
smell = np.random.randint(1,100, size=50_000)
df = pd.DataFrame(data= np.array([dogs, smell]).T, columns= ['dog', 'smell']) 

So far so simple.

    dog         smell
0   poodle      83
1   labrador    3
2   poodle      86
3   dachshund   31
4   labrador    16
... ... ...

Then I created a one-liner to list the share of each breed using .value_counts.

I normalised using the normalize parameter, multiplied by 100 to get percentages, and then chained .to_frame() and .round().

print(f"{(df.value_counts('dog', normalize=True, )*100).to_frame().round(2)}") 

beagle     20.04
poodle     20.03
labrador   19.98
dachshund  19.98
pug        19.97

It's almost there, but is there a simple way to extend this one-liner so that each value is followed by a percentage symbol, like this?

beagle     20.04%
poodle     20.03%
labrador   19.98%
dachshund  19.98%
pug        19.97%

CodePudding user response:

You can use pandas.DataFrame.style.format as pointed out by @Galo do Leste. The following code shows how you could use pandas.DataFrame.style.format to format how dog counts are shown:

import numpy as np
import pandas as pd

# Generate random data for dog breeds and smell levels
dogs = np.random.choice(['labrador', 'poodle', 'pug', 'beagle', 'dachshund'], size=50_000)
smell = np.random.randint(1,100, size=50_000)

# Create a dataframe with the data
df = pd.DataFrame(data= np.array([dogs, smell]).T, columns= ['dog', 'smell'])

# Count the occurrences of each dog breed and normalize them to fractions of the total
df_fmt = df.value_counts('dog', normalize=True).to_frame('% dog breed').reset_index()

# Use the `style.format` method to format the '% dog breed' column to show percentages
df_fmt = df_fmt.style.format({'% dog breed': '{:.2%}'})
#          dog % dog breed
# 0     poodle      20.30%
# 1        pug      20.19%
# 2  dachshund      20.11%
# 3     beagle      20.03%
# 4   labrador      19.37%


Using pandas.DataFrame.style.format doesn't change the underlying values of a column. In other words, if you inspect df_fmt.data you'll see that '% dog breed' is still stored as a fraction. To persist the formatted values in a new dataframe, you can use the following code:

df_persist = pd.read_html(df_fmt.to_html())[0][df_fmt.columns]
#          dog % dog breed
# 0     poodle      20.30%
# 1        pug      20.19%
# 2  dachshund      20.11%
# 3     beagle      20.03%
# 4   labrador      19.37%

For more information on formatting dataframes, check out the Pandas documentation on formatting display.
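
If the HTML round trip feels heavy (pd.read_html also needs an HTML parser such as lxml installed), a lighter option may be to format the column into strings directly. This is only a minimal sketch: the small frame `toy` and the name `pct` are made up for illustration and stand in for the random data above.

```python
import pandas as pd

# Hypothetical small sample standing in for the random 50,000-row frame
toy = pd.DataFrame({'dog': ['pug', 'pug', 'poodle', 'beagle']})

# Fraction of rows per breed, as in the answer above
pct = toy['dog'].value_counts(normalize=True).to_frame('% dog breed')

# Format each fraction as a percentage string.
# After this step the column holds text, not numbers.
pct['% dog breed'] = pct['% dog breed'].map('{:.2%}'.format)

print(pct)
```

The trade-off is the same as with the persisted version: once the values are strings you can no longer sort or sum them numerically, so it is probably best done as the last step before display.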

CodePudding user response:

The following one-liners work!

  • Using a lambda function:
print(f"{((df.value_counts('dog', normalize=True, )*100).to_frame().round(2)).iloc[:,0].apply(lambda x: str(x) + '%')}")
  • Change type:
print(f"{((df.value_counts('dog', normalize=True, )*100).to_frame().round(2)).iloc[:,0].astype(str) + '%'}")
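
One thing to be aware of with both one-liners: converting a float with str() drops trailing zeros, so a value like 20.10 would print as 20.1%. The sketch below compares the two approaches on a tiny made-up frame (`toy` and the variable names are illustrative assumptions, not part of the original code) and shows a format-string variant that keeps two decimals.

```python
import pandas as pd

# Hypothetical small sample: 2 pugs, 1 poodle, 1 beagle
toy = pd.DataFrame({'dog': ['pug', 'pug', 'poodle', 'beagle']})

# Percentage share per breed, rounded to 2 decimals
shares = (toy.value_counts('dog', normalize=True) * 100).round(2)

# The two approaches from this answer give the same strings
with_lambda = shares.apply(lambda x: str(x) + '%')
with_astype = shares.astype(str) + '%'

# A format spec always keeps two decimal places
fixed = shares.map('{:.2f}%'.format)

print(with_astype)
print(fixed)
```

Here the pug share comes out as 50.0% with str() and as 50.00% with the format spec.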

Hope this helps!
