How to fine tune formatting in Pandas

Time:01-18

I'm learning some new pandas techniques and trying to fine-tune the output.

Here's my code.

import pandas as pd
import numpy as np

dogs = np.random.choice(['labrador', 'poodle', 'pug', 'beagle', 'dachshund'], size=50_000)
smell = np.random.randint(1,100, size=50_000)
df = pd.DataFrame(data= np.array([dogs, smell]).T, columns= ['dog', 'smell']) 

So far so simple.

    dog         smell
0   poodle      83
1   labrador    3
2   poodle      86
3   dachshund   31
4   labrador    16
... ... ...

Then I created a one-liner to list the share of each breed using .value_counts.

I normalised using the normalize parameter, multiplied by 100 to get percentages, and then combined .to_frame and .round().

print(f"{(df.value_counts('dog', normalize=True, )*100).to_frame().round(2)}") 

               0
dog             
beagle     20.04
poodle     20.03
labrador   19.98
dachshund  19.98
pug        19.97

It's almost there, but is there a simple way to extend this one-liner so that each value ends with a percentage symbol, like this?

               0
dog             
beagle     20.04%
poodle     20.03%
labrador   19.98%
dachshund  19.98%
pug        19.97%

CodePudding user response:

You can use pandas.DataFrame.style.format as pointed out by @Galo do Leste. The following code shows how you could use pandas.DataFrame.style.format to format how dog counts are shown:

import numpy as np
import pandas as pd

# Generate random data for dog breeds and smell levels
dogs = np.random.choice(['labrador', 'poodle', 'pug', 'beagle', 'dachshund'], size=50_000)
smell = np.random.randint(1,100, size=50_000)

# Create a dataframe with the data
df = pd.DataFrame(data= np.array([dogs, smell]).T, columns= ['dog', 'smell'])

# Count the number of occurrences of each dog breed and normalize the values to percentage
df_fmt = df.value_counts('dog', normalize=True).to_frame('% dog breed').reset_index()

# Use the `style.format` method to format the '% dog breed' column to show percentages
df_fmt = df_fmt.style.format({'% dog breed': '{:.2%}'})
df_fmt
#          dog % dog breed
# 0     poodle      20.30%
# 1        pug      20.19%
# 2  dachshund      20.11%
# 3     beagle      20.03%
# 4   labrador      19.37%

Remarks

Using pandas.DataFrame.style.format won't change the underlying values of a column. In other words, if you access df_fmt.data you'll see that '% dog breed' is still stored as a fraction. To persist the styled dataframe values you can use the following code:

from io import StringIO  # pandas 2.1+ deprecates passing a raw HTML string to read_html

df_persist = pd.read_html(StringIO(df_fmt.to_html()))[0][df_fmt.columns]
df_persist
#          dog % dog breed
# 0     poodle      20.30%
# 1        pug      20.19%
# 2  dachshund      20.11%
# 3     beagle      20.03%
# 4   labrador      19.37%

For more information on formatting dataframes, check out the Pandas documentation on formatting display.
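
One thing to keep in mind: a Styler is rendered by notebook front ends, so it does not help with a plain print() like the one in the question. As a rough sketch of a display-only alternative, DataFrame.to_string accepts per-column formatters and leaves the stored values untouched. The toy data and the names counts and text are made up for illustration, and this is a swapped-in technique rather than the approach above.

```python
import pandas as pd

# Toy data (hypothetical), chosen so the proportions are exact
df = pd.DataFrame({'dog': ['pug', 'pug', 'pug', 'beagle']})

counts = df.value_counts('dog', normalize=True).to_frame('% dog breed')

# Apply the percent format only while rendering to text
text = counts.to_string(formatters={'% dog breed': '{:.2%}'.format})
print(text)

# The frame itself still holds the fractions
print(counts['% dog breed'].tolist())
```

Because only the rendered text carries the % sign, counts can still be sorted, summed, or plotted as numbers afterwards.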

CodePudding user response:

The following one-liners work!

  • Using a lambda function:
print(f"{((df.value_counts('dog', normalize=True, )*100).to_frame().round(2)).iloc[:,0].apply(lambda x: str(x) + '%')}")
  • Change type:
print(f"{((df.value_counts('dog', normalize=True, )*100).to_frame().round(2)).iloc[:,0].astype(str) + '%'}")
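
A small caveat with both one-liners: converting a rounded float with str drops trailing zeros, so 75.0 comes out as '75.0%' rather than '75.00%'. If a fixed number of decimals matters, a format string is one way around it. The sketch below uses made-up toy data, and the names pct, as_str, and fixed are just for illustration.

```python
import pandas as pd

# Toy data (hypothetical), chosen so the percentages are exact
df = pd.DataFrame({'dog': ['pug', 'pug', 'pug', 'beagle']})
pct = df.value_counts('dog', normalize=True) * 100

# round + str: trailing zeros are lost
as_str = pct.round(2).astype(str) + '%'
print(as_str.tolist())

# format string: always two decimals
fixed = pct.map('{:.2f}%'.format)
print(fixed.tolist())
```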

Hope this helps!
