In the following scenario, I would like to:
- Replace the missing values with a custom character, for example
-
- Apply custom formatting with respect to numbers
Problem
I seem to be able to achieve only one of those goals. If I use the code below, I can achieve the desired results with respect to formatting; however the missing values are printed as nan
"""Handling Missing Data in Pandas / Tabulate
"""
import pandas as pd
from tabulate import tabulate
import seaborn as sns
import numpy as np
# Create sample data
iris_data = sns.load_dataset('iris')
# Derive summary table
iris_summary = pd.DataFrame.describe(iris_data, percentiles=[]).transpose()
# Add missing values
iris_summary.iloc[0, 1:6] = None
# Show missing data
print(tabulate(iris_summary, missingval="-",
floatfmt=(".0f", ".0f", ".3f", ".1f", ".4f", ".1f", ".0f")))
Results
------------ --- ------- ----- -------- ----- ---
sepal_length 150 nan nan nan nan nan
sepal_width 150 3.057 0.4 2.0000 3.0 4
petal_length 150 3.758 1.8 1.0000 4.3 7
petal_width 150 1.199 0.8 0.1000 1.3 2
------------ --- ------- ----- -------- ----- ---
Attempt 1
I've tried replacing the missing values
iris_summary.replace(np.nan, "", inplace=True)
but the results are not satisfactory as the number formatting is lost:
------------ --- ------------------ ------------------ --- ---- ---
sepal_length 150
sepal_width 150 3.0573333333333337 0.4358662849366982 2.0 3.0 4.4
petal_length 150 3.7580000000000005 1.7652982332594662 1.0 4.35 6.9
petal_width 150 1.1993333333333336 0.7622376689603465 0.1 1.3 2.5
------------ --- ------------------ ------------------ --- ---- ---
Desired results
I would like to arrive at the table that looks as fallows:
------------ --- ------- ----- -------- ----- ---
sepal_length 150 - - - - -
sepal_width 150 3.057 0.4 2.0000 3.0 4
petal_length 150 3.758 1.8 1.0000 4.3 7
petal_width 150 1.199 0.8 0.1000 1.3 2
------------ --- ------- ----- -------- ----- ---
Notes
- The
tabulate
call is used within a function that outputs LaTeX tables. My intention is to see the solution flexible, sotabulate
can received pandas dataframe with missing values at different locations and correctly apply the formatting passed via thefloatfmt
to the remaining values within the column.
CodePudding user response:
Use replace
:
print(tabulate(iris_summary.replace(np.nan, None), missingval='-',
floatfmt=(".0f", ".0f", ".3f", ".1f", ".4f", ".1f", ".0f")))
Output:
------------ --- ----- --- ------ --- -
sepal_length 150 - - - - -
sepal_width 150 3.057 0.4 2.0000 3.0 4
petal_length 150 3.758 1.8 1.0000 4.3 7
petal_width 150 1.199 0.8 0.1000 1.3 2
------------ --- ----- --- ------ --- -
I think missingval
is for None
value but Pandas convert None
to NaN
where the column has float
dtype so you have to force to replace nan
by None
to get your expected output.