Home > Mobile >  Properly handling missing values and formatting for pandas data frame printed to tabulate
Properly handling missing values and formatting for pandas data frame printed to tabulate

Time:02-18

In the following scenario, I would like to:

  • Replace the missing values with a custom character, for example -
  • Apply custom formatting with respect to numbers

Problem

I seem to be able to achieve only one of those goals. If I use the code below, I can achieve the desired results with respect to formatting; however the missing values are printed as nan

"""Handling Missing Data in Pandas / Tabulate
"""

import pandas as pd
from tabulate import tabulate
import seaborn as sns
import numpy as np

# Create sample data
iris_data = sns.load_dataset('iris')
# Derive summary table
iris_summary = pd.DataFrame.describe(iris_data, percentiles=[]).transpose()
# Add missing values
iris_summary.iloc[0, 1:6] = None


# Show missing data
print(tabulate(iris_summary, missingval="-",
               floatfmt=(".0f", ".0f", ".3f", ".1f", ".4f", ".1f", ".0f")))

Results

------------  ---  -------  -----  --------  -----  ---
sepal_length  150  nan      nan    nan       nan    nan
sepal_width   150    3.057    0.4    2.0000    3.0    4
petal_length  150    3.758    1.8    1.0000    4.3    7
petal_width   150    1.199    0.8    0.1000    1.3    2
------------  ---  -------  -----  --------  -----  ---

Attempt 1

I've tried replacing the missing values

iris_summary.replace(np.nan, "", inplace=True)

but the results are not satisfactory as the number formatting is lost:

------------  ---  ------------------  ------------------  ---  ----  ---
sepal_length  150
sepal_width   150  3.0573333333333337  0.4358662849366982  2.0  3.0   4.4
petal_length  150  3.7580000000000005  1.7652982332594662  1.0  4.35  6.9
petal_width   150  1.1993333333333336  0.7622376689603465  0.1  1.3   2.5
------------  ---  ------------------  ------------------  ---  ----  ---

Desired results

I would like to arrive at the table that looks as fallows:

------------  ---  -------  -----  --------  -----  ---
sepal_length  150    -        -      -         -      -
sepal_width   150    3.057    0.4    2.0000    3.0    4
petal_length  150    3.758    1.8    1.0000    4.3    7
petal_width   150    1.199    0.8    0.1000    1.3    2
------------  ---  -------  -----  --------  -----  ---

Notes

  • The tabulate call is used within a function that outputs LaTeX tables. My intention is to see the solution flexible, so tabulate can received pandas with missing values at different locations and correctly apply the formatting passed via the floatfmt to the remaining values within the column.

CodePudding user response:

Use replace:

print(tabulate(iris_summary.replace(np.nan, None), missingval='-',
               floatfmt=(".0f", ".0f", ".3f", ".1f", ".4f", ".1f", ".0f")))

Output:

------------  ---  -----  ---  ------  ---  -
sepal_length  150  -      -    -       -    -
sepal_width   150  3.057  0.4  2.0000  3.0  4
petal_length  150  3.758  1.8  1.0000  4.3  7
petal_width   150  1.199  0.8  0.1000  1.3  2
------------  ---  -----  ---  ------  ---  -

I think missingval is for None value but Pandas convert None to NaN where the column has float dtype so you have to force to replace nan by None to get your expected output.

  • Related