Home > Back-end >  Pandas MultiIndex Dataframe Styling error when writing to Excel
Pandas MultiIndex Dataframe Styling error when writing to Excel

Time:09-22

I am trying to write a multi-index data frame to excel using pandas styling and I am getting an error.

import pandas as pd
import numpy as np

df=pd.DataFrame(np.random.randn(9,4), pd.MultiIndex.from_product([['A', 'B','C'], ['r1', 'r2','r3']]), columns=[['E1','E1','E2','E2'],['d1','d2','d1','d2']])

def highlight_max(s, props=''):
    return np.where(s == np.nanmax(s.values), props, '')

def highlight_all_by_condition (value, condition, props=''):
    return np.where(value >= condition, props, '')

def highlight_max_value_by_condition(value, condition, props=''):
    return np.where(np.nanmax(value) >= condition, props, '')

df_formatted = df.style.set_properties(**{'font-family': 'Arial','font-size': '10pt'})

unique_column_list = list(set(df.columns.get_level_values(0)))
idx = pd.IndexSlice
for each in unique_column_list:
    slice_=idx[idx[each]]
    df_formatted = df_formatted.apply(highlight_max, props='color:black; font-weight: bold', axis=1, subset=slice_)\
                               .apply(highlight_all_by_condition, condition = 0.55, props='color:red;font-weight: bold; background-color: #ffe6e6', axis=1, subset=slice_)\
                               .apply(highlight_max_value_by_condition, condition = 1, props='color:green;font-weight: bold; background-color: #ffff33', axis=1, subset=slice_)

df_formatted.to_excel("test.xlsx", engine = 'openpyxl')

I am getting the following error:

ValueError: Function <function highlight_max_value_by_condition at 0x000001EE1394E940> returned the wrong shape.
Result has shape: (9,)
Expected shape:   (9, 2)

The second styling function (highlight_max_value_by_condition) is a conditional styling, where it needs to highlight the max value only if it satisfies the condition and if I remove that styling function, then I don't get any error.

Any help is much appreciated. Thanks in advance.

CodePudding user response:

Assuming we're looking for highlight_max_value_by_condition is meant to apply styles to cells which are both the max in the subset and fulfill the condition we can add an & to combine the conditions:

def highlight_max_value_by_condition(value, condition, props=''):
    return np.where(
        (value == np.nanmax(value)) & (value >= condition),
        props,
        ''
    )

Beyond that, however, there are quite a few things we can do to cleanup the general approach.

Styler objects compound naturally, there is no need to assign back. In addition instead of using list(set( to get the level values, styled table


Setup made reproducible with seed(6) and with modified function

import numpy as np
import pandas as pd

np.random.seed(6)
df = pd.DataFrame(
    np.random.randn(9, 4),
    pd.MultiIndex.from_product([['A', 'B', 'C'], ['r1', 'r2', 'r3']]),
    columns=[['E1', 'E1', 'E2', 'E2'], ['d1', 'd2', 'd1', 'd2']]
)


def highlight_max(s, props=''):
    return np.where(s == np.nanmax(s.values), props, '')


def highlight_all_by_condition(value, condition, props=''):
    return np.where(value >= condition, props, '')


def highlight_max_value_by_condition(value, condition, props=''):
    return np.where(
        (value == np.nanmax(value)) & (value >= condition),
        props,
        ''
    )

df:

            E1                  E2          
            d1        d2        d1        d2
A r1 -0.311784  0.729004  0.217821 -0.899092
  r2 -2.486781  0.913252  1.127064 -1.514093
  r3  1.639291 -0.429894  2.631281  0.601822
B r1 -0.335882  1.237738  0.111128  0.129151
  r2  0.076128 -0.155128  0.634225  0.810655
  r3  0.354809  1.812590 -1.356476 -0.463632
C r1  0.824654 -1.176431  1.564490  0.712705
  r2 -0.181007  0.534200 -0.586613 -1.481853
  r3  0.857248  0.943099  0.114441 -0.021957
  • Related