Chaining Pandas DataFrame Styles

Time:05-15

Edit: updated the random DataFrame generator.

I have two DataFrames, one of which is used as a mask for the other.

import numpy as np
import pandas as pd

rndm = pd.DataFrame(np.random.randint(0, 15, size=(100, 4)), columns=list('ABCD'))
rndm_mask = pd.DataFrame(np.random.randint(0, 2, size=(100, 4)), columns=list('ABCD'))

I want to use 2 conditions to decide which values in rndm get highlighted:

  1. Is the value the mode of its column?
  2. Is the corresponding value in rndm_mask equal to 1?

What works so far:

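def colorBoolean(val):
    # Return an empty string (no style) when the mask value is falsy
    return 'background-color: red' if val else ''

# Note: DataFrame.applymap is deprecated since pandas 2.1; DataFrame.map replaces it there
rndm.style.apply(lambda _: rndm_mask.applymap(colorBoolean), axis=None)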


# helper function to find Mode
def highlightMode(s):
    # Get the mode(s) of the column; ties yield more than one value
    mode_ = s.mode().values
    # Apply style if the current value is among the modes
    return ['background-color: yellow' if v in mode_ else '' for v in s]
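One detail worth checking in highlightMode: Series.mode() returns every value tied for the highest frequency, so mode_ is not guaranteed to hold a single element. A minimal sketch with made-up data (the names tied, unique, and is_mode are only for illustration):

```python
import pandas as pd

# Hypothetical data: 1 and 2 each appear twice, so they tie for most frequent
tied = pd.Series([1, 1, 2, 2, 3])
modes = tied.mode().values

# Same membership test as highlightMode, but returning plain booleans
is_mode = [v in modes for v in tied]

# A column with no repeated values: every value counts as a mode
unique = pd.Series([2, 1, 0.2, 4, 5])
all_modes = unique.mode().values
```

With ties, every tied value gets highlighted, and a column without repeats is highlighted in full.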

Issue:

I'm unsure how to chain both functions so that values in rndm are highlighted only if they match both criteria (i.e. the value must be the most frequent value in its column and the corresponding entry in rndm_mask must be True).

I appreciate any advice! Thanks

CodePudding user response:

Try this. Since your df_bool DataFrame is a mask indexed identically to df, you can refer to the df_bool object inside the style function, where x.name is the name of the column passed in by df.style.apply:

import pandas as pd

df = pd.DataFrame({'A': [5.5, 3, 0, 3, 1],
                   'B': [2, 1, 0.2, 4, 5],
                   'C': [3, 1, 3.5, 6, 0]})

df_bool = pd.DataFrame({'A': [0, 1, 0, 0, 1],
                        'B': [0, 0, 1, 0, 0],
                        'C': [1, 1, 1, 0, 0]})
# Two conditions decide which values in df get highlighted:
#   1. Is the value the mode of its column?
#   2. Is the matching entry in df_bool equal to 1?

# Color a cell red wherever the same position in df_bool is truthy
def colorBoolean(x):
    return ['background-color: red' if v else '' for v in df_bool[x.name]]

# helper function to find Mode
def highlightMode(s):
    # Get the mode(s) of the column; ties yield more than one value
    mode_ = s.mode().values
    # Apply style if the current value is among the modes
    return ['background-color: yellow' if v in mode_ else '' for v in s]

df.style.apply(colorBoolean).apply(highlightMode) 

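Or the other way around. When both functions style the same cell, the declaration applied last takes precedence in the rendered HTML, so the order of the chain matters: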

df.style.apply(highlightMode).apply(colorBoolean)
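Both chains rely on colorBoolean looking up df_bool[x.name]. With the default axis=0, Styler.apply calls the function once per column and passes that column as a Series whose name is the column label. The sketch below mimics that column-by-column call without the Styler so the resulting CSS strings can be inspected directly; color_from_mask, styles, and names_seen are hypothetical names, and the data is a shortened version of the frames above.

```python
import pandas as pd

df = pd.DataFrame({'A': [5.5, 3, 0], 'B': [2, 1, 0.2]})
df_bool = pd.DataFrame({'A': [0, 1, 0], 'B': [0, 0, 1]})

def color_from_mask(x):
    # x is one column of df; x.name is its label, used to pick the mask column
    return ['background-color: red' if flag else '' for flag in df_bool[x.name]]

# Call the function once per column, as Styler.apply does with axis=0
styles = pd.DataFrame({col: color_from_mask(df[col]) for col in df.columns},
                      index=df.index)

# The name each column Series carries when it reaches the function
names_seen = [df[col].name for col in df.columns]
```

The lookup only lines up when df and df_bool share the same column labels and row order, which is the "identically indexed" condition mentioned above.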

Output:

enter image description here

Update

Highlight where both are true:

def highlightMode(s):
    # Get the mode(s) of the column; ties yield more than one value
    mode_ = s.mode().values
    # Style a cell only if its value is a mode and the matching df_bool entry is truthy
    return ['background-color: yellow' if (v in mode_) and b else '' for v, b in zip(s, df_bool[s.name])]

df.style.apply(highlightMode)
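As a possible alternative, the same two conditions can be combined for the whole frame at once and handed to the Styler with axis=None, which avoids reaching for the global df_bool inside the function. This is only a sketch: highlight_mode_and_mask and its mask and color parameters are names made up here, and it assumes the mask shares the index and columns of the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5.5, 3, 0, 3, 1],
                   'B': [2, 1, 0.2, 4, 5],
                   'C': [3, 1, 3.5, 6, 0]})
df_bool = pd.DataFrame({'A': [0, 1, 0, 0, 1],
                        'B': [0, 0, 1, 0, 0],
                        'C': [1, 1, 1, 0, 0]})

def highlight_mode_and_mask(data, mask, color='yellow'):
    # True where a value is among its column's modes (ties included)
    is_mode = data.apply(lambda s: s.isin(s.mode()))
    # True only where both conditions hold; mask is assumed to share data's labels
    both = is_mode & mask.astype(bool)
    css = np.where(both, f'background-color: {color}', '')
    # With axis=None the Styler expects a frame with the same index and columns
    return pd.DataFrame(css, index=data.index, columns=data.columns)

# The CSS strings can be inspected without rendering anything
styles = highlight_mode_and_mask(df, df_bool)

# In a notebook, extra keyword arguments are forwarded to the function:
# df.style.apply(highlight_mode_and_mask, mask=df_bool, axis=None)
```

Because columns B and C have no repeated values, every value in them counts as a mode, so in those columns the result simply follows df_bool.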
