How to color values in a dataframe based on conditions?

Time:06-14

The following code compares two data frames and outputs a copy of the first one in which every changed cell reads "old --> new":

import pandas as pd
import numpy as np

a = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5"],
        "B": ["abcd", "efgh", "ijkl", "uhyee", "uhuh"],
        "C": ["jamba", "refresh", "portobello", "performancehigh", "jackalack"],
        "D": ["OQEWINVSKD", "DKVLNQIOEVM", "asdlikvn", "asdkvnddvfvfkdd", np.nan],
    }
)

b = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5", 6],
        "B": ["dah", "fupa", "ijkl", "danju", "uhuh", "freshhhhhhh"],
        "C": [
            "jamba",
            "dimez",
            "pocketfresh",
            "reverbb",
            "jackalack",
            "boombackimmatouchit",
        ],
    }
)


def equalize_length(short, long):
    return pd.concat(
        [
            short,
            pd.DataFrame(
                {
                    col: ["nan"] * (long.shape[0] - short.shape[0])
                    for col in short.columns
                }
            ),
        ]
    ).reset_index(drop=True)


def equalize_width(short, long):
    return pd.concat(
        [
            short,
            pd.DataFrame({col: [] for col in long.columns if col not in short.columns}),
        ],
        axis=1,
    ).reset_index(drop=True)


def equalize(df, other_df):
    if df.shape[0] <= other_df.shape[0]:
        df = equalize_length(df, other_df)
    else:
        other_df = equalize_length(other_df, df)
    if df.shape[1] <= other_df.shape[1]:
        df = equalize_width(df, other_df)
    else:
        other_df = equalize_width(other_df, df)
    df = df.fillna("nan")
    other_df = other_df.fillna("nan")
    return df, other_df

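a, b = equalize(a, b)

# Boolean matrix: True where both frames hold the same value
comparevalues = a.values == b.values

# Row and column positions of the cells that differ
rows, cols = np.where(~comparevalues)

# Rewrite each differing cell as " old --> new "
for row, col in zip(rows, cols):
    a.iloc[row, col] = " {} --> {} ".format(a.iloc[row, col], b.iloc[row, col])
a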


I would like to color-code the cells in the output based on how each value changed. I'd like to implement something similar to the code below, but it does not work:

conditions  = [ 'np.nan -->',
               '--> np.nan', 
               '!np.nan --> !np.nan']

Colors     = [ 'Green', 
               'Red', 
               'Yellow']
    
a = np.select(conditions, Colors)

The error message I get is the following:

[Screenshot of the error message; image not included]

Put simply, how can I apply my conditions and colors to the output? The expected result is the data frame a with each cell colored according to the conditions and colors listed above.

CodePudding user response:

You can define a helper function to colorize values depending on given conditions:

def color_differences(val):
    if not isinstance(val, str):
        return "color: blue"
    if "nan --> nan" in val or val == "nan":
        color = "yellow"
    elif "nan -->" in val:
        color = "green"
    elif "--> nan" in val:
        color = "red"
    else:
        color = "blue"
    return f"color: {color}"
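
Note that the substring checks above also match a value such as " banan --> x ", because "nan -->" occurs inside it. A minimal sketch of a stricter variant follows, assuming every changed cell has the " old --> new " form produced by the question's loop. The name classify_change is hypothetical, and the colors follow the list in the question (green for added, red for removed, yellow for changed) rather than the helper above:

```python
def classify_change(val):
    """Hypothetical helper: return a CSS color for an ' old --> new ' cell."""
    # Cells without the arrow were not changed, so they get no styling.
    if not isinstance(val, str) or " --> " not in val:
        return ""
    # Split once on the arrow and compare each side exactly, not by substring.
    old, new = (part.strip() for part in val.split(" --> ", 1))
    if old == "nan" and new != "nan":
        return "color: green"   # value appeared
    if new == "nan" and old != "nan":
        return "color: red"     # value disappeared
    return "color: yellow"      # value changed
```

This function can be passed to the styler in the same way as color_differences.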

And then, at the end of your code, add and run the following cell:

# In pandas 2.1 and later, Styler.applymap is deprecated; use a.style.map(color_differences) there
a.style.applymap(color_differences)

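
As for the np.select attempt in the question: np.select expects a list of boolean arrays as conditions, not a list of strings, which is most likely why the call fails. The sketch below shows one way to keep the np.select idea, assuming the cells have the " old --> new " form built earlier. The name style_frame is hypothetical, and the colors follow the list in the question:

```python
import numpy as np
import pandas as pd


def style_frame(df):
    """Hypothetical helper: build a frame of CSS strings shaped like df."""
    # Work on stripped string copies so the checks below are exact.
    text = df.astype(str).apply(lambda col: col.str.strip())
    changed = text.apply(lambda col: col.str.contains(" --> ", regex=False))
    added = text.apply(lambda col: col.str.startswith("nan --> "))
    removed = text.apply(lambda col: col.str.endswith(" --> nan"))

    # np.select needs boolean arrays; the first matching condition wins.
    conditions = [
        (changed & added & ~removed).to_numpy(),
        (changed & removed & ~added).to_numpy(),
        (changed & ~added & ~removed).to_numpy(),
    ]
    colors = ["color: green", "color: red", "color: yellow"]

    # Cells matching no condition get an empty style string.
    css = np.select(conditions, colors, default="")
    return pd.DataFrame(css, index=df.index, columns=df.columns)
```

The resulting frame can then be handed to the styler with a.style.apply(style_frame, axis=None), which calls the function once on the whole data frame instead of once per cell.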
