Home > database >  Regex Match Item in List and Perform Action Based on Match
Regex Match Item in List and Perform Action Based on Match

Time:02-17

For the dataframe:

df1 = pd.DataFrame({'values':['0','123','None','banana','this is a test','-nan']})
df1

I have written the following function to colour a cell value based on its contents:

import pyodbc
import pandas as pd
import datetime
import re

def color_missing_items(value):

  if re.match(r'^0$',value):
    color = 'green'
  elif re.match(r'^[1-9]\d*$',value):
    color = 'red'
  elif any(re.match(null_rep,value) for null_rep in nulls_list):
    color = 'grey'
  
  return 'background-color: %s' % color

The issue I am having is with the 2nd elif.

This is nulls_list:

nulls_list=\
['','#N/A','#N/A N/A','#NA','-1.#IND','-1.#QNAN','-NaN','-nan','1.#IND','None','1.#QNAN','<NA>','N/A','NA','NULL','NaN','n/a','nan','null','Nan','NAN','NULL','na','--','-','__','_']

I want the cell to colour grey only if a value matches a value from nulls_list. Currently any string that doesn't match one of the other conditions still colours the cell grey regardless of whether the value is contained in nulls_list.

df1.style.applymap(color_missing_items)

enter image description here

How can I achieve this?

CodePudding user response:

I would use a different approach than a regex.

Try to convert to number, everything that fails will be grey:

def color_missing_items(s):
    s = pd.to_numeric(s, errors='coerce')
    return 'background-color: '   s.gt(0).mask(s.isna()).map({True: 'green', False: 'red', np.nan: 'grey'})

df1.style.apply(color_missing_items)

output:

colored output

CodePudding user response:

Your code has two problems:

  1. Note that, the way you wrote your code, it will fail if neither of the branches of the if is true (since color will be undefined).

  2. The first regex on your nulls_list is ''. This matches everything. This is why you are not seeing the problem from item (1).

You should declare a default colour value and, since I'm assuming you want to make sure nulls_list matches a whole string, add a regex to match the beginning and the end of the string:

nulls_list = [f'^{reg}$' for reg in nulls_list]   # Fix the regexes to match full string

def color_missing_items(value):

  color = 'blue'    # Ensure color is defined and give it a default value
  if re.match(r'^0$',value):
    color = 'green'
  elif re.match(r'^[1-9]\d*$',value):
    color = 'red'
  elif any(re.match(null_rep,value) for null_rep in nulls_list):
    color = 'grey'
  
  return 'background-color: %s' % color
  • Related