I am new to Pandas and learning. I am reading excel to DataFrame and comparing columns and highlight the column that's not same. For example if Column A is not same as Column B then highlight the Column B. However I have some null values in Column A and Column B. When I execute the code, I don't want to highlight the null values in Column B. How can I do that? Below is my code:
file = Path(path to excel)
df = pd.read_excel(file)
def color(x):
c1 = 'background-color: red'
m1 = x['AMOUNT A'] != x['AMOUNT B']
m2 = x['AMOUNT C'] != x['AMOUNT D']
df = pd.DataFrame('',index=x.index, columns=x.columns)
df['AMOUNT B'] = np.select([m1],[c1], default='')
df['AMOUNT D'] = np.select([m2],[c1], default='')
return df
writer = pd.ExcelWriter(path to excel)
df.style.apply(color,axis=None).to_excel(writer, 'data', index=False)
df before color function:
AMOUNT A AMOUNT B AMOUNT C AMOUNT D
0 100.0 200.0 NaN NaN
1 200.0 200.0 45.0 25.0
2 100.0 500.0 NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
5 200.0 1.0 NaN NaN
Output after running script:
Expected output:
CodePudding user response:
The issue here is NaN == NaN
will return False
def color(x):
c1 = 'background-color: red'
m1 = x['AMOUNT A'].fillna('') != x['AMOUNT B'].fillna('')
m2 = x['AMOUNT C'].fillna('') != x['AMOUNT D'].fillna('')
df = pd.DataFrame('',index=x.index, columns=x.columns)
df['AMOUNT B'] = np.select([m1],[c1], default='')
df['AMOUNT D'] = np.select([m2],[c1], default='')
return df
More Info
np.nan == np.nan
Out[527]: False