Compare COL1 and COL2, then highlight COL1 if its value bigger or equal to COL2. COL1 contains "test" which is not numeric. I need to skip that row and compare rest of them. How to achieve that? Thanks!
CodePudding user response:
If replace non numeric to missing values by to_numeric
with errors='coerce'
you can compare both columns by Series.ge
for greater or equal, original columns are not replaced:
mask = pd.to_numeric(df['COL1'], errors='coerce').ge(df['COL2'])
df['new'] = np.where(mask, 'greater or equal', 'not greater or equal')
If need also different output for non numeric:
df = pd.DataFrame({'COL1':['text', 1,2],
'COL2':[1,2,0]})
s = pd.to_numeric(df['COL1'], errors='coerce')
mask1 = s.isna() & df['COL1'].notna()
mask2 = s.ge(df['COL2'])
df['new'] = np.select([mask1,mask2],
['non numeric','greater or equal'],
default='not greater or equal')
print (df)
COL1 COL2 new
0 text 1 non numeric
1 1 2 not greater or equal
2 2 0 greater or equal
For completeness solution for highlight COL1
is:
df = pd.DataFrame({'COL1':['text', 1,2],
'COL2':[1,2,0]})
def color(x):
c1 = 'background-color: red'
c2 = 'background-color: green'
c3 = 'background-color: yellow'
c = ''
s = pd.to_numeric(df['COL1'], errors='coerce')
m1 = s.isna() & df['COL1'].notna()
m2 = s.ge(df['COL2'])
df1 = pd.DataFrame(c, index=x.index, columns=x.columns)
df1['COL1'] = np.select([m1, m2], [c1, c2], default=c3)
return df1
df.style.apply(color,axis=None).to_excel('format_file.xlsx', index=False, engine='openpyxl')