How to compare columns contains value not numeric?-CodePudding

Compare COL1 and COL2, then highlight COL1 if its value bigger or equal to COL2. COL1 contains "test" which is not numeric. I need to skip that row and compare rest of them. How to achieve that? Thanks!

CodePudding user response：

If replace non numeric to missing values by to_numeric with errors='coerce' you can compare both columns by Series.ge for greater or equal, original columns are not replaced:

mask = pd.to_numeric(df['COL1'], errors='coerce').ge(df['COL2'])

df['new'] = np.where(mask, 'greater or equal', 'not greater or equal')

If need also different output for non numeric:

df = pd.DataFrame({'COL1':['text', 1,2],
                   'COL2':[1,2,0]})

s = pd.to_numeric(df['COL1'], errors='coerce')
mask1 = s.isna() & df['COL1'].notna()
mask2 = s.ge(df['COL2'])

df['new'] = np.select([mask1,mask2],
                      ['non numeric','greater or equal'],
                       default='not greater or equal')
print (df)
   COL1  COL2                   new
0  text     1           non numeric
1     1     2  not greater or equal
2     2     0      greater or equal

For completeness solution for highlight COL1 is:

df = pd.DataFrame({'COL1':['text', 1,2],
                   'COL2':[1,2,0]})


def color(x): 
   c1 = 'background-color: red'
   c2 = 'background-color: green'
   c3 = 'background-color: yellow'
   c = ''

   s = pd.to_numeric(df['COL1'], errors='coerce')
   m1 = s.isna() & df['COL1'].notna()
   m2 = s.ge(df['COL2'])

   df1 = pd.DataFrame(c, index=x.index, columns=x.columns)
   df1['COL1'] = np.select([m1, m2], [c1, c2], default=c3)
   return df1

df.style.apply(color,axis=None).to_excel('format_file.xlsx', index=False, engine='openpyxl')