Home > Back-end >  Find matching column interval in pandas
Find matching column interval in pandas

Time:10-12

I have a pandas dataframe with multiple columns were their values increase from some value between 0 and 1 for column A up to column E which is always 1 (representing cumulative probabilities).

ID          A    B    C    D    E     SIM
1:          0.49 0.64 0.86 0.97 1.00  0.98
2:          0.76 0.84 0.98 0.99 1.00  0.87
3:          0.32 0.56 0.72 0.92 1.00  0.12

The column SIM represents a column with random uniform numbers.

I wish to add a new column SIM_CAT with values equal to the column-name which value is the right boundary of the interval in which the value in column SIM falls:

ID          A    B    C    D    E     SIM  SIM_CAT
1:          0.49 0.64 0.86 0.97 1.00  0.98 E
2:          0.76 0.84 0.98 0.99 1.00  0.87 C
3:          0.32 0.56 0.72 0.92 1.00  0.12 A

I there a concise way to do that?

CodePudding user response:

You can compare columns with SIM and use idxmax to find the 1st greater value:

cols = list('ABCDE')
df['SIM_CAT'] = df[cols].ge(df.SIM, axis=0).idxmax(axis=1)
df
   ID     A     B     C     D    E   SIM SIM_CAT
0  1:  0.49  0.64  0.86  0.97  1.0  0.98       E
1  2:  0.76  0.84  0.98  0.99  1.0  0.87       C
2  3:  0.32  0.56  0.72  0.92  1.0  0.12       A

If SIM can contain values greater than 1:

cols = list('ABCDE')
df['SIM_CAT'] = None
df.loc[df.SIM <= 1, 'SIM_CAT'] = df[cols].ge(df.SIM, axis=0).idxmax(axis=1)

df
   ID     A     B     C     D    E   SIM SIM_CAT
0  1:  0.49  0.64  0.86  0.97  1.0  0.98       E
1  2:  0.76  0.84  0.98  0.99  1.0  0.87       C
2  3:  0.32  0.56  0.72  0.92  1.0  0.12       A
  • Related