I have a pandas dataframe with multiple columns were their values increase from some value between 0 and 1 for column A
up to column E
which is always 1 (representing cumulative probabilities).
ID A B C D E SIM
1: 0.49 0.64 0.86 0.97 1.00 0.98
2: 0.76 0.84 0.98 0.99 1.00 0.87
3: 0.32 0.56 0.72 0.92 1.00 0.12
The column SIM
represents a column with random uniform numbers.
I wish to add a new column SIM_CAT
with values equal to the column-name which value is the right boundary of the interval in which the value in column SIM
falls:
ID A B C D E SIM SIM_CAT
1: 0.49 0.64 0.86 0.97 1.00 0.98 E
2: 0.76 0.84 0.98 0.99 1.00 0.87 C
3: 0.32 0.56 0.72 0.92 1.00 0.12 A
I there a concise way to do that?
CodePudding user response:
You can compare columns with SIM
and use idxmax
to find the 1st greater value:
cols = list('ABCDE')
df['SIM_CAT'] = df[cols].ge(df.SIM, axis=0).idxmax(axis=1)
df
ID A B C D E SIM SIM_CAT
0 1: 0.49 0.64 0.86 0.97 1.0 0.98 E
1 2: 0.76 0.84 0.98 0.99 1.0 0.87 C
2 3: 0.32 0.56 0.72 0.92 1.0 0.12 A
If SIM
can contain values greater than 1:
cols = list('ABCDE')
df['SIM_CAT'] = None
df.loc[df.SIM <= 1, 'SIM_CAT'] = df[cols].ge(df.SIM, axis=0).idxmax(axis=1)
df
ID A B C D E SIM SIM_CAT
0 1: 0.49 0.64 0.86 0.97 1.0 0.98 E
1 2: 0.76 0.84 0.98 0.99 1.0 0.87 C
2 3: 0.32 0.56 0.72 0.92 1.0 0.12 A