How to transform dataframe to binary based on values being above/below the row median (if > media

Time:05-23

I am looking to transform a dataframe to binary based on row median. Please see my input and expected output below.

import pandas as pd
df_input = pd.DataFrame({'row1': [5, 10, 20], 'row2': [1, 30, 40],},
                        index = ['2021-02-24', '2021-02-25', '2021-02-26'])
df_expected_output = pd.DataFrame({'row1': [1, 0, 0], 'row2': [0, 1, 1],},
                        index = ['2021-02-24', '2021-02-25', '2021-02-26'])
df_median = df_input.median(axis=1)

I found this elegant solution for binarizing against the column median, but could not get it to work for comparing each value against its row median.

(dat > dat.median()).astype('int')

How can I do this for rows?

CodePudding user response:

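Use gt with axis=0, so that the Series of row medians is aligned on the DataFrame's index (one median per row) instead of on its columns: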

df_input.gt(df_input.median(axis=1), axis=0).astype(int)

output:

            row1  row2
2021-02-24     1     0
2021-02-25     0     1
2021-02-26     0     1
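
For reference, here is a minimal self-contained sketch of the same approach, using the data from the question. The helper name above_row_median is hypothetical and only there for illustration. The second half shows an equivalent comparison done through NumPy broadcasting, which is an alternative and not part of the answer above. Note that gt is a strict comparison, so a value equal to its row median maps to 0; ge would make the comparison inclusive.

```python
import pandas as pd


def above_row_median(df: pd.DataFrame) -> pd.DataFrame:
    """Return 1 where a value is strictly greater than its row median, else 0.

    Hypothetical helper name, used here only for illustration.
    """
    # One median per row; the resulting Series is indexed like df.index.
    row_median = df.median(axis=1)
    # axis=0 aligns that Series on the row index, so each row is
    # compared against its own median.
    return df.gt(row_median, axis=0).astype(int)


df_input = pd.DataFrame(
    {'row1': [5, 10, 20], 'row2': [1, 30, 40]},
    index=['2021-02-24', '2021-02-25', '2021-02-26'],
)
result = above_row_median(df_input)

# Alternative: reshape the medians into a column vector so that NumPy
# broadcasts them across the columns of each row.
values = df_input.to_numpy()
medians = df_input.median(axis=1).to_numpy()[:, None]
result_np = pd.DataFrame(
    (values > medians).astype(int),
    index=df_input.index,
    columns=df_input.columns,
)

print(result)
```

Both result and result_np should match df_expected_output from the question.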