I have a df with the following structure:
import pandas as pd

df = pd.DataFrame({'varb': ['0.56', '0.74', '0.89', '0.99', '0.24', '0.76', '0.60'],
                   'response': ['141', '134', '72', '29', '34', '50', '128']})
df
I want to perform a median split on 'varb', putting the top half of the values in group '2' and the bottom half in group '1', so that the resulting dataframe would look like this:
df = pd.DataFrame({'varb': ['0.56', '0.74', '0.89', '0.99', '0.24', '0.76', '0.60'],
                   'response': ['141', '134', '72', '29', '34', '50', '128'],
                   'median_split': ['2', '2', '2', '1', '1', '1', '2']})
df
How can I achieve this using Python?
CodePudding user response:
It looks like you used response instead of varb for the median split in your example. You can use the quantile method of a pandas DataFrame or Series: by default it computes the median (q=0.5), but you can pass any other quantile. Your columns hold strings, so convert them to numbers first with pd.to_numeric:
response = pd.to_numeric(df["response"])
df["median_split"] = (response < response.quantile()).map({True: 1, False: 2})
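For the split on varb that the question actually describes, a minimal sketch might look like the following. It assumes that a value equal to the median belongs in group '2' (the question does not say which side it should fall on) and that the labels should be strings, as in the question's expected output.

```python
import pandas as pd

df = pd.DataFrame({
    "varb": ["0.56", "0.74", "0.89", "0.99", "0.24", "0.76", "0.60"],
    "response": ["141", "134", "72", "29", "34", "50", "128"],
})

# The column holds strings, so convert to floats before comparing.
varb = pd.to_numeric(df["varb"])

# Assumption: values equal to the median go in group '2'.
at_or_above_median = varb >= varb.median()
df["median_split"] = at_or_above_median.map({True: "2", False: "1"})

print(df)
```

With an odd number of rows the two groups cannot be the same size; here the median value itself (0.74) lands in group '2'. Change `>=` to `>` if it should go in group '1' instead.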