Looking for a way to create a "median split" variable for several variables

Time:12-01

I have a dataset which contains about 40 different variables. Now I would like to create a new variable indicating whether each observation is above or below the median.

I managed to create a new variable "var1_mediansplit" from the existing "var1" (values 1 for below median, 2 for everything else):

mydata$var1_mediansplit <- ifelse(mydata$var1 < median(mydata$var1), "1", "2")

I am looking for a way to run it through several variables (with a loop, I would guess). I appreciate any help!

CodePudding user response:

Using colMedians and eachrow from the Rfast package:

library(Rfast)

df <- as.data.frame(matrix(runif(4000), ncol = 40)) # dummy data
m <- as.matrix(df)
df2 <- as.data.frame((eachrow(m, colMedians(m), "-") >= 0) + 1)
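
If installing Rfast is not an option, the same idea can probably be written in base R. This is only a sketch under the same assumptions as above (all columns numeric, no missing values); it swaps in apply() and sweep() in place of colMedians and eachrow, and set.seed() is there only to make the dummy data reproducible:

```r
# Base R sketch of the same idea (no Rfast needed), using the same kind of dummy data
set.seed(1)                                          # reproducible dummy data only
df <- as.data.frame(matrix(runif(4000), ncol = 40))
m <- as.matrix(df)

med <- apply(m, 2, median)                           # one median per column

# sweep() subtracts each column's median from that column;
# (>= 0) gives FALSE/TRUE, and adding 1 turns that into 1/2
df2 <- as.data.frame((sweep(m, 2, med, "-") >= 0) + 1)
```

Here 1 marks values below the column median and 2 marks everything else, which matches the coding asked for in the question.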

CodePudding user response:

Don't overcomplicate it. Consider this:

v <- 1:100
x <- median(v)   # median of the vector
y <- v >= x      # TRUE where the value is at or above the median
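
To run that comparison over several columns and keep the results as new columns, something like the following may work. It is a minimal sketch: the small `mydata` below is made-up example data, `vars` is a hypothetical list of column names, and all listed columns are assumed to be numeric. The 1/2 coding follows the question.

```r
# Made-up data standing in for the asker's `mydata`
mydata <- data.frame(var1 = c(1, 5, 3, 8, 2),
                     var2 = c(10, 40, 20, 30, 50))

vars <- c("var1", "var2")   # columns to split (assumed numeric)

# For each listed column: 1 if below its median, 2 otherwise.
# The results are stored as new columns named <column>_mediansplit.
mydata[paste0(vars, "_mediansplit")] <- lapply(mydata[vars], function(x) {
  ifelse(x < median(x, na.rm = TRUE), 1L, 2L)
})
```

With `vars <- names(mydata)` set before the lapply() call, the same lines would cover every column, provided they are all numeric. Rows with a missing value stay NA in the new column.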